If you're interested in functional programming, you might also want to check out my second blog, which I'm actively working on!

Sunday, December 23, 2012

Getting Play 2.0.x to work with Git Bash on Windows

I unzipped Play 2.0.4 and created a PLAY_HOME environment variable. I also made sure to add PLAY_HOME to my PATH.

nxp10009@NXL01366 /c/development/play-2.0.4
$ echo $PLAY_HOME
C:\development\play-2.0.4

When I tried to run play from the Git Bash shell, I ran into the exception below. In the past I didn't bother much and just used the DOS shell as a workaround, but I like to work from a single shell, so it was time to look for a solution.
nxp10009@NXL01366 /c/workspaces/scala
$ play
Error during sbt execution: Could not find configuration file 'c:/development/play-2.0.4/framework/sbt/play.boot.properties'.  Searched:
        file:/c:/workspaces/scala/
        file:/C:/Users/nxp10009/
        file:/C:/development/play-2.0.4/framework/sbt/

Luckily someone created a patch that resolves this issue. You can download the gist and unzip it to some folder; it contains a file named play_cygwin.patch. Then it's just a matter of applying the patch from within Git Bash. It logs an error (the hunk for framework/build fails), but play works nonetheless.
nxp10009@NXL01366 /c/development/play-2.0.4
$ patch -p1 < c:/tmp/play_cygwin.patch
patching file `framework/build'
Hunk #1 FAILED at 8.
1 out of 1 hunk FAILED -- saving rejects to framework/build.rej
patching file `play'


nxp10009@NXL01366 /c/workspaces/scala
$ play
Getting play console_2.9.1 2.0.4 ...
:: retrieving :: org.scala-sbt#boot-app
        confs: [default]
        5 artifacts copied, 0 already retrieved (3667kB/83ms)
       _            _
 _ __ | | __ _ _  _| |
| '_ \| |/ _' | || |_|
|  __/|_|\____|\__ (_)
|_|            |__/

play! 2.0.4, http://www.playframework.org

Thursday, December 20, 2012

String-joining the lines of a file in Unix

This week I had to generate a batch of 400 DITA files. The ids were handed to me in an Excel sheet. As I was using a scripting language (Cocoon flowscript, i.e. JavaScript), it was convenient to transform the id column from Excel into JavaScript array notation. So the first thing I did was copy the contents of the first column and paste them into batch2.txt.

batch2.txt
2N7000
BLF1822-10
BT150-800R
BU1506DX
BUT18A
BUX87P
BY229-600

Next I needed a way to wrap each id in double quotes and then string-join all lines. As I was a bit rusty in shell scripting, I started looking online. My colleague and I stumbled upon the same approach, which got the job done for 99.9%. The only problem was a trailing comma after the last id:
for line in `cat batch2.txt`; do echo -n "\"$line\"," ; done >> output.txt

"2N7000","BLF1822-10","BT150-800R","BU1506DX","BUT18A","BUX87P","BY229-600",

When I woke up this morning I felt restless... surely a simple string join can't be that difficult? So I read up on a few Unix commands and found that you can use sed to do string replacement easily. The trick is to first wrap each id in double quotes, using the -n flag with echo so that no newline is printed after each id, which puts all quoted ids on one line. Then sed replaces every two consecutive double quotes "" with "," and that's it.
nxp10009@NXL01366 /c/tmp
$ for line in `cat batch2.txt`; do echo -n "\"$line\""; done | sed 's/""/","/g' >> output.txt

nxp10009@NXL01366 /c/tmp
$ less output.txt
"2N7000","BLF1822-10","BT150-800R","BU1506DX","BUT18A","BUX87P","BY229-600"

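For comparison, here is the same quote-and-join step as a small Java sketch (the class name QuoteJoin and reading batch2.txt from the working directory are assumptions for illustration). Joining with a separator, rather than appending a comma after every id, is what avoids the trailing comma in the first place:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

class QuoteJoin {
    // Wraps each non-empty line in double quotes and joins them with commas.
    // Collectors.joining only puts the separator BETWEEN elements, so no trailing comma.
    static String quoteAndJoin(List<String> lines) {
        return lines.stream()
                .map(String::trim)               // also strips a stray '\r' from Windows line endings
                .filter(line -> !line.isEmpty()) // skip blank lines
                .map(line -> "\"" + line + "\"")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: one id per line in batch2.txt (or the file given as first argument).
        Path input = Paths.get(args.length > 0 ? args[0] : "batch2.txt");
        System.out.println(quoteAndJoin(Files.readAllLines(input)));
    }
}
```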
Wednesday, December 19, 2012

Using Options in XQuery

As a developer you try to adopt best practices from different toolkits, frameworks and programming languages. One thing Scala's core library solves nicely is avoiding null pointer exceptions, by returning an Option that is either Some(value) or None. So I decided to hack this idea into some XQuery functions for fun.
   <!-- Taking the same approach as Scala, we return options which are either none or some -->
   <option>
     <none/>
   </option>
   
   <option>
     <some>4</some>
   </option> 

   <!-- Example of a lookup map represented in XML -->
   <map>
     <entry key="PH3330L">SOT669</entry>
     <entry key="BUK100-50GL">SOT78B</entry>
     <entry key="PSMN003-30B">SOT404</entry>
   </map>   

Below is an example of how you could model options in XQuery and build some utility functions to make your code safer.
declare function local:getOrElse($map as element(map), $key as xs:string, $else)  {
    let $option := local:get($map, $key)
    return (if (empty($option/some)) then $else else data($option/some))
};

declare function local:get($map as element(map), $key as xs:string) as element(option) {
    (if (empty($map/entry[@key=$key])) then <option><none/></option> else <option><some>{data($map/entry[@key=$key])}</some></option>)
};

let $map1 := 
   <map>
     <entry key="PH3330L">SOT669</entry>
     <entry key="BUK100-50GL">SOT78B</entry>
     <entry key="PSMN003-30B">SOT404</entry>
   </map> 
 
let $map2 :=    
   <map>
     <entry key="robby">1977</entry>
     <entry key="ivan">1987</entry>
   </map>  
 
return 
  <lookups>
    <package>{local:getOrElse($map1, "PH3330L", "test1")}</package>
    <package>{local:getOrElse($map1, "INVALID", "test1")}</package>
    <age>{2012 - local:getOrElse($map2, "robby", 0)}</age>
    <age>{2012 - local:getOrElse($map2, "amanda", 2012)}</age>
  </lookups>  


If you run it on, e.g., Zorba, the result looks like this:
<lookups>
  <package>SOT669</package>
  <package>test1</package>
  <age>35</age>
  <age>0</age>
</lookups>
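For comparison, Java 8's java.util.Optional models the same some/none idea. The sketch below (class and method names are made up to mirror the XQuery functions) performs the same lookups. Because map() only runs when a value is present, a missing name yields the default age directly instead of passing 2012 as the default:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OptionLookup {
    // Like local:get: a missing key becomes an empty Optional (None) instead of null.
    static Optional<String> get(Map<String, String> map, String key) {
        return Optional.ofNullable(map.get(key));
    }

    // Like local:getOrElse: unwrap the value if present, otherwise use the fallback.
    static String getOrElse(Map<String, String> map, String key, String orElse) {
        return get(map, key).orElse(orElse);
    }

    // map() is skipped for an empty Optional, so an unknown name simply gives 0.
    static int age(Map<String, String> birthYears, String name, int currentYear) {
        return get(birthYears, name)
                .map(year -> currentYear - Integer.parseInt(year))
                .orElse(0);
    }

    public static void main(String[] args) {
        Map<String, String> packages = new HashMap<>();
        packages.put("PH3330L", "SOT669");
        packages.put("BUK100-50GL", "SOT78B");
        packages.put("PSMN003-30B", "SOT404");

        Map<String, String> birthYears = new HashMap<>();
        birthYears.put("robby", "1977");
        birthYears.put("ivan", "1987");

        System.out.println(getOrElse(packages, "PH3330L", "test1")); // SOT669
        System.out.println(getOrElse(packages, "INVALID", "test1")); // test1
        System.out.println(age(birthYears, "robby", 2012));          // 35
        System.out.println(age(birthYears, "amanda", 2012));         // 0
    }
}
```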

Handling Nil values in XSLT / XQuery

Today I had to generate another batch of DITA files, but some of them failed. The problem was that some date elements were nillable (xsi:nil="true"): such an element is empty, and formatting it as a date raised an exception because an empty string cannot be cast to xs:date. Fortunately it is easy to check whether an element has a nil value:

<xsl:variable name="now" select="current-date()"/>
<xsl:variable name="date-format" select="'[Y0001]-[M01]-[D01]T00:00:00'"/>  

<created date="{if (not($productInfo/InitialWebPublicationDate/@xsi:nil='true')) then format-date($productInfo/InitialWebPublicationDate, $date-format) else (format-date($now, $date-format))}"/>


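As Ryan Dew pointed out in a comment, there is a shorter way using nilled(). Keep in mind that nilled() only returns true for elements that have been validated against their schema (schema-aware processing); on untyped input it always returns false, and there the explicit @xsi:nil check above is the one that works: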
<created date="{if (not(nilled($productInfo/InitialWebPublicationDate))) then format-date($productInfo/InitialWebPublicationDate, $date-format) else (format-date($now, $date-format))}"/>
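Outside XSLT the same check is just an attribute lookup, because xsi:nil is an ordinary attribute in the XML Schema instance namespace. A minimal Java DOM sketch of the same fallback logic (class and method names are made up; it assumes the date text is a plain yyyy-MM-dd value without a timezone):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class NilCheck {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // True for xsi:nil="true" or xsi:nil="1" (both lexical forms of xs:boolean true).
    static boolean isNil(Element element) {
        String nil = element.getAttributeNS(XSI_NS, "nil");
        if (nil == null) {
            return false;
        }
        nil = nil.trim();
        return nil.equals("true") || nil.equals("1");
    }

    // Same idea as the XSLT: use the element's date unless it is nil, otherwise today.
    // Assumes a plain yyyy-MM-dd value (no timezone suffix).
    static LocalDate createdDate(Element dateElement, LocalDate today) {
        return isNil(dateElement) ? today : LocalDate.parse(dateElement.getTextContent().trim());
    }

    // Namespace awareness is required, otherwise getAttributeNS cannot see xsi:nil.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element nilled = parse("<InitialWebPublicationDate xmlns:xsi=\"" + XSI_NS + "\" xsi:nil=\"true\"/>");
        Element dated = parse("<InitialWebPublicationDate>2012-12-19</InitialWebPublicationDate>");
        LocalDate today = LocalDate.now();
        System.out.println(createdDate(nilled, today)); // today's date
        System.out.println(createdDate(dated, today));  // 2012-12-19
    }
}
```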

Monday, December 17, 2012

Using sbt plugin for creating IntelliJ IDEA project files (update)

This post updates the previous one with current information. I followed the tutorial from the Guardian but ran into some issues, which I managed to resolve. First we install the sbt-idea plugin; this time not on a per-project basis but for the SBT installation as a whole.

First I had to create a plugins folder in %USER_HOME%/.sbt/ (already done in the snippet below). Next I created a build.sbt in it and added the mpeltonen plugin.
$ pwd
/c/Users/nxp10009/.sbt/plugins

nxp10009@NXL01366 ~/.sbt/plugins
$ ls -la
total 3
drwxr-xr-x    5 nxp10009 Administ     4096 Dec 17 14:14 .
drwxr-xr-x    1 nxp10009 Administ        0 Dec 17 14:13 ..
-rw-r--r--    1 nxp10009 Administ       61 Dec 17 14:14 build.sbt

$ less build.sbt
addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.1.0")


Next I created a new project as described in my previous article and checked whether everything worked. But I soon ran into the following issue:
[warn] Host repo.typesafe.com not found. url=http://repo.typesafe.com/typesafe/ivy-releases/com.github.mpeltonen/sbt-idea/scala_2.9.2/sbt_0.12/1.1.0-M2-TYPESAFE/ivys/ivy.xml
[info] You probably access the destination server through a proxy server that is not well configured.

So I had to configure a proxy for SBT. I could have created an alias in my .bashrc, but I decided to change the SBT start script and add the proxy settings to JAVA_OPTS:
nxp10009@NXL01366 /c/development/sbt
$ less sbt
#!/bin/sh
# sbt launcher script for Cygwin and MSYS

JAVA_CMD=java
# proxyHost takes a bare host name (no http:// prefix)
JAVA_OPTS="-Dhttp.proxyHost=your_proxy -Dhttp.proxyPort=8080 -Dhttps.proxyHost=your_proxy -Dhttps.proxyPort=8080 -Xmx512M"

....
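http.proxyHost, http.proxyPort, https.proxyHost and https.proxyPort are standard JVM networking system properties, and proxyHost expects a bare host name. A small Java sketch (the class name and the your_proxy placeholder are illustrative) that asks the JVM's default ProxySelector which proxy it would pick for the Typesafe repository once those properties are set:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class ProxyCheck {
    // Returns the first proxy the default selector would use for the URL.
    // Without proxy properties this is Proxy.NO_PROXY (a direct connection).
    static Proxy proxyFor(String url) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
        return proxies.get(0);
    }

    public static void main(String[] args) {
        // Same effect as -Dhttp.proxyHost=your_proxy -Dhttp.proxyPort=8080 in JAVA_OPTS.
        System.setProperty("http.proxyHost", "your_proxy");
        System.setProperty("http.proxyPort", "8080");

        Proxy proxy = proxyFor("http://repo.typesafe.com/typesafe/ivy-releases/");
        // The address is created unresolved, so no DNS lookup happens here.
        InetSocketAddress address = (InetSocketAddress) proxy.address();
        System.out.println(proxy.type() + " " + address.getHostString() + ":" + address.getPort()); // HTTP your_proxy:8080
    }
}
```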

Sunday, December 9, 2012

Using sbt plugin for creating IntelliJ IDEA project files

Of course I'm a lazy developer (who isn't?), so I started searching for an sbt plugin to generate IntelliJ IDEA project files. I found two that seem to work pretty well.


I decided to go with the mpeltonen plugin. You have two choices: configure the plugin for sbt globally, or on a per-project basis. Here I configure it for the demo project only.
nxp10009@NXL01366 /c/workspaces/pelssers/demo/project
$ vi plugins.sbt

resolvers += "Sonatype snapshots" at "http://oss.sonatype.org/content/repositories/snapshots/"

addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.2.0-SNAPSHOT")

If you now start sbt again for the demo project, you will see it fetch the plugin's dependencies.
nxp10009@NXL01366 /c/workspaces/pelssers/demo
$ sbt
[info] Loading project definition from C:\workspaces\pelssers\demo\project
[info] Updating {file:/C:/workspaces/pelssers/demo/project/}default-a9f0a5...
[info] Resolving com.github.mpeltonen#sbt-idea;1.2.0-SNAPSHOT ...
[info] Resolving commons-io#commons-io;2.0.1 ...
[info] Resolving org.scala-sbt#sbt;0.12.0 ...

Now we can use the gen-idea command to generate our project files:
> gen-idea
[info] Trying to create an Idea module demo
[info] Resolving org.scala-lang#scala-library;2.9.2 ...
[info] downloading http://repo1.maven.org/maven2/org/scala-lang/scala-library/2.9.2/scala-library-2.9.2-sources.jar ...
[info]  [SUCCESSFUL ] org.scala-lang#scala-library;2.9.2!scala-library.jar(src) (887ms)
[info] downloading http://repo1.maven.org/maven2/org/scala-lang/scala-library/2.9.2/scala-library-2.9.2-javadoc.jar ...
[info]  [SUCCESSFUL ] org.scala-lang#scala-library;2.9.2!scala-library.jar(doc) (5127ms)
[info] Excluding folder target
[info] Created C:\workspaces\pelssers\demo/.idea/IdeaProject.iml
[info] Created C:\workspaces\pelssers\demo\.idea
[info] Excluding folder C:\workspaces\pelssers\demo\target
[info] Created C:\workspaces\pelssers\demo\.idea_modules/demo.iml
[info] Created C:\workspaces\pelssers\demo\.idea_modules/demo-build.iml
>
My colleague Ivan pointed me to this article from the Guardian, which covers the topic in more detail.

SBT Quick start - Part 1

Now that I have Scala and SBT working from Git Bash, let's set up our first demo project.
$ mkdir demo
$ cd demo
$ mkdir -p src/{main,test}/{java,scala,resources}

Create a file hw.scala and place it under demo/src/main/scala:
/*********** hw.scala ***************/
object Hi {
  def main(args: Array[String]) = println("Demo application says HELLO!")
}

Now run sbt. Running sbt with no command-line arguments starts it in interactive mode.
nxp10009@NXL01366 /c/workspaces/pelssers/demo
$ sbt
[info] Set current project to default-8388aa (in build file:/C:/workspaces/pelssers/demo/)
> exit  

We still have to create a build.sbt in the project root, so let's do that first:
name := "demo"

version := "1.0-SNAPSHOT"

scalaVersion := "2.9.2"

If we now rerun sbt, everything looks OK:
$ sbt
[info] Set current project to demo (in build file:/C:/workspaces/pelssers/demo/)
> exit

You will now see that sbt created a project folder for you. In it we create a new build.properties file:
nxp10009@NXL01366 /c/workspaces/pelssers/demo
$ ls -la
total 3
drwxr-xr-x    6 nxp10009 Administ        0 Dec  9 15:13 .
drwxr-xr-x    8 nxp10009 Administ     4096 Dec  9 15:04 ..
-rw-r--r--    1 nxp10009 Administ       70 Dec  9 15:04 build.sbt
drwxr-xr-x    3 nxp10009 Administ        0 Dec  9 15:13 project
drwxr-xr-x    4 nxp10009 Administ        0 Dec  9 15:05 src
drwxr-xr-x    3 nxp10009 Administ        0 Dec  9 15:12 target


$ cd project
$ vi build.properties


sbt.version=0.12.0

Enabling continuous build and test

Prefixing a command with ~ makes sbt re-run it every time a source file changes:
> ~ compile
[success] Total time: 0 s, completed Dec 9, 2012 3:32:39 PM
1. Waiting for source changes... (press enter to interrupt)

Now edit hw.scala and make a small change:
object Hi {
  def main(args: Array[String]) = println("Demo application says HELLO again!")
}

[info] Compiling 1 Scala source to C:\workspaces\pelssers\demo\target\scala-2.9.2\classes...
[success] Total time: 1 s, completed Dec 9, 2012 3:33:02 PM
2. Waiting for source changes... (press enter to interrupt)

Getting Git Bash and Scala working on Windows

I found a great blog post about fixing these issues. Since I already avoid spaces in directory names, I only ran into the last one:
nxp10009@NXL01366 /c/development/scala/scala-2.9.2/bin
$ scala
Exception in thread "main" java.lang.NoClassDefFoundError: scala/tools/nsc/MainGenericRunner
Caused by: java.lang.ClassNotFoundException: scala.tools.nsc.MainGenericRunner
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: scala.tools.nsc.MainGenericRunner.  Program will exit.

So all I needed to do was create a .bashrc in my home directory and add an alias:
nxp10009@NXL01366 ~
$ cd ~

nxp10009@NXL01366 ~
$ pwd
/c/Users/nxp10009

nxp10009@NXL01366 ~
$ vi .bashrc

alias scala='scala -nobootcp'

Now exit Git Bash and open a new shell:
nxp10009@NXL01366 ~
$ scala
Welcome to Scala version 2.9.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_32).
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Thursday, December 6, 2012

Merging DITA maps and topics

This week I did a major data conversion. For about 2,000 products we generated DITA maps, each pointing to 3 topics. But many products shared the same data, so their generated topics had identical <body> elements. We therefore decided to first merge all duplicate topics, which also meant rewriting the topicrefs in the maps. After that we could also merge the maps themselves whenever they had the same topicrefs.

One important lesson learned: I first used timestamps to name the merged files. It seemed Saxon was able to merge 4 cases within the same millisecond, so they got the same name and overwrote each other. So I quickly had to find an alternative and switched to using the hash code of the grouping keys.
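A minimal Java sketch of the hash-based naming the stylesheets below use (the class name is made up): the id keeps its prefix and gets the grouping key's String.hashCode() appended, with the minus sign removed. Unlike a timestamp, the result depends only on the content, so it is stable however fast Saxon runs. Keep in mind that hashCode() is only 32 bits, so two different grouping keys can in principle collide; a stronger digest such as SHA-1 would make that far less likely.

```java
class MergedIds {
    // Mirrors the stylesheets' id expression:
    //   concat(substring-before(@id, '_'), '_', translate(nxp:getHashCode($grouping_key), '-', ''))
    static String mergedId(String originalId, String groupingKey) {
        int underscore = originalId.indexOf('_');
        // XPath substring-before() returns "" when the separator is missing
        String prefix = underscore < 0 ? "" : originalId.substring(0, underscore);
        // Equal strings always have equal hash codes; the minus sign is dropped
        String hash = Integer.toString(groupingKey.hashCode()).replace("-", "");
        return prefix + "_" + hash;
    }

    public static void main(String[] args) {
        String body = "High current Three current gain selections";
        // Same body, same merged id, no matter how many topics are merged per millisecond.
        // ("fb_XYZ" is a made-up id for illustration.)
        System.out.println(mergedId("fb_BC51-10PA", body).equals(mergedId("fb_XYZ", body))); // true

        // Two timestamps taken back to back frequently share the same millisecond value.
        long first = System.currentTimeMillis();
        long second = System.currentTimeMillis();
        System.out.println("same millisecond: " + (first == second));
    }
}
```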

Example map:
<?xml version="1.0" encoding="utf-8"?>
<value-proposition id="vp_BC51-10PA" rev="001.001" title="Value proposition" xml:lang="en-US">
  <topicmeta translate="no">
    <subtitle translate="yes">45 V, 1 A PNP medium power transistor</subtitle>
    <prodinfo><prodname>BC51-10PA</prodname></prodinfo>
  </topicmeta>
  <technical-summary-ref href="technical-summary/ts_BC51-10PA.dita"/>
  <features-benefits-ref href="features-benefits/fb_BC51-10PA.dita"/>
  <target-applications-ref href="target-applications/ta_BC51-10PA.dita"/>
</value-proposition>

Example topic:
<?xml version="1.0" encoding="utf-8"?>
<p-topic id="fb_BC51-10PA" rev="001.001" xml:lang="en-US">
  <title translate="no">Features and benefits</title>
  <prolog translate="no">...</prolog>
  <body>
    <ul>
      <li><p>High current</p></li>
      <li><p>Three current gain selections</p></li>
      <li><p>High power dissipation capability</p></li>
      <li><p>Exposed heatsink for excellent thermal and electrical conductivity</p></li>
      <li><p>Leadless very small SMD plastic package with medium power capability</p></li>
      <li><p>AEC-Q101 qualified</p></li>
    </ul>
  </body>
</p-topic>

I'm just going to share the XSLTs that did the hard work of merging the topics and maps. I'm sure I can reuse the same approach in the future.
topicmerge.xslt
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
This stylesheet will merge topics if they have the same body tag
-->

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:nxp="http://www.nxp.com">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:param name="exportFolder"/>
  <xsl:param name="subFolder"/>
  <xsl:variable name="folderTemplate" select="concat('file:///', $exportFolder, $subFolder, '/topic-type/?select=*.dita')"/>

  <xsl:variable name="featuresandbenefits" select="collection(replace($folderTemplate, 'topic-type', 'features-benefits'))"/>
  <xsl:variable name="technicalsummaries" select="collection(replace($folderTemplate, 'topic-type', 'technical-summary'))"/>
  <xsl:variable name="targetapplications" select="collection(replace($folderTemplate, 'topic-type', 'target-applications'))"/>

  <xsl:variable name="date-format" select="'[Y0001]-[M01]-[D01]T[h01]:[m01]:[s01]'"/>

  <xsl:function name="nxp:getHashCode">
    <xsl:param name="stringvalue" as="xs:string"/>
    <xsl:value-of select="string:hashCode($stringvalue)" xmlns:string="java:java.lang.String"/>
  </xsl:function>

  <!-- handles a logical group of documents (featuresandbenefits | technicalsummaries | targetapplications) -->
  <xsl:template name="mergeDocumentGroup">
    <xsl:param name="documents"/>
    <xsl:for-each-group select="$documents" group-by="p-topic/body">
      <xsl:call-template name="p-topic">
        <xsl:with-param name="topics" select="current-group()/p-topic"/>
        <xsl:with-param name="grouping_key"  select="current-grouping-key()"/>
      </xsl:call-template>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="/">
    <result>
      <xsl:call-template name="mergeDocumentGroup">
        <xsl:with-param name="documents" select="$featuresandbenefits"/>
      </xsl:call-template>
      <xsl:call-template name="mergeDocumentGroup">
        <xsl:with-param name="documents" select="$technicalsummaries"/>
      </xsl:call-template>
      <xsl:call-template name="mergeDocumentGroup">
        <xsl:with-param name="documents" select="$targetapplications"/>
      </xsl:call-template>
    </result>
  </xsl:template>


  <xsl:template name="p-topic">
    <xsl:param name="topics"/>
    <xsl:param name="grouping_key"/>
    <xsl:variable name="topic" select="$topics[1]"/>
    <p-topic>
      <xsl:choose>
        <xsl:when test="count($topics) > 1">
          <xsl:apply-templates select="$topic/@* | $topic/node()" mode="merge">
            <xsl:with-param name="grouping_key" select="$grouping_key" tunnel="yes"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="$topic/@* | $topic/node()"/>
        </xsl:otherwise>
      </xsl:choose>
      <!-- we temporarily add the original topic id's so we can easily alter the topicrefs in a subsequent transform -->
      <topics>
        <xsl:for-each select="$topics">
          <id><xsl:value-of select="./@id"/></id>
        </xsl:for-each>
      </topics>
    </p-topic>
  </xsl:template>

  <xsl:template match="p-topic/@id" mode="merge">
    <xsl:param name="grouping_key" tunnel="yes"/>
    <xsl:attribute name="id"
        select="concat(substring-before(., '_'), '_', translate(nxp:getHashCode($grouping_key), '-', ''))"/>
  </xsl:template>

  <!-- copy all nodes and attributes which are not processed by one of the available templates -->
  <xsl:template match="@* | node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node()" mode="merge">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*" mode="merge"/>
      <xsl:apply-templates mode="merge"/>
    </xsl:copy>
  </xsl:template>


</xsl:stylesheet>
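The temporary <topics> element exists so that a subsequent transform can rewrite the topicrefs; that transform is not part of this post. Its core could look like the Java sketch below: invert the merged-id to original-ids lists into a lookup table, then rewrite each href. The class and method names are hypothetical, and it assumes the file name is the topic id plus .dita, as in the example map above:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopicrefRewrite {
    // Input: merged topic id -> original topic ids (the content of each <topics> element).
    // Output: original topic id -> merged topic id.
    static Map<String, String> invert(Map<String, List<String>> mergedToOriginals) {
        Map<String, String> originalToMerged = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : mergedToOriginals.entrySet()) {
            for (String originalId : entry.getValue()) {
                originalToMerged.put(originalId, entry.getKey());
            }
        }
        return originalToMerged;
    }

    // Rewrites e.g. "features-benefits/fb_BC51-10PA.dita" to point at the merged topic.
    // Hrefs whose id was not merged are returned unchanged.
    static String rewriteHref(String href, Map<String, String> originalToMerged) {
        int slash = href.lastIndexOf('/');
        String folder = href.substring(0, slash + 1);
        String fileName = href.substring(slash + 1);
        if (!fileName.endsWith(".dita")) {
            return href;
        }
        String id = fileName.substring(0, fileName.length() - ".dita".length());
        String mergedId = originalToMerged.get(id);
        return mergedId == null ? href : folder + mergedId + ".dita";
    }

    public static void main(String[] args) {
        Map<String, List<String>> merged = new HashMap<>();
        // Hypothetical merged id; in the stylesheet it comes from the grouping key's hash.
        merged.put("fb_123456", Arrays.asList("fb_BC51-10PA", "fb_XYZ"));
        Map<String, String> table = invert(merged);
        System.out.println(rewriteHref("features-benefits/fb_BC51-10PA.dita", table));
        // features-benefits/fb_123456.dita
    }
}
```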

mapmerge.xslt
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
This stylesheet will merge maps which have same topic refs and same title.
-->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:nxp="http://www.nxp.com">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:variable name="date-format" select="'[Y0001]-[M01]-[D01]T[h01]:[m01]:[s01]'"/>

  <xsl:function name="nxp:getHashCode">
    <xsl:param name="stringvalue" as="xs:string"/>
    <xsl:value-of select="string:hashCode($stringvalue)" xmlns:string="java:java.lang.String"/>
  </xsl:function>

  <xsl:function name="nxp:getMapGroupingKey" as="xs:string">
    <xsl:param name="vp" as="element(value-proposition)"/>
    <xsl:sequence select="concat($vp/topicmeta/subtitle, $vp/technical-summary-ref/@href,
      $vp/features-benefits-ref/@href, $vp/target-applications-ref/@href)"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="result">
    <result>
      <xsl:apply-templates select="p-topic"/>
      <xsl:for-each-group select="value-proposition" group-by="nxp:getMapGroupingKey(.)">
        <xsl:call-template name="value-proposition">
          <xsl:with-param name="valuepropositions" select="current-group()"/>
          <xsl:with-param name="grouping_key"  select="current-grouping-key()"/>
        </xsl:call-template>
      </xsl:for-each-group>
    </result>
  </xsl:template>

  <xsl:template name="value-proposition">
    <xsl:param name="valuepropositions"/>
    <xsl:param name="grouping_key"/>
    <xsl:variable name="vp" select="$valuepropositions[1]"/>
    <value-proposition>
      <xsl:choose>
        <xsl:when test="count($valuepropositions) > 1">
          <xsl:apply-templates select="$vp/@* | $vp/node()" mode="merge">
            <xsl:with-param name="valuepropositions" select="$valuepropositions" tunnel="yes"/>
            <xsl:with-param name="grouping_key" select="$grouping_key" tunnel="yes"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="$vp/@* | $vp/node()"/>
        </xsl:otherwise>
      </xsl:choose>
    </value-proposition>
  </xsl:template>


  <xsl:template match="value-proposition/@id" mode="merge">
    <xsl:param name="grouping_key" tunnel="yes"/>
    <xsl:attribute name="id"
         select="concat(substring-before(., '_'), '_', translate(nxp:getHashCode($grouping_key), '-', ''))"/>
  </xsl:template>

  <!-- copy all nodes and attributes which are not processed by one of the available templates -->
  <xsl:template match="@* | node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node()" mode="merge">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@*" mode="merge"/>
      <xsl:apply-templates mode="merge"/>
    </xsl:copy>
  </xsl:template>


</xsl:stylesheet>
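One thing to watch in nxp:getMapGroupingKey: concat() glues the subtitle and the three hrefs together without a separator, so different combinations can in principle yield the same key (for example "45 V" + "ts/x.dita" versus "45 " + "Vts/x.dita"). With the fixed folder prefixes in these hrefs that is unlikely, but in the stylesheet a string-join((...), '|') would rule it out. The Java sketch below shows the grouping with such a separator; the ValueProposition class, the short hrefs and the choice of '|' (assumed never to occur in the values) are illustrative assumptions:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MapGrouping {
    // Hypothetical flattened view of a <value-proposition> map.
    static final class ValueProposition {
        final String id, subtitle, tsHref, fbHref, taHref;

        ValueProposition(String id, String subtitle, String tsHref, String fbHref, String taHref) {
            this.id = id;
            this.subtitle = subtitle;
            this.tsHref = tsHref;
            this.fbHref = fbHref;
            this.taHref = taHref;
        }
    }

    // Same four values as nxp:getMapGroupingKey, but joined with '|' so that
    // "45 V" + "ts/x.dita" and "45 " + "Vts/x.dita" no longer produce the same key.
    static String groupingKey(ValueProposition vp) {
        return String.join("|", vp.subtitle, vp.tsHref, vp.fbHref, vp.taHref);
    }

    // Like xsl:for-each-group group-by: maps with equal keys land in one group, in input order.
    static Map<String, List<String>> groupIds(List<ValueProposition> vps) {
        return vps.stream().collect(Collectors.groupingBy(
                MapGrouping::groupingKey,
                LinkedHashMap::new,
                Collectors.mapping(vp -> vp.id, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<ValueProposition> vps = Arrays.asList(
                new ValueProposition("vp_A", "45 V", "ts/x.dita", "fb/y.dita", "ta/z.dita"),
                new ValueProposition("vp_B", "45 V", "ts/x.dita", "fb/y.dita", "ta/z.dita"),
                new ValueProposition("vp_C", "45 ", "Vts/x.dita", "fb/y.dita", "ta/z.dita"));
        System.out.println(groupIds(vps).values()); // [[vp_A, vp_B], [vp_C]]
    }
}
```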

Wednesday, October 31, 2012

Using keys with XSLT 2.0

This article shows how to use keys to speed up XSLT processing when you are dealing with large input files (hundreds of megabytes). Consider the following example with two input files, stock.xml and orderlines.xml. The idea is to update the stock with new quantities by processing the orderlines.

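The challenge is how to use a key built over the orderlines while processing the stock. It might sound trivial, but it isn't: the two-argument key() only searches the document containing the context node, so calling it from a stock template would search stock.xml rather than orderlines.xml. The first stylesheet below works around this by applying templates to the orderlines element, which switches the context to the orderlines document.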

stock.xml
<?xml version="1.0" encoding="UTF-8" ?>
<stock>
  <item id="PH3330L">
    <quantity>10</quantity>
  </item>
  <item id="BAS16">
    <quantity>7</quantity>
  </item>
  <item id="BUK100-50DL">
    <quantity>14</quantity>
  </item>  
</stock>

orderlines.xml
<?xml version="1.0" encoding="UTF-8" ?>
<orderlines>
  <orderline itemId="PH3330L">
    <quantity>4</quantity>
  </orderline>
  <orderline itemId="BAS16">
    <quantity>2</quantity>
  </orderline> 
</orderlines>

newstock.xml (expected output)
<?xml version="1.0" encoding="UTF-8"?>
<stock>
  <item id="PH3330L">
    <quantity>6</quantity>
  </item>
  <item id="BAS16">
    <quantity>5</quantity>
  </item>
  <item id="BUK100-50DL">
    <quantity>14</quantity>
  </item>  
</stock>

processOrderlines.xslt
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
-->

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:pelssers="http://robbypelssers.blogspot.com"
  exclude-result-prefixes="pelssers xs">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  
  <xsl:param name="orderlinesURI" />
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:function name="pelssers:newQuantity" as="xs:double">
    <xsl:param name="element" as="element(orderlines)"/>
    <xsl:param name="itemId" as="xs:string"/>
    <xsl:param name="stockQuantity" as="xs:double"/>
    <xsl:apply-templates select="$element">
      <xsl:with-param name="itemId" select="$itemId"/>
      <xsl:with-param name="stockQuantity" select="$stockQuantity"/>
    </xsl:apply-templates>
  </xsl:function>

  <xsl:template match="orderlines" as="xs:double">
    <xsl:param name="itemId" as="xs:string"/>
    <xsl:param name="stockQuantity" as="xs:double"/>    
    <xsl:sequence select="if (exists(key('orderline-lookup', $itemId))) 
                  then $stockQuantity - key('orderline-lookup', $itemId)/quantity else $stockQuantity"/>
  </xsl:template>

  <xsl:template match="stock/item/quantity">
    <quantity><xsl:sequence select="pelssers:newQuantity($orderlines, parent::item/@id, .)"/></quantity>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

For this demo I simply ran the Saxon jar from the command line (in Git Bash; the trailing backslashes continue the command):
java -Xmx1024m -jar Saxon-HE-9.4.jar \
  -s:C:/tmp/keydemo/input/stock.xml \
  -o:C:/tmp/keydemo/output/newstock.xml \
  -xsl:C:/tmp/keydemo/xslt/processOrderlines.xslt orderlinesURI=file:/C:/tmp/keydemo/input/orderlines.xml

Below is a simplified stylesheet that uses the third argument of key() to specify which tree to search (here the orderlines element). It's based on a tip from @grtjn.
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
-->

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:pelssers="http://robbypelssers.blogspot.com"
  exclude-result-prefixes="pelssers xs">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  
  <xsl:param name="orderlinesURI" />
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="stock/item/quantity">
    <xsl:variable name="orderline" select="key('orderline-lookup', parent::item/@id, $orderlines)"/>
    <quantity><xsl:sequence select="if (exists($orderline)) then . - $orderline/quantity else xs:double(.)"/></quantity>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
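Conceptually an xsl:key is an index that is built once and then probed for each lookup, instead of scanning all orderlines for every stock item; that is where the speed-up on large inputs comes from. A Java sketch of the same stock update with a HashMap playing the role of the key (the Orderline class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StockUpdate {
    static final class Orderline {
        final String itemId;
        final double quantity;

        Orderline(String itemId, double quantity) {
            this.itemId = itemId;
            this.quantity = quantity;
        }
    }

    // Built once, like <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>.
    // Unlike the stylesheet (which expects at most one orderline per item), duplicates are summed.
    static Map<String, Double> index(List<Orderline> orderlines) {
        Map<String, Double> byItem = new HashMap<>();
        for (Orderline line : orderlines) {
            byItem.merge(line.itemId, line.quantity, Double::sum);
        }
        return byItem;
    }

    // One hash lookup per stock item; items without orderlines keep their quantity.
    static Map<String, Double> newStock(Map<String, Double> stock, Map<String, Double> ordered) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> item : stock.entrySet()) {
            result.put(item.getKey(), item.getValue() - ordered.getOrDefault(item.getKey(), 0.0));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> stock = new LinkedHashMap<>();
        stock.put("PH3330L", 10.0);
        stock.put("BAS16", 7.0);
        stock.put("BUK100-50DL", 14.0);
        Map<String, Double> ordered = index(Arrays.asList(
                new Orderline("PH3330L", 4), new Orderline("BAS16", 2)));
        System.out.println(newStock(stock, ordered)); // {PH3330L=6.0, BAS16=5.0, BUK100-50DL=14.0}
    }
}
```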

Friday, October 26, 2012

Indenting your XSLT output

Ever wondered why your XSLT output is not indented even though you set indent="yes"?
You need processor-specific serialization extensions; below are examples for Xalan and Saxon. Note: for this to work with Saxon you need the professional edition (Saxon-PE).

Saxon:
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:saxon="http://saxon.sf.net/"
                extension-element-prefixes="saxon">

    <xsl:output method="xml" saxon:indent-spaces="4" indent="yes"/>


Xalan:
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xslt="http://xml.apache.org/xslt">

    <xsl:output method="xml" xslt:indent-amount="4" indent="yes"/>
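If you drive the JDK's built-in transformer (a Xalan derivative) from Java, the same two settings can also be passed through the JAXP API instead of the stylesheet. A minimal sketch; the class name is made up, and the indent-amount key is specific to Xalan-based transformers:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IndentDemo {
    // Identity transform (no stylesheet) that re-serializes the input with indentation.
    static String indent(String xml, int spaces) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Same namespace and name as xslt:indent-amount in the Xalan snippet above.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    Integer.toString(spaces));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indent("<products><product>A</product><product>B</product></products>", 4));
    }
}
```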


Friday, October 19, 2012

Creating a UNIX timestamp with XSLT 2.0 (Saxon)

Creating timestamps is a fairly common requirement. If you google how to create one in XSLT, you'll find rather exotic solutions. Today I set out to find an elegant one using XSLT extension functions.
If you take a look at the Java API, and in particular java.util.Date, you will see a method getTime() which returns exactly what I need.
long getTime()
Returns the number of milliseconds since January 1, 1970, 00:00:00 GMT represented by this Date object.

Now let's look at a simple input XML containing products. For each product we want to generate a timestamp while processing the product node.
<products>
  <product>
    This is a complex node
  </product>
  <product>
    This is a complex node
  </product>  
</products>

To understand how extension functions can be used with Saxon, take a look at the Saxon documentation on extension functions. In this case we need to construct new Date objects and invoke getTime() on them. We bind the prefix date to the namespace java:java.util.Date. Then we can construct a new Date object with date:new(). To invoke an instance method, you pass the target object as the first argument, so date:getTime(date:new()) is the equivalent of the Java expression new java.util.Date().getTime().
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:nxp="http://www.nxp.com">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:function name="nxp:getTimestamp">
    <xsl:value-of select="date:getTime(date:new())"  xmlns:date="java:java.util.Date"/>
  </xsl:function>

  <xsl:template match="product">
   <product processedTimestamp="{nxp:getTimestamp()}">
     <xsl:apply-templates/>
   </product>
  </xsl:template>

</xsl:stylesheet>

So when you execute that stylesheet you will end up with product tags having a new attribute like below:
<product processedTimestamp="1350635976117">
 ...
</product>
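The extension call boils down to a single Java expression. A small sketch (the class name is made up) that also shows the java.time alternative; note that classic UNIX timestamps are usually expressed in seconds, whereas getTime() returns milliseconds:

```java
import java.time.Instant;
import java.util.Date;

class UnixTimestamp {
    // The Java expression behind date:getTime(date:new()):
    // milliseconds elapsed since January 1, 1970, 00:00:00 GMT.
    static long nowMillis() {
        return new Date().getTime();
    }

    public static void main(String[] args) {
        long millis = nowMillis();
        // java.time (Java 8+) alternative giving the same kind of value
        long viaInstant = Instant.now().toEpochMilli();
        System.out.println(millis + " / " + viaInstant);
        // Classic UNIX time counts seconds, so divide if that is what the consumer expects
        System.out.println(millis / 1000);
    }
}
```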

Wednesday, October 17, 2012

Cocoon flowscript gotcha

Today I got a pretty hard-to-debug issue on my plate. To give some insight into the complexity, let me explain the chain of events:
  • client-side JavaScript doing a POST to a Cocoon pipeline
  • flowscript calling another pipeline to fetch JSON from the XML database
  • sending the response to the zip serializer
So there is a lot going on in those 3 simple steps. The easy part is that we post some values from a form to a server-side action.

POST parameters:
id=PH3030AL

But the id PH3030AL identifies a chemical content XML file. What we really need are the identifiers of the related sales items (a one-to-many relationship), which we can retrieve by executing an XQuery. We wrote a custom XQuery generator that takes any request parameters and injects them dynamically into the XQuery as parameters.

So let's take a look at an action mapped to an XQuery:

<map:match pattern="xquery/getSalesItems">
  <map:generate src="xquery/chemicalcontent/getSalesItems.xquery" type="queryStringXquery"/>
  <map:serialize type="xml"/>
</map:match>

All that's needed is to map a match pattern to the XQuery; the XQuery generator then executes it after injecting any request parameters provided.

So far, so good. If we now invoke the following URL:
http://localhost:8888/search/xquery/getSalesItems?id=PH3030AL
we get back the following JSON response:

[{"id": "PH3030AL","nc12s": ["934063085115"]}]

But we needed to call this pipeline from flowscript, and that is where the problem occurred.

var output = new Packages.java.io.ByteArrayOutputStream();
var requestString = new Collection(cocoon.request.getParameterValues("id")).stringJoin(function(id) {return "id=" + id}, "&");
var uri = "xquery/getSalesItems?" + requestString;
cocoon.processPipelineTo(uri, null, output);

When we printed the output to the console, we got the following JSON:
[{"id": "PH3030AL","nc12s": ["934063085115"]},{"id": "PH3030AL","nc12s": ["934063085115"]}]
To make a long story short: we should not append the request string again when invoking another pipeline within the same request. The POST parameters were already present, so we ended up with duplicate request parameters. Changing the URI to the one below fixed our bug:
var uri = "xquery/getSalesItems";
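The bug boils down to a sub-request that inherits the parameters of the original request and then gets the same parameters appended once more through its own query string. The Java sketch below models that with a plain multimap. It is not Cocoon's implementation: the merge rule (inherited values first, query-string values appended) is an assumption made for illustration, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (NOT Cocoon code) of how an internal sub-request can end up with
// duplicate parameters: it inherits the parent's parameters AND parses its own query string.
class SubRequestParams {

    static Map<String, List<String>> merge(Map<String, List<String>> inherited, String queryString) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : inherited.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        if (queryString != null && !queryString.isEmpty()) {
            for (String pair : queryString.split("&")) {
                String[] kv = pair.split("=", 2); // no URL decoding, for brevity
                String value = kv.length > 1 ? kv[1] : "";
                result.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> post = new LinkedHashMap<>();
        post.put("id", Arrays.asList("PH3030AL"));
        // Buggy call, "xquery/getSalesItems?id=PH3030AL": id appears twice
        System.out.println(merge(post, "id=PH3030AL").get("id"));
        // Fixed call, "xquery/getSalesItems": id appears once
        System.out.println(merge(post, "").get("id"));
    }
}
```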
 

Tuesday, October 16, 2012

Using SVG in modern browsers

SVG has come a long way, and I can still clearly remember the days when one had to install the Adobe SVG Viewer plugin to enable SVG in the browser. I was pretty excited about this technology, as it enabled me to dynamically generate images from the domain model. Nowadays you can even use the HTML5 canvas tag, but back to SVG. Not so long ago we switched from .eps files to SVG; you can't render EPS in the browser, which is one big drawback. The good news is that recent browsers (IE9, Chrome, Firefox, ...) support SVG natively. Forget about using the old-school <object> and <embed> tags to embed SVG ;-)
<!DOCTYPE html>
<!-- IE9 only renders inline SVG in standards mode, hence the HTML5 doctype -->
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
    <title>SVG test</title>
  </head>
  <body>
    <div>
      <div>embedding SVG with img tag</div>
      <div><img src="test.svg"/></div>
    </div>
    <div>
      <div>embedding inline SVG</div>
      <div>
        <svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" width="40mm" height="28.63mm" viewBox="0 0 90.00 81.14">
          <rect x="1" y="1" width="60" height="60" fill="none" stroke="red" stroke-width="2"/>
          <circle cx="40" cy="44" r="30" fill="none" stroke="yellow" stroke-width="10"/>
          <text x="24" y="48">SVG</text>
        </svg>
      </div>
    </div>
  </body>
</html>

test.svg
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" width="40mm" height="28.63mm" viewBox="0 0 90.00 81.14">
  <rect x="1" y="1" width="60" height="60" fill="none" stroke="purple" stroke-width="2"/>
  <circle cx="40" cy="44" r="30" fill="red" stroke="blue" stroke-width="10"  />
  <text x="24" y="48">SVG</text>
</svg>

The result is shown below:

Friday, October 5, 2012

Using Play JSON library to expose JSON services

Currently I am experimenting with the Play framework. Today we will see how easy it is to generate a JSON response. Let us first take a look at our domain model:
package models

case class Contact(firstName: String, lastName: String, age: Int)

object Contact {
  val contacts = Set(
      Contact("Robby", "Pelssers", 35),
      Contact("Davy", "Pelssers", 35),
      Contact("Lindsey", "Pelssers", 9)
  )
  
  def findAll = this.contacts.toList.sortBy(_.firstName)

}
Next, let us take a look at a simple controller that returns the contacts in JSON format:
package controllers

import play.api.mvc.{ Action, Controller }
import play.api.libs.json.Json._
import models.Contact

object Contacts extends Controller {

  def toJSON = Action { implicit request =>
    val contacts = Contact.findAll   
    val json = 
      toJson(
        contacts.map(
            contact => toJson(
                Map("firstName" -> toJson(contact.firstName), 
                    "lastName" -> toJson(contact.lastName), 
                    "age" -> toJson(contact.age))
            )    
        )
      )
     
    Ok(json)
  }
  
}

Now it's a matter of mapping a URL to our controller method in the conf/routes file:
GET     /contacts.json              controllers.Contacts.toJSON
Now let's use curl to check that our response looks OK:
$ curl --request GET --include http://localhost:9000/contacts.json
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 160

[{"firstName":"Davy","lastName":"Pelssers","age":35},{"firstName":"Lindsey","lastName":"Pelssers","age":9},{"firstName":"Robby","lastName":"Pelssers","age":35}]
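For comparison, here is a dependency-free Java sketch that builds the same JSON array by hand: sort the contacts by first name, then emit one object per contact. It only illustrates what the Play controller produces. It skips JSON string escaping, so it assumes names without quotes or backslashes; real code would use a JSON library. The class names are hypothetical. The result is 160 characters long, which matches the Content-Length header above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ContactsJson {

    static class Contact {
        final String firstName;
        final String lastName;
        final int age;

        Contact(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    // Mirrors Contact.findAll in the Scala model: contacts sorted by first name.
    static List<Contact> findAll() {
        List<Contact> contacts = new ArrayList<>(Arrays.asList(
                new Contact("Robby", "Pelssers", 35),
                new Contact("Davy", "Pelssers", 35),
                new Contact("Lindsey", "Pelssers", 9)));
        contacts.sort(Comparator.comparing((Contact c) -> c.firstName));
        return contacts;
    }

    // Builds the JSON array by hand; ASSUMES the names need no escaping.
    static String toJson(List<Contact> contacts) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < contacts.size(); i++) {
            Contact c = contacts.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"firstName\":\"").append(c.firstName)
              .append("\",\"lastName\":\"").append(c.lastName)
              .append("\",\"age\":").append(c.age).append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String json = toJson(findAll());
        System.out.println(json);
        System.out.println(json.length()); // 160
    }
}
```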

Wednesday, October 3, 2012

java.io.FileNotFoundException: Too many open files

This morning we noticed that our (S)FTP service was no longer sending files. We do monitor this process, and any error log messages are stored per job. However, a few of those error messages were very misleading:
File[] files = jobIdDirectory.listFiles();
if (null == files || files.length != 1) {
  throw new SharedFileValidationException(String.format(
  "The directory %s is expected to contain a single file inside, actual number of files is %d", jobIdDirectory.getAbsolutePath(), files != null ? files.length : -1));
}
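The -1 in the message means listFiles() returned null. java.io.File reports any I/O error that way, including running out of file descriptors, which is why the message was so misleading. Below is a minimal sketch of a more informative check, assuming Java 7+ NIO is available. Files.newDirectoryStream throws an IOException that carries the real cause, and try-with-resources closes the stream, which is itself an open file handle. The class and method names are hypothetical. IllegalStateException stands in for our SharedFileValidationException to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SingleFileCheck {

    // Returns the single entry in jobIdDirectory. An I/O failure (e.g. "Too many open files")
    // surfaces as an IOException with its real cause instead of a file count of -1.
    static Path requireSingleEntry(Path jobIdDirectory) throws IOException {
        List<Path> entries = new ArrayList<>();
        // try-with-resources releases the directory handle even if iteration fails
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(jobIdDirectory)) {
            for (Path entry : stream) {
                entries.add(entry);
            }
        }
        if (entries.size() != 1) {
            throw new IllegalStateException(String.format(
                    "The directory %s is expected to contain a single file inside, actual number of files is %d",
                    jobIdDirectory.toAbsolutePath(), entries.size()));
        }
        return entries.get(0);
    }
}
```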

Checking the logs for a job that failed:
2012-10-03 08:40:26,498 INFO  pool-35 com.nxp.spider2.ftpservice.service.impl.FileTransferPostProcessor - <<< Updating job 74058544 status to Sent Failed [item: BUK7540-100A, workflow: SPC-2-PWS, error message: The directory /appl/wpc/5.2/pxprod1/public_html/sharedFS/74058544 is expected to contain a single file inside, actual number of files is -1]
Checking the actual filesystem, however, shows that the file is there:
$/appl/../public_html/sharedFS/74058544/TRIMM_PRODUCTS>ls
BUK7540-100A.zip

Luckily, we found another log message that gave us more insight:
Caused by: java.io.FileNotFoundException: /appl/../public_html/sharedFS/73526306/TRIMM_PRODUCTS/LPC1830FET256.zip (Too many open files)

My next step was to find out how to resolve this kind of error, and I found this useful StackOverflow question. It turned out, however, that I could not use the lsof command:
lsof -p <pid of jvm>

But I did manage to find the PID of the failing Tomcat instance:
ps -ef | grep tomcat
Next, I took a look at the following folder:
/proc/{pid}/fd>ls | wc -l
   1024
So it seems we hit the default limit of open files per process (1024). I shut down the Tomcat instance and the open file handles were released; in fact, the complete /proc/{pid} folder was removed.
Next, I restarted the Tomcat instance and checked the new PID. The number of open file handles grows and shrinks, so it does not seem to be a leak in our code:
/proc/11395/fd>ls | wc -l
    147
/proc/11395/fd>ls | wc -l
    142
/proc/11395/fd>ls | wc -l
    142
/proc/11395/fd>ls | wc -l
    142
/proc/11395/fd>ls | wc -l
    143
I just checked: we have about 9000 jobs waiting to be processed, and a backlog that big could well be the cause of this issue.
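The ls | wc -l check can also be done from inside the JVM, for example to log the number of open descriptors periodically. Here is a minimal sketch, assuming a platform that exposes the current process's descriptors under /proc/self/fd (Linux does). Elsewhere it returns -1. Listing the directory briefly uses a descriptor itself, so the count may include that one. The class name is hypothetical.

```java
import java.io.File;

class OpenFileCounter {

    // Counts the entries in /proc/self/fd, i.e. the open file descriptors of this JVM.
    // ASSUMES /proc/self/fd exists; returns -1 if it cannot be listed.
    static int countOpenFileDescriptors() {
        String[] fds = new File("/proc/self/fd").list();
        return fds == null ? -1 : fds.length;
    }

    public static void main(String[] args) {
        System.out.println("Open file descriptors: " + countOpenFileDescriptors());
    }
}
```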

Thursday, September 13, 2012

Using properties from within XQuery modules

Today I ran into an interesting issue. Sedna XMLDB does not support declaring external variables, but sometimes we need access to properties, even from a stored XQuery module. We could hardcode values, but some of them are environment-specific, so no luck there.

So I was scratching my head and decided to try my luck on the Sedna mailing list. Charles Foster was kind enough to share some ideas, one of which was storing the properties in the XMLDB itself.

So I quickly hacked together a prototype XQuery properties library that offers functionality similar to the Cocoon-Spring-Configurator. We can specify generic properties at the top level as well as environment-specific properties. The nice thing is that we can always pass the environment (prod / test / dev): if no environment-specific property is found, the lookup falls back to the generic property with that name.
<?xml version="1.0" encoding="UTF-8"?>
<properties id="test-suite">
    <!-- default properties -->
    <property id="string1">this is text</property>
    <property id="boolean_false">false</property>
    <property id="boolean_true">true</property>
    <property id="int">5</property>
    <property id="double">3.56</property>
    <property id="decimal">6.23</property>
    <property id="float">002002.270</property>
    <property id="time">12:20:46.275+01:00</property>
    <property id="dateTime">2002-12-07T12:20:46.275+01:00</property>
    <property id="date">2002-12-07</property>
    <property id="duration">P30Y243D</property>
    <property id="anyURI">http://www.google.com</property>
    <environment id="prod">
        <property id="base_uri">http://nww.prod.spider.nxp.com</property>
        <property id="port">8513</property>
    </environment>
</properties>

Properties XQuery library:
module namespace properties = "http://www.nxp.com/properties";

declare function properties:getPropertyFiles() as element(properties)* {
    collection("properties")/properties
};

declare function properties:getPropertyFile($id as xs:string) as element(properties)? {
    properties:getPropertyFiles()[@id=$id]
};

(:  properties:getProperty("test-suite.string1") :)
declare function properties:getProperty($expr as xs:string) {
   properties:getProperty($expr, ())
};

(: properties:getProperty("test-suite.base_uri", "prod") :)
declare function properties:getProperty($expr as xs:string, $env as xs:string?) {
    let $tokens := tokenize($expr, "\.")
    let $fileId := $tokens[1]
    let $propertyId := $tokens[2]
    let $property := if (exists($env) and exists(properties:getPropertyFile($fileId)/environment[@id=$env]/property[@id=$propertyId]))
       then properties:getPropertyFile($fileId)/environment[@id=$env]/property[@id=$propertyId]
       else properties:getPropertyFile($fileId)/property[@id=$propertyId]
    return  if (exists($property)) then data($property)
            else fn:error(fn:QName('http://www.nxp.com/error', 'properties:doesNotExist'), concat('Property ', $expr, ' does not exist'))
};

import module namespace properties = "http://www.nxp.com/properties";

<test-suite>
  <test>test-suite.base_uri      = {properties:getProperty("test-suite.base_uri", "prod")}</test>
  <test>test-suite.port          = {properties:getProperty("test-suite.port", "prod")}</test>  
  <test>test-suite.string1       = {properties:getProperty("test-suite.string1", "prod")}</test>
  <test>test-suite.boolean_false = {properties:getProperty("test-suite.boolean_false")}</test>
  <test>test-suite.boolean_true  = {properties:getProperty("test-suite.boolean_true")}</test>
  <test>test-suite.int           = {properties:getProperty("test-suite.int")}</test>
  <test>test-suite.double        = {properties:getProperty("test-suite.double")}</test>
  <test>test-suite.decimal       = {properties:getProperty("test-suite.decimal")}</test>
  <test>test-suite.float         = {properties:getProperty("test-suite.float")}</test> 
  <test>test-suite.time          = {properties:getProperty("test-suite.time")}</test>
  <test>test-suite.date          = {properties:getProperty("test-suite.date")}</test> 
  <test>test-suite.dateTime      = {properties:getProperty("test-suite.dateTime")}</test>
  <test>minutes from test-suite.dateTime = {minutes-from-dateTime(properties:getProperty("test-suite.dateTime"))}</test>
  <test>test-suite.anyURI        = {properties:getProperty("test-suite.anyURI")}</test>
  <test>test-suite.duration      = {properties:getProperty("test-suite.duration")}</test>
  <test>year from test-suite.duration      = {years-from-duration(properties:getProperty("test-suite.duration"))}</test>
</test-suite>
Output of the test suite:
<test-suite>
  <test>test-suite.base_uri      = http://nww.prod.spider.nxp.com</test>
  <test>test-suite.port          = 8513</test>
  <test>test-suite.string1       = this is text</test>
  <test>test-suite.boolean_false = false</test>
  <test>test-suite.boolean_true  = true</test>
  <test>test-suite.int           = 5</test>
  <test>test-suite.double        = 3.56</test>
  <test>test-suite.decimal       = 6.23</test>
  <test>test-suite.float         = 002002.270</test>
  <test>test-suite.time          = 12:20:46.275+01:00</test>
  <test>test-suite.date          = 2002-12-07</test>
  <test>test-suite.dateTime      = 2002-12-07T12:20:46.275+01:00</test>
  <test>minutes from test-suite.dateTime = 20</test>
  <test>test-suite.anyURI        = http://www.google.com</test>
  <test>test-suite.duration      = P30Y243D</test>
  <test>year from test-suite.duration      = 30</test>
</test-suite>

Now let's see what happens if we access a non-existing property:
import module namespace properties = "http://www.nxp.com/properties";

<test-suite>
  <test>should result in exception = {properties:getProperty("test-suite.nonexisting", "prod")}</test>
</test-suite>

2012/09/14 09:40:17 database query/update failed (SEDNA Message: ERROR doesNotExist
    Property test-suite.nonexisting does not exist
)
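Outside the database, the same lookup rule fits in a few lines: use the environment-specific value first, then the generic value, and otherwise raise an error. The Java sketch below models a single properties file (the part of the expression after "test-suite.") as two maps. The class name and data layout are illustrative assumptions, not part of the XQuery library.

```java
import java.util.HashMap;
import java.util.Map;

class PropertyLookup {

    // generic: property id -> value; byEnvironment: environment id -> (property id -> value)
    private final Map<String, String> generic = new HashMap<>();
    private final Map<String, Map<String, String>> byEnvironment = new HashMap<>();

    PropertyLookup put(String id, String value) {
        generic.put(id, value);
        return this;
    }

    PropertyLookup put(String env, String id, String value) {
        byEnvironment.computeIfAbsent(env, e -> new HashMap<>()).put(id, value);
        return this;
    }

    // Same rule as properties:getProperty($expr, $env): try the environment first,
    // then fall back to the generic property; env may be null.
    String get(String id, String env) {
        Map<String, String> envProps = env == null ? null : byEnvironment.get(env);
        if (envProps != null && envProps.containsKey(id)) {
            return envProps.get(id);
        }
        if (generic.containsKey(id)) {
            return generic.get(id);
        }
        throw new IllegalArgumentException("Property " + id + " does not exist");
    }

    public static void main(String[] args) {
        PropertyLookup testSuite = new PropertyLookup()
                .put("string1", "this is text")
                .put("prod", "base_uri", "http://nww.prod.spider.nxp.com")
                .put("prod", "port", "8513");
        System.out.println(testSuite.get("base_uri", "prod")); // environment-specific value
        System.out.println(testSuite.get("string1", "prod"));  // falls back to the generic value
    }
}
```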

Now we only need to make sure that the correct environment is passed. As each of our environments uses its own database, we only need to store an environment-specific constants library in each database and we're good to go:
module namespace constants = "http://www.nxp.com/constants";

declare variable $constants:ENVIRONMENT as xs:string := "test";

Now we can rewrite our little test suite to use this constant:
import module namespace properties = "http://www.nxp.com/properties";
import module namespace constants = "http://www.nxp.com/constants";

<test-suite>
  <test>test-suite.boolean_false = {properties:getProperty("test-suite.boolean_false", $constants:ENVIRONMENT)}</test>
</test-suite>

Wednesday, September 12, 2012

How to easily deal with timestamp based URLs

Let me briefly describe the problem. You have to fetch data from a website that exposes it under timestamped URLs. In the use case below, the report is generated weekly on Monday.

An example: {server}:{port}/exports/classificationreport_20120924.csv

So you can't simply hardcode the URL as a property; you need to generate it dynamically whenever that resource is requested. I had used Joda-Time before, so I decided to use it again:
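import org.joda.time.DateTimeConstants;
import org.joda.time.MutableDateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public String getClassificationReportURL() {
   MutableDateTime now = new MutableDateTime();
   // Joda weeks run Monday..Sunday, so this moves to the Monday of the current week
   now.setDayOfWeek(DateTimeConstants.MONDAY);
   DateTimeFormatter fmt = DateTimeFormat.forPattern("yyyyMMdd");
   return getExportsBaseURL() + "classificationreport_" + fmt.print(now) + ".csv";
}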
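On Java 8 or later, the same "Monday of the current week" rule can also be written with java.time instead of Joda-Time. This is a swapped-in alternative, not the code we run. TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY) moves any day back to the Monday of its week and leaves a Monday unchanged. For ISO Monday-to-Sunday weeks, that matches setDayOfWeek(MONDAY). The exportsBaseUrl parameter stands in for getExportsBaseURL(), and the class name is hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAdjusters;

class ClassificationReportUrl {

    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Builds the URL of the weekly report generated on the Monday of the week containing 'day'.
    static String forDate(String exportsBaseUrl, LocalDate day) {
        LocalDate monday = day.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
        return exportsBaseUrl + "classificationreport_" + FMT.format(monday) + ".csv";
    }

    public static void main(String[] args) {
        // Thursday 27 September 2012 -> report of Monday 24 September 2012
        System.out.println(forDate("{server}:{port}/exports/", LocalDate.of(2012, 9, 27)));
        // Today's URL
        System.out.println(forDate("{server}:{port}/exports/", LocalDate.now()));
    }
}
```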

Monday, August 27, 2012

Still using XSLT1.0? Time to start using Saxon.


Folder structure:
   - input
        - jsonxml-1.xml
        - jsonxml-2.xml
        - jsonxml-3.xml
   - xslt
        - jsonxmltransformer.xslt
   - output (empty)

Below are some basic usage instructions; for more details, check out the official documentation. You can download the Saxon jar from the official Saxon home page or from this Maven repository.
java -jar Saxon-HE-9.4.jar [options] [params]

-s:filename    -- Identifies the source file or directory
-o:filename    -- Send output to named file. In the absence of this option, the results go to standard output.
                  If the source argument identifies a directory, this option is mandatory and must also identify a directory; 
                  on completion it will contain one output file for each file in the source directory
-threads:N     -- Used only when the -s option specifies a directory. Controls the number of threads used to process the files in the directory
-xsl:filename  -- Specifies the file containing the principal stylesheet module

Now let's see how easy it is to transform the single file jsonxml-1.xml and save the result to transformed-result1.xml:
java -jar Saxon-HE-9.4.jar -s:C:/tmp/easytransform/input/jsonxml-1.xml -o:C:/tmp/easytransform/output/transformed-result1.xml -xsl:C:/tmp/easytransform/xslt/jsonxmltransformer.xslt
That was easy enough. But suppose we want to transform a complete directory of source files?
java -jar Saxon-HE-9.4.jar -s:C:/tmp/easytransform/input -o:C:/tmp/easytransform/output -xsl:C:/tmp/easytransform/xslt/jsonxmltransformer.xslt
By convention, this saves the transformed results to the specified output directory, using the same file names as the input files.
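The same transformations can also be driven from Java through the standard JAXP API (javax.xml.transform). A minimal sketch follows. With only the JDK on the classpath, TransformerFactory.newInstance() returns the built-in XSLT 1.0 processor. To get Saxon's XSLT 2.0 support, the Saxon-HE jar has to be on the classpath so that the factory lookup picks it up; check which factory you actually get. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BatchTransform {

    // Rough equivalent of: java -jar Saxon-HE-9.4.jar -s:<inputDir> -o:<outputDir> -xsl:<xsl>
    // Each regular file in inputDir is written to outputDir under the same file name.
    static void transformDirectory(File inputDir, File outputDir, File xsl)
            throws TransformerException, IOException {
        // Compile the stylesheet once and reuse it for every input file
        Templates templates = TransformerFactory.newInstance().newTemplates(new StreamSource(xsl));
        File[] inputs = inputDir.listFiles(File::isFile);
        if (inputs == null) {
            throw new IOException("Cannot list " + inputDir);
        }
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new IOException("Cannot create " + outputDir);
        }
        for (File input : inputs) {
            try (OutputStream out = new FileOutputStream(new File(outputDir, input.getName()))) {
                templates.newTransformer().transform(new StreamSource(input), new StreamResult(out));
            }
        }
    }

    // String-in/string-out variant for quick experiments; wraps the checked exception.
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        transformDirectory(new File("C:/tmp/easytransform/input"),
                new File("C:/tmp/easytransform/output"),
                new File("C:/tmp/easytransform/xslt/jsonxmltransformer.xslt"));
    }
}
```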

Thursday, August 23, 2012

XML Database as source for RDF Database

We've come a long way: from reading and transforming XML resources on a plain filesystem to setting up an XML database and executing sophisticated cross-collection XQueries. As we always aim to improve our role as information providers, we are on the verge of switching to one-stop shopping. Currently only part of the master data is stored in the XMLDB, and we are already able to:
  • generate DITA maps / topics (PDF creation, automated translations, ...)
  • generate publications (XHTML)
  • answer data-related questions in real time
The main components in this architectural picture are:
  • WebSphere Product Center (exports product information as XML; soon to change)
  • Apache Cocoon (the main framework that does all of the above)
  • Sedna (XMLDB)

But to achieve one-stop shopping, we need a more flexible way to link data from different sources (RDBMS, XMLDB, CSV, ...).

We will automate data extraction for all information resources and transform that data into RDF, so that it becomes easy to link the data and to offer a consistent way of querying it (a SPARQL endpoint).

Below is an example of an XQuery library that generates RDF from the XMLDB:
module namespace basictypes2rdf = "http://www.nxp.com/basictypes2rdf";

declare copy-namespaces preserve, inherit;

declare namespace rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace skos="http://www.w3.org/2004/02/skos/core#";
declare namespace foaf="http://xmlns.com/foaf/0.1/";
declare namespace nxp="http://purl.org/nxp/schema/v1/";

import module namespace basictypes = "http://www.nxp.com/basictypes";
import module namespace string = "http://www.nxp.com/string";
import module namespace rdfutil = "http://www.nxp.com/rdfutil";
import module namespace packages2rdf = "http://www.nxp.com/packages2rdf";


declare function basictypes2rdf:fromBasicTypesRaw($products as element(Product)*) as element(rdf:Description)* {
    for $product in $products
    let $btn := basictypes:getName($product)
    return
    <rdf:Description rdf:about="{basictypes2rdf:getURI($product)}">
      <rdf:type rdf:resource="http://purl.org/nxp/schema/v1/BasicType"/>
      <nxp:productStatusDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">{data($product/ProductInformation/ProductStatusDate)}</nxp:productStatusDate>
      <skos:prefLabel xml:lang="en-us">{data($product/ProductInformation/Description)}</skos:prefLabel>
      <nxp:productStatus rdf:resource="http://purl.org/nxp/schema/v1/{string:toCamelCase(lower-case(data($product/ProductInformation/ProductStatus)))}"/>
      <foaf:homepage rdf:resource="http://www.nxp.com/pip/{$btn}"/>
      <nxp:typeNumber>{$btn}</nxp:typeNumber>
      {
        if (exists($product/ProductInformation/PackageID))
        then <nxp:mechanicalOutline rdf:resource="{packages2rdf:getURI(basictypes:getPackage($product))}"/>
        else ()
      }
    </rdf:Description>
};

declare function basictypes2rdf:fromBasicTypes($products as element(Product)*) as element(rdf:RDF) {
    rdfutil:wrapRDF(basictypes2rdf:fromBasicTypesRaw($products))
};

declare function basictypes2rdf:fromBasicTypeIds($ids as xs:string*) as element(rdf:RDF) {
    basictypes2rdf:fromBasicTypes(basictypes:filterBySet(basictypes:getBasicTypes(), $ids))
};

declare function basictypes2rdf:getURI($product as element(Product)) as xs:anyURI {
    rdfutil:getURI("basic_types", data($product/ProductInformation/Name))
};

(:
    Usages:
    basictypes2rdf:fromBasicTypes(basictypes:getBasicType("PH3330L"))
    basictypes2rdf:fromBasicTypes(basictypes:getBasicTypes()[ProductInformation/PIPType = 0])
    basictypes2rdf:fromBasicTypeIds(("PH3330L","PH3330CL"))
:)
The following expression produces the output below:
import module namespace basictypes2rdf = "http://www.nxp.com/basictypes2rdf";
basictypes2rdf:fromBasicTypeIds("PH3330L")

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:nxp="http://purl.org/nxp/schema/v1/">
  <rdf:Description rdf:about="http://data.nxp.com/id/basic_types/ph3330l">
    <rdf:type rdf:resource="http://purl.org/nxp/schema/v1/BasicType"/>
    <nxp:productStatusDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2011-10-28</nxp:productStatusDate>
    <skos:prefLabel xml:lang="en-us">N-channel TrenchMOS logic level FET</skos:prefLabel>
    <nxp:productStatus rdf:resource="http://purl.org/nxp/schema/v1/endOfLife"/>
    <foaf:homepage rdf:resource="http://www.nxp.com/pip/PH3330L"/>
    <nxp:typeNumber>PH3330L</nxp:typeNumber>
    <nxp:mechanicalOutline rdf:resource="http://data.nxp.com/id/package_outline_versions/sot669"/>
  </rdf:Description>
</rdf:RDF>
It is worth validating the output, just to make sure.
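The same descriptions can also be written as N-Triples, one of the export formats Stardog supports. The small Java sketch below mints the subject URI and writes two of the statements above as N-Triples lines. The URI rule (collection name plus lower-cased identifier) is inferred from the sample output only, because the rdfutil module is not shown. The class and method names are hypothetical.

```java
import java.util.Locale;

class BasicTypeTriples {

    static final String DATA_BASE = "http://data.nxp.com/id/";
    static final String NXP = "http://purl.org/nxp/schema/v1/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String XSD_DATE = "http://www.w3.org/2001/XMLSchema#date";

    // INFERRED from the sample output (rdfutil:getURI is not shown): collection + lower-cased id.
    static String resourceUri(String collection, String id) {
        return DATA_BASE + collection + "/" + id.toLowerCase(Locale.ROOT);
    }

    // Escapes backslashes and double quotes for an N-Triples string literal.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One N-Triples line whose object is an IRI.
    static String iriTriple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    // One N-Triples line whose object is a datatyped literal.
    static String typedTriple(String s, String p, String value, String datatype) {
        return "<" + s + "> <" + p + "> " + quote(value) + "^^<" + datatype + "> .";
    }

    public static void main(String[] args) {
        String s = resourceUri("basic_types", "PH3330L");
        System.out.println(iriTriple(s, RDF_TYPE, NXP + "BasicType"));
        System.out.println(typedTriple(s, NXP + "productStatusDate", "2011-10-28", XSD_DATE));
    }
}
```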

Using SPARQL describe queries

This post shows how to run DESCRIBE queries from the Stardog CLI, but the query itself is not database-specific. You can export the resulting triples in different formats:
  • NTRIPLES
  • RDFXML
  • TURTLE
  • TRIG
  • TRIX
  • N3
  • NQUADS
Let's try out a simple DESCRIBE query in two formats:

$ ./stardog query  -c http://localhost:5822/nxp -q "DESCRIBE <http://data.nxp.com/basicTypes/PH3330L>" -f RDFXML

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:nxp="http://purl.org/nxp/schema/v1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="http://data.nxp.com/basicTypes/PH3330L">
  <rdf:type rdf:resource="http://purl.org/nxp/schema/v1/BasicType"/>
  <nxp:productStatusDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2011-10-28</nxp:productStatusDate>
  <skos:prefLabel xml:lang="en-us">N-channel TrenchMOS logic level FET</skos:prefLabel>
  <nxp:productStatus rdf:resource="http://purl.org/nxp/schema/v1/endOfLife"/>
  <foaf:homepage rdf:resource="http://www.nxp.com/pip/PH3330L"/>
  <nxp:typeNumber>PH3330L</nxp:typeNumber>
  <nxp:mechanicalOutline rdf:resource="http://data.nxp.com/packageOutlineVersion/SOT669"/>
</rdf:Description>

</rdf:RDF>

$ ./stardog query  -c http://localhost:5822/nxp -q "DESCRIBE <http://data.nxp.com/basicTypes/PH3330L>" -f TURTLE

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix nxp: <http://purl.org/nxp/schema/v1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://data.nxp.com/basicTypes/PH3330L> a nxp:BasicType ;
        nxp:productStatusDate "2011-10-28"^^xsd:date ;
        skos:prefLabel "N-channel TrenchMOS logic level FET"@en-us ;
        nxp:productStatus nxp:endOfLife ;
        foaf:homepage <http://www.nxp.com/pip/PH3330L> ;
        nxp:typeNumber "PH3330L" ;
        nxp:mechanicalOutline <http://data.nxp.com/packageOutlineVersion/SOT669> .

You can also describe ALL resources of a specific type. This wraps all descriptions in a single rdf:RDF element:
./stardog query  -c http://localhost:5822/nxp -f RDFXML -q "
PREFIX nxp:   <http://purl.org/nxp/schema/v1/>
DESCRIBE ?s
WHERE {
  ?s a nxp:BasicType.
}
" 

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:owl="http://www.w3.org/2002/07/owl#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
 xmlns:skos="http://www.w3.org/2004/02/skos/core#"
 xmlns:nxp="http://purl.org/nxp/schema/v1/"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="http://data.nxp.com/basicTypes/74AUP1G57GW">
 <rdf:type rdf:resource="http://purl.org/nxp/schema/v1/BasicType"/>
 <nxp:productStatusDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2011-10-20</nxp:productStatusDate>
    ...
</rdf:Description>

<rdf:Description rdf:about="http://data.nxp.com/basicTypes/74AUP1G58GW">
 <rdf:type rdf:resource="http://purl.org/nxp/schema/v1/BasicType"/>
 <nxp:productStatusDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2011-10-14</nxp:productStatusDate>
    ...
</rdf:Description>

</rdf:RDF>
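The CLI is convenient, but a DESCRIBE query can also be sent over HTTP using the standard SPARQL protocol. The query goes URL-encoded in the query parameter of a GET request, and the Accept header chooses the RDF format (for example text/turtle or application/rdf+xml). Below is a minimal Java sketch. The endpoint URL http://localhost:5822/nxp/query is an assumption about Stardog's HTTP server layout, and the server may also require credentials, so check the documentation of your version. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SparqlDescribe {

    // Builds a SPARQL protocol GET URL: <endpoint>?query=<url-encoded query>
    static String queryUrl(String endpoint, String query) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Runs the query; accept is e.g. "text/turtle" or "application/rdf+xml".
    static String run(String endpoint, String query, String accept) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(queryUrl(endpoint, query)).openConnection();
        conn.setRequestProperty("Accept", accept);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // HYPOTHETICAL endpoint; adjust to your server and database name.
        System.out.println(run("http://localhost:5822/nxp/query",
                "DESCRIBE <http://data.nxp.com/basicTypes/PH3330L>", "text/turtle"));
    }
}
```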

Tuesday, August 21, 2012

Stardog command line interface

There are two CLIs available:
  • stardog-admin: administrative client (uses the SNARL protocol only)
  • stardog: a user's client (uses HTTP or SNARL)

$ ./stardog help
Stardog 1.0.4 command line client

Type 'help <cmd>' or '<cmd> -h/--help' to print the usage information for a specific command

Type stardog [subcommand] [args]'

Available commands:
        add
        consistency
        explain inference
        explain plan
        export
        icv convert
        icv validate
        namespace add
        namespace list
        namespace remove
        passwd
        query
        remove
        search
        status

For more information on this library, visit the home page at http://stardog.com/docs/
For information on Stardog, please visit http://stardog.com

$ ./stardog-admin help
Stardog 1.0.4 command line client

Type 'help <cmd>' or '<cmd> -h/--help' to print the usage information for a specific command

Type stardog-admin [global args] [subcommand] [args]'

The global commands are --home, --disable-security, --logfile. See docs for more info.

Available commands:
        copy
        create
        drop
        icv add
        icv drop
        icv remove
        list
        metadata get
        metadata set
        migrate
        offline
        online
        optimize
        passwd
        role add
        role drop
        role grant
        role list
        role permission
        role revoke
        server start
        server stop
        user add
        user drop
        user edit
        user grant
        user list
        user permission
        user revoke

For more information on this library, visit the home page at http://stardog.com/docs/
For information on Stardog, please visit http://stardog.com

Suppose we want detailed information about the 'user list' command:
$ ./stardog-admin help user list
Usage: user list [options]

Lists all users.

Valid Options:
        [--all, -A]                    : Be verbose with user info.

        [--ask-password, -P]           : Prompt for password.

        --format, -f arg               : Format for the output [TEXT, CSV, HTML]

        [--help, -h]                   : Display usage information

        [--passwd, -p arg]             : Password

        [--server arg]                 : URL of Stardog Server. If this option isn't specified, it will be read from JVM argument 'stardog.default.cli.server'. If the JVM arg isn't set, the default value 'snarl://localhost:5820' is used. If server URL has no explicit port value, the default port value '5820' is used.

        [--username, -u arg]           : User name

Stardog Quick-start notes (Windows)

  • Laptop: Windows 7 64-bit, Intel Core i5-2520M CPU @ 2.5 GHz, 8 GB RAM, 120 GB SSD
  • set the STARDOG_HOME variable, e.g. to c:/development/stardog-1.0.4
  • copy the license file over to STARDOG_HOME
  • start the server:
$ ./stardog-admin server start
   ************************************************************
   Stardog server 1.0.4 started on Tue Aug 21 11:02:34 CEST 2012.

   SNARL server running on snarl://localhost:5820/
   HTTP server running on http://localhost:5822/.
   Stardog documentation accessible at http://localhost:5822/docs
   SNARL & HTTP servers listening on all interfaces

   STARDOG_HOME=C:\development\stardog-1.0.4
   ************************************************************
  • create a database from an input file:
$ ./stardog-admin create -n nxp -t D -u admin -p admin --server snarl://localhost:5820/ c:/testdata/products.rdf
Bulk loading data to new database.
Data load complete. Loaded 340,819 triples in 00:00:04 @ 69.4K triples/sec.
Successfully created database 'nxp'.  
  • query the database:
  
$ ./stardog query -c http://localhost:5822/nxp -q "
  PREFIX nxp:   <http://purl.org/nxp/schema/v1/>
  PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
  SELECT ?typeNumber ?prefLabel
  WHERE
  {
    ?x nxp:typeNumber ?typeNumber;
       skos:prefLabel ?prefLabel .
  } 
  LIMIT 10
  " 
Executing Query:

PREFIX nxp:   <http://purl.org/nxp/schema/v1/>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
SELECT ?typeNumber ?prefLabel
WHERE
{
  ?x nxp:typeNumber ?typeNumber;
     skos:prefLabel ?prefLabel .
}
LIMIT 10

+------------------+---------------------------------------------------------------+
|    typeNumber    |                           prefLabel                           |
+------------------+---------------------------------------------------------------+
| "74AUP1G57GW"    | "Low-power configurable multiple function gate"@en-us         |
| "74AUP1G58GW"    | "Low-power configurable multiple function gate"@en-us         |
| "74AUP1T45GW"    | "Low-power dual supply translating transeiver; 3-state"@en-us |
| "74AUP2G32GN"    | "Dual2-inputORgate (IMPULSE)"@en-us                           |
| "74AUP2G32GS"    | "Dual2-inputORgate (IMPULSE)"@en-us                           |
| "74AVC16T245DGG" | "74AVC16T245DGG (IMPULSE)"@en-us                              |
| "74AVC16T245DGV" | "16-Bit Dual-SupplyTx/Rx w/3-State (IMPULSE)"@en-us           |
| "74AVC2T45DC"    | "74AVC2T45DC (IMPULSE)"@en-us                                 |
| "74AVC2T45DP"    | "2-bit Dual-supply translator (IMPULSE)"@en-us                |
| "74AVC2T45GD"    | "2-bit Dual-supply translator (IMPULSE)"@en-us                |
+------------------+---------------------------------------------------------------+

Query returned 10 results in 00:00:00.124

Monday, August 20, 2012

Some extra useful String functions (XQuery)

module namespace string = "http://www.nxp.com/string";

(:
  string:capitalize("test")  -->  "Test" 
:)
declare function string:capitalize($string as xs:string) as xs:string {
    let $tokens := string:split($string)
    return concat(upper-case($tokens[1]), string-join(subsequence($tokens, 2), ''))
};

(:
    string:capitalizeAll("makes you wonder")  --> "Makes You Wonder"
:)
declare function string:capitalizeAll($string as xs:string) as xs:string {
   string-join(for $word in string:splitWords($string) return string:capitalize($word), ' ') 
};

(:
   string:split("work") --> ("w", "o", "r", "k")
:)
declare function string:split($string as xs:string) as xs:string* {
    for $codepoint in string-to-codepoints($string) return codepoints-to-string($codepoint)
};

(:
   string:splitWords("go live")  --> ("go", "live")
:)
declare function string:splitWords($string as xs:string) as xs:string* {
    tokenize($string, "\s+")
};

(: 
  string:toCamelCase("business segment") --> "businessSegment" 
:)
declare function string:toCamelCase($string as xs:string) as xs:string {
    let $words := string:splitWords($string)
    return string-join(($words[1], for $word in subsequence($words, 2) return string:capitalize($word)), '')
};
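If the same helpers are needed on the Java side of the pipeline, they port almost one-to-one. This is a sketch, not a drop-in replacement, and the class name is hypothetical. string:split has no counterpart because Java can work with code points directly. Like tokenize, split("\\s+") yields an empty first token for input with leading whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// A Java port of the XQuery string module above, kept close to the original functions.
class StringFns {

    // string:capitalize("test") --> "Test" (upper-cases the first code point only)
    static String capitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int firstEnd = s.offsetByCodePoints(0, 1);
        return s.substring(0, firstEnd).toUpperCase(Locale.ROOT) + s.substring(firstEnd);
    }

    // string:splitWords("go live") --> ["go", "live"]
    static List<String> splitWords(String s) {
        return Arrays.asList(s.split("\\s+"));
    }

    // string:capitalizeAll("makes you wonder") --> "Makes You Wonder"
    static String capitalizeAll(String s) {
        List<String> words = new ArrayList<>();
        for (String word : splitWords(s)) {
            words.add(capitalize(word));
        }
        return String.join(" ", words);
    }

    // string:toCamelCase("business segment") --> "businessSegment"
    static String toCamelCase(String s) {
        List<String> words = splitWords(s);
        StringBuilder sb = new StringBuilder(words.get(0));
        for (String word : words.subList(1, words.size())) {
            sb.append(capitalize(word));
        }
        return sb.toString();
    }
}
```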

Wednesday, July 4, 2012

Generating XQueries from XML based definitions using XSLT2.0

I'm currently working on a project where the customer wants to specify, per product category, which properties have to be shown in a parametric table. The parametric header definition contains things like column names, tooltips, ordering, data binding and so on. This post shows my current approach: dynamically generating an XQuery from that header definition to produce a preview table. It's definitely not the final result, but that is not in my scope and will be handled by another team of developers. Just as a side note: all of this will be automated using Apache Cocoon.

Parametric Header definition (XML input)

Header transformer (XSLT)

Generated XQuery

Preview table image

Final table image

Tuesday, July 3, 2012

Splitting XML file into multiple files using XSLT2.0

Suppose we have a folder containing a manifest.xml and some other files (basictypes.xml and packages.xml) that are referenced by the manifest. These files contain multiple objects of a specific type, and we want to split them into separate files. There are some hurdles to overcome:
  • Some objects are logical duplicates (same identifier). They would be written to the same URI, which results in an exception:
SystemID: C:\pelssers\demo\manifest_transformer.xsl
Engine name: Saxon-HE 9.3.0.5
Severity: fatal
Description: Cannot write more than one result document to the same URI: file:/c:/pelssers/demo/export/basictypes/PH3330L.xml
Start location: 27:0
URL: http://www.w3.org/TR/xslt20/#err-XTDE1490
  • The second difficulty is that the elements are not identifiable with the same XPath expression, so using a single group-by declaration for this heterogeneous bunch of elements needed a bit of thinking. I had to resort to a "generic" function that delegates to matching templates for each specific element type.

 manifest.xml
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <file href="basictypes.xml"/>
  <file href="packages.xml"/> 
</manifest>
 
basictypes.xml
<?xml version="1.0" encoding="UTF-8"?>
<basictypes>
    <basictype identifier="PH3330L">
        <description>N-channel TrenchMOS logic level FET</description>
        <magcode>R73</magcode> 
    </basictype>
    <basictype identifier="BUK3F00-50WDFE">
        <description>9675 AUTO IC (IMPULSE)</description>
        <magcode>R73</magcode>   
    </basictype>
    <basictype identifier="PH3330L">
        <description>this is a duplicate of PH3330L</description>
        <magcode>R73</magcode>         
    </basictype>
</basictypes>

packages.xml
<?xml version="1.0" encoding="UTF-8"?>
<packages>
    <package id="SOT669">
        <description>plastic single-ended surface-mounted package; 4 leads</description>
        <name>LFPAK; Power-SO8</name> 
    </package>
    <package id="SOT600-1">
        <description>plastic thin fine-pitch ball grid array package;</description>
        <name>TFBGA208</name>   
    </package>   
</packages>

In the XSLT below I first chose a grouping strategy to resolve the error of writing duplicate items to the same URI. Next I used an abstract function pelssers:getURI for all element cases (basictype and package), which delegates the call to matching templates with @mode="getURI". I only use @mode="write" for the first element in each group and @mode="skip" for all subsequent elements of that group. In this handler I only log a message that I'm skipping them, but I could have implemented it differently, e.g. by writing the duplicates to another folder. The only thing I would have to make sure of is to include some uniquely identifying part in the URI, for instance using generate-id().
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:pelssers="http://robbypelssers.blogspot.com"
  version="2.0">
 
  <xsl:param name="sourceFolder" select="xs:anyURI('file:///c:/pelssers/demo/')"/>
  <xsl:param name="destinationFolder" select="xs:anyURI('file:///c:/pelssers/demo/export/')"/>
    
  <xsl:function name="pelssers:getURI" as="xs:anyURI">
    <xsl:param name="element" as="element()"/> 
    <xsl:apply-templates select="$element" mode="getURI"/>  
  </xsl:function>  
    
  <xsl:template match="/">
   <xsl:variable name="elements" select="for $doc in (for $href in manifest/file/@href return document(xs:anyURI(concat($sourceFolder, $href)))    ) return $doc/*/*"/> 
   <xsl:for-each-group select="$elements" group-by="pelssers:getURI(.)">
     <xsl:apply-templates select="current-group()[1]" mode="write"/>
     <xsl:apply-templates select="subsequence(current-group(), 2)" mode="skip"/>
   </xsl:for-each-group> 
  </xsl:template>
  
  <xsl:template match="basictype | package" mode="write">
    <xsl:variable name="uri" select="pelssers:getURI(.)"/>
    <xsl:message>Processing <xsl:value-of select="local-name()"/> to URI <xsl:value-of select="$uri"/> </xsl:message>
    <xsl:result-document method="xml" href="{$uri}">
      <xsl:element name="{../local-name()}">
        <xsl:copy-of select="../@*"/>
        <xsl:copy-of select="."/>
      </xsl:element>
    </xsl:result-document>    
  </xsl:template> 
  
  <xsl:template match="basictype | package" mode="skip">  
    <xsl:variable name="uri" select="pelssers:getURI(.)"/>
    <xsl:message>Warning !! Skipping duplicate <xsl:value-of select="local-name()"/> with URI <xsl:value-of select="$uri"/> </xsl:message>    
  </xsl:template>  
  
  <xsl:template match="basictype" as="xs:anyURI" mode="getURI">
    <xsl:sequence select="xs:anyURI(concat($destinationFolder, 'basictypes/', @identifier, '.xml'))"/>
  </xsl:template>
  
  <xsl:template match="package" as="xs:anyURI" mode="getURI">
    <xsl:sequence select="xs:anyURI(concat($destinationFolder, 'packages/', @id, '.xml'))"/>
  </xsl:template>

</xsl:stylesheet>

The output of running this transformation nicely reports what's happening.
[Saxon-HE] Processing basictype to URI file:///c:/pelssers/demo/export/basictypes/PH3330L.xml
[Saxon-HE] Warning !! Skipping duplicate basictype with URI file:///c:/pelssers/demo/export/basictypes/PH3330L.xml
[Saxon-HE] Processing basictype to URI file:///c:/pelssers/demo/export/basictypes/BUK3F00-50WDFE.xml
[Saxon-HE] Processing package to URI file:///c:/pelssers/demo/export/packages/SOT669.xml
[Saxon-HE] Processing package to URI file:///c:/pelssers/demo/export/packages/SOT600-1.xml
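The same grouping idea, stripped of XSLT, can be sketched in Java: group the elements by their target URI in order of first appearance (as xsl:for-each-group with group-by does), "write" the first member of each group and report the rest as skipped duplicates. This is only an illustration under assumptions: the SplitByUri and Item classes are hypothetical stand-ins for the parsed elements, the returned list stands in for xsl:message, and nothing is written to disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the write-first / skip-rest grouping strategy.
final class SplitByUri {

    // Stand-in for a basictype or package element.
    static final class Item {
        final String kind; // "basictype" or "package"
        final String key;  // @identifier for basictype, @id for package
        Item(String kind, String key) {
            this.kind = kind;
            this.key = key;
        }
    }

    // Mirrors the getURI templates: the subfolder depends on the element type.
    static String uriFor(Item item, String destinationFolder) {
        String folder = item.kind.equals("basictype") ? "basictypes/" : "packages/";
        return destinationFolder + folder + item.key + ".xml";
    }

    // Groups by URI in first-appearance order, then logs one "write" per group
    // followed by a warning for every further member of that group.
    static List<String> process(List<Item> items, String destinationFolder) {
        Map<String, List<Item>> groups = new LinkedHashMap<>();
        for (Item item : items) {
            groups.computeIfAbsent(uriFor(item, destinationFolder), k -> new ArrayList<>()).add(item);
        }
        List<String> log = new ArrayList<>();
        for (Map.Entry<String, List<Item>> group : groups.entrySet()) {
            List<Item> members = group.getValue();
            log.add("Processing " + members.get(0).kind + " to URI " + group.getKey());
            for (Item duplicate : members.subList(1, members.size())) {
                log.add("Warning !! Skipping duplicate " + duplicate.kind + " with URI " + group.getKey());
            }
        }
        return log;
    }
}
```

Fed with the five elements from basictypes.xml and packages.xml and the destination folder file:///c:/pelssers/demo/export/, this produces the same sequence of messages as the Saxon run above.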

Friday, June 29, 2012

Experimenting with Jena SPARQL processor

Today I started playing with Jena ARQ, a SPARQL processor. The first thing I needed to do was produce some RDF data from our Sedna XML database.
import module namespace basictypes = "http://www.nxp.com/basictypes";
import module namespace packages = "http://www.nxp.com/packages";

declare function local:toRDF() {
  <rdf:RDF 
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"  
    xmlns:bt="http://www.nxp.com/bt"
    xmlns:pkg="http://www.nxp.com/pkg">
    {
       for $product in basictypes:getBasicTypes()[ProductInformation/MagCode = ('R73', 'R01', 'R02')]
       let $prodInfo := $product/ProductInformation
       let $btn := data($prodInfo/Name)
       let $pkgId := data($prodInfo/PackageID)
       return
       <bt:BasicType rdf:about="http://www.nxp.com/bt/{$btn}">
         <bt:name>{$btn}</bt:name>
         <bt:magcode>{data($prodInfo/MagCode)}</bt:magcode>
         <bt:piptype rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">
           {data($prodInfo/PIPType)}</bt:piptype>
         <bt:status>{data($prodInfo/Status)}</bt:status> 
         <bt:maturity>{data($prodInfo/Maturity)}</bt:maturity>
         <bt:package rdf:resource="http://www.nxp.com/pkg/{$pkgId}"/>
       </bt:BasicType>
   }
   {
      for $pkg in packages:getPackages()
      let $pkgInfo := $pkg/PackageInformation
      let $pkgn := data($pkgInfo/Name)
      return
      <pkg:Package rdf:about="http://www.nxp.com/pkg/{$pkgn}">
        <pkg:name>{$pkgn}</pkg:name>
        <pkg:status>{data($pkgInfo/Status)}</pkg:status>
        <pkg:maturity>{data($pkgInfo/Maturity)}</pkg:maturity>
      </pkg:Package>
   }
 </rdf:RDF>
};

local:toRDF()

Below is a short extract from the generated RDF test data:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
  xmlns:bt="http://www.nxp.com/bt" 
  xmlns:pkg="http://www.nxp.com/pkg">
  <bt:BasicType rdf:about="http://www.nxp.com/bt/BGF802-20">
    <bt:name>BGF802-20</bt:name>
    <bt:magcode>R02</bt:magcode>
    <bt:piptype 
      rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">0</bt:piptype>
    <bt:status>OBS</bt:status>
    <bt:maturity>Product</bt:maturity>
    <bt:package rdf:resource="http://www.nxp.com/pkg/SOT365C"/>
  </bt:BasicType>
  <bt:BasicType rdf:about="http://www.nxp.com/bt/BLC6G10LS-160RN">
    <bt:name>BLC6G10LS-160RN</bt:name>
    <bt:magcode>R02</bt:magcode>
    <bt:piptype rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</bt:piptype>
    <bt:status>ACT</bt:status>
    <bt:maturity/>
    <bt:package rdf:resource="http://www.nxp.com/pkg/SOT896B"/>
  </bt:BasicType>
  <pkg:Package rdf:about="http://www.nxp.com/pkg/SOT365C">
    <pkg:name>SOT365C</pkg:name>
    <pkg:status>DEV</pkg:status>
    <pkg:maturity>Product</pkg:maturity>
  </pkg:Package>
  <pkg:Package rdf:about="http://www.nxp.com/pkg/SOT896B">
    <pkg:name>SOT896B</pkg:name>
    <pkg:status>DEV</pkg:status>
    <pkg:maturity>Product</pkg:maturity>
  </pkg:Package>
</rdf:RDF>  

Next I saved the file to disk in order to write a unit test querying this data.
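import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.junit.Test;
// Jena 2.x package names; Jena 3 and later use org.apache.jena instead
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
@Test
public void executeQuery() throws IOException {
    // load the generated RDF/XML into an in-memory model
    Model model = ModelFactory.createMemModelMaker().createModel("TestData");
    InputStream in = new FileInputStream(new File("c:/tmp/rdfdata.xml"));
    try {
        model.read(in, null);
    } finally {
        in.close();
    }
    String sQuery =
            "PREFIX bt: <http://www.nxp.com/bt>\n" +
            "SELECT ?s \n" +
            "WHERE\n" +
            "{\n" +
            "?s bt:package <http://www.nxp.com/pkg/SOT365C>\n" +
            "}";
    Query query = QueryFactory.create(sQuery);
    QueryExecution qexec = QueryExecutionFactory.create(query, model);
    try {
        ResultSet results = qexec.execSelect();
        ResultSetFormatter.out(System.out, results, query);
    } finally {
        qexec.close();
    }
}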

The unit test runs a query listing all basictypes whose package is SOT365C:
-------------------------------------------
| s                                       |
===========================================
| <http://www.nxp.com/bt/BGF802-20>       |
-------------------------------------------
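To make explicit what that single triple pattern does, here is a plain-Java sketch. It does not use Jena and is not how ARQ evaluates queries internally. It matches ?s bt:package <http://www.nxp.com/pkg/SOT365C> against the two bt:package triples from the RDF extract above. Both RDF/XML and SPARQL build the full IRI by concatenating the namespace and the local name. Because the bt namespace was declared as http://www.nxp.com/bt with no trailing / or #, bt:package expands to http://www.nxp.com/btpackage. That is harmless here, since the data and the query declare the same prefix. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of matching the basic graph pattern
// "?s <predicate> <object>" against an in-memory list of triples.
final class TriplePattern {

    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Every triple with exactly this predicate and object binds ?s to its subject.
    static List<String> subjectsMatching(List<Triple> graph, String predicate, String object) {
        List<String> bindings = new ArrayList<>();
        for (Triple t : graph) {
            if (t.predicate.equals(predicate) && t.object.equals(object)) {
                bindings.add(t.subject);
            }
        }
        return bindings;
    }

    // The bt:package triples from the RDF extract; bt: = http://www.nxp.com/bt
    static List<Triple> sampleGraph() {
        String btPackage = "http://www.nxp.com/bt" + "package";
        return Arrays.asList(
                new Triple("http://www.nxp.com/bt/BGF802-20", btPackage, "http://www.nxp.com/pkg/SOT365C"),
                new Triple("http://www.nxp.com/bt/BLC6G10LS-160RN", btPackage, "http://www.nxp.com/pkg/SOT896B"));
    }
}
```

SPARQL does not guarantee any result order without ORDER BY. This sketch simply returns the bindings in the order the triples are listed.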

Friday, June 15, 2012

XSLT puzzler: removing preceding deep equal elements

Goal: remove duplicate (deep-equal) items from the tree. I made a small modification to my solution: instead of xsl:value-of I now use xsl:sequence inside the functions, after getting spanked on the butt by Andrew Welch. But I really like the XSLT mailing list. To be honest, it's one of the most responsive communities I've seen: as long as you isolate your problem and formulate the desired solution clearly, they come up with working solutions within a day. Andrew's tip:
A tip here - when the sequence type (as attribute) of the function is
an atomic, always use xsl:sequence instead of xsl:value-of.

When you use value-of you create a text node, which then gets atomized
to the sequence type, so you can avoid that unnecessary step by using
xsl:sequence which will return an atomic.

The general rule is 'always use xsl:sequence in xsl:functions', as you
pretty much always return atomics from functions.

(that's also a good interview question "what's the difference between
xsl:sequence and xsl:value-of)


--
Andrew Welch
http://andrewjwelch.com


Input XML
<?xml version="1.0" encoding="UTF-8"?>
<myroot>
    <RNC>
        <nodeA id="a">
            <section id="1">
                <item1 id="0" method="delete"/>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="2">
                <!-- second consecutive delete we remove this -->
                <item1 id="0" method="delete"/>
                <!-- third consecutive delete BUT children have different value , so we don't remove this -->
                <item1 id="0" method="delete">
                    <somechild>bbb</somechild>
                </item1>
                <item1 id="3" method="create">
                    <other>xx</other>
                </item1>
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
                <!-- second consecutive create, we remove this -->
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="3">
                <!-- third consecutive create, we remove this -->
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
        </nodeA>
    </RNC>
</myroot>

Desired output XML
<?xml version="1.0" encoding="UTF-8"?>
<myroot>
    <RNC>
        <nodeA id="a">
            <section id="1">
                <item1 id="0" method="delete"/>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="2">
                <!-- third consecutive delete BUT children have different value , so we don't remove this -->
                <item1 id="0" method="delete">
                    <somechild>bbb</somechild>
                </item1>
                <item1 id="3" method="create">
                    <other>xx</other>
                </item1>
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="3"/>
        </nodeA>
    </RNC>
</myroot>

My solution using functions:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:custom="http://www.company.com">
    
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:function name="custom:equalPrecedingItemCount" as="xs:integer">
      <xsl:param name="preceding_items"/>
      <xsl:param name="this_item"/> 
      <xsl:sequence select="sum(for $item in $preceding_items return custom:getEqualityValue($item, $this_item))"/>
    </xsl:function>
    
    <xsl:function name="custom:getEqualityValue" as="xs:integer">
      <xsl:param name="item1"/>
      <xsl:param name="item2"/> 
      <xsl:sequence select="if (deep-equal($item1, $item2)) then 1 else 0"/>
    </xsl:function>

    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>

    <!-- we skip the item if it has preceding items which are equal -->
    <xsl:template match="item1[custom:equalPrecedingItemCount(preceding::item1, .) > 0]"/>

</xsl:stylesheet>
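The same rule can also be expressed outside XSLT. Below is a minimal Java sketch using the standard org.w3c.dom API. It collects every item1 element that is equal to an earlier item1 in document order, and only removes them once all comparisons are done, so that (as in the XSLT) every comparison runs against the original tree. Because item1 elements are never nested, "earlier in document order" is the same as preceding::item1 here. One difference: the sketch uses Node.isEqualNode instead of fn:deep-equal, and the two are close but not identical. isEqualNode also compares comment children and whitespace-only text nodes. deep-equal ignores comment children, and the stylesheet removes the whitespace nodes with xsl:strip-space. The sketch is therefore best run on input without indentation inside item1. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical DOM counterpart of the item1 rule in the stylesheet above.
final class RemovePrecedingEqual {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Removes each element named `name` that is equal (Node.isEqualNode) to an
    // earlier element of that name in document order; returns how many were removed.
    static int removeDuplicates(Document doc, String name) {
        NodeList live = doc.getElementsByTagName(name);
        List<Element> items = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            items.add((Element) live.item(i)); // snapshot, because the NodeList is live
        }
        List<Element> duplicates = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = 0; j < i; j++) {
                if (items.get(j).isEqualNode(items.get(i))) {
                    duplicates.add(items.get(i));
                    break;
                }
            }
        }
        // remove only after all comparisons, so they ran against the original tree
        for (Element duplicate : duplicates) {
            duplicate.getParentNode().removeChild(duplicate);
        }
        return duplicates.size();
    }
}
```

On the input above, written without whitespace or comments, this should remove four item1 elements and leave section 3 empty, which matches the desired output.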