
Wednesday, October 31, 2012

Using keys with XSLT2.0

This article shows you how to use keys efficiently to speed up XSLT processing when you are dealing with large input files (hundreds of megabytes). Consider the following example with two input files (stock.xml and orderlines.xml). The idea is to update the stock with new quantities by processing the orderlines.

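The challenge here is how to use a key (built from matching orderlines) while processing the stock. It might sound trivial, but it actually isn't: called with two arguments, key() only searches the document that contains the context node, and while we are processing the stock, that document is stock.xml rather than orderlines.xml.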

stock.xml
<?xml version="1.0" encoding="UTF-8" ?>
<stock>
  <item id="PH3330L">
    <quantity>10</quantity>
  </item>
  <item id="BAS16">
    <quantity>7</quantity>
  </item>
  <item id="BUK100-50DL">
    <quantity>14</quantity>
  </item>  
</stock>

orderlines.xml
<?xml version="1.0" encoding="UTF-8" ?>
<orderlines>
  <orderline itemId="PH3330L">
    <quantity>4</quantity>
  </orderline>
  <orderline itemId="BAS16">
    <quantity>2</quantity>
  </orderline> 
</orderlines>

newstock.xml (expected output)
<?xml version="1.0" encoding="UTF-8"?>
<stock>
  <item id="PH3330L">
    <quantity>6</quantity>
  </item>
  <item id="BAS16">
    <quantity>5</quantity>
  </item>
  <item id="BUK100-50DL">
    <quantity>14</quantity>
  </item>  
</stock>
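Conceptually, xsl:key builds an index from a key value (here @itemId) to the matching nodes, so updating each stock item costs one lookup instead of a scan over all orderlines. The Java sketch below is only an analogy for that idea, not how Saxon implements keys; the class and method names are made up for illustration, and unlike the stylesheets in this post it sums repeated orderlines for the same item.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class StockUpdate {
    // Index: itemId -> total ordered quantity. Plays the role of
    // <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>.
    static Map<String, Double> indexOrderlines(String[][] orderlines) {
        Map<String, Double> index = new HashMap<>();
        for (String[] line : orderlines) {
            // line[0] = itemId, line[1] = quantity; repeated itemIds are summed
            index.merge(line[0], Double.parseDouble(line[1]), Double::sum);
        }
        return index;
    }

    // Walks the stock once, doing one hash lookup per item instead of
    // scanning every orderline for every item.
    static Map<String, Double> newStock(Map<String, Double> stock, Map<String, Double> ordered) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> item : stock.entrySet()) {
            result.put(item.getKey(), item.getValue() - ordered.getOrDefault(item.getKey(), 0.0));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> stock = new LinkedHashMap<>();
        stock.put("PH3330L", 10.0);
        stock.put("BAS16", 7.0);
        stock.put("BUK100-50DL", 14.0);
        String[][] orderlines = { {"PH3330L", "4"}, {"BAS16", "2"} };
        // prints {PH3330L=6.0, BAS16=5.0, BUK100-50DL=14.0}
        System.out.println(newStock(stock, indexOrderlines(orderlines)));
    }
}
```

The result matches newstock.xml above: items without orderlines keep their quantity, because the lookup simply finds nothing for them.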

processOrderlines.xslt
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
-->

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:pelssers="http://robbypelssers.blogspot.com"
  exclude-result-prefixes="pelssers xs">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  
  <xsl:param name="orderlinesURI" />
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:function name="pelssers:newQuantity" as="xs:double">
    <xsl:param name="element" as="element(orderlines)"/>
    <xsl:param name="itemId" as="xs:string"/>
    <xsl:param name="stockQuantity" as="xs:double"/>
    <xsl:apply-templates select="$element">
      <xsl:with-param name="itemId" select="$itemId"/>
      <xsl:with-param name="stockQuantity" select="$stockQuantity"/>
    </xsl:apply-templates>
  </xsl:function>

  <xsl:template match="orderlines" as="xs:double">
    <xsl:param name="itemId" as="xs:string"/>
    <xsl:param name="stockQuantity" as="xs:double"/>    
    <xsl:sequence select="if (exists(key('orderline-lookup', $itemId))) 
                  then $stockQuantity - key('orderline-lookup', $itemId)/quantity else $stockQuantity"/>
  </xsl:template>

  <xsl:template match="stock/item/quantity">
    <quantity><xsl:sequence select="pelssers:newQuantity($orderlines, parent::item/@id, .)"/></quantity>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

For this demo I simply ran the Saxon jar from the command line (the command is wrapped over several lines here for readability; enter it as a single line):
java -Xmx1024m -jar Saxon-HE-9.4.jar 
  -s:C:/tmp/keydemo/input/stock.xml 
  -o:C:/tmp/keydemo/output/newstock.xml 
  -xsl:C:/tmp/keydemo/xslt/processOrderlines.xslt orderlinesURI=file:/C:/tmp/keydemo/input/orderlines.xml
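If you would rather run the same transformation from Java, JAXP offers a rough equivalent of these command-line options. This is a sketch, assuming the Saxon jar is on the classpath so that TransformerFactory.newInstance() resolves to Saxon (you can force this with -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl); the JDK's built-in processor only supports XSLT 1.0. The class name RunStylesheet is hypothetical.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RunStylesheet {
    // Compiles the stylesheet, sets the orderlinesURI parameter and runs the
    // transformation: the JAXP counterpart of -xsl:, -s:, -o: and orderlinesURI=...
    static void run(Source xsl, Source input, StreamResult output, String orderlinesURI)
            throws TransformerException {
        // Uses whichever TransformerFactory JAXP resolves; for the XSLT 2.0
        // stylesheets in this post that has to be Saxon.
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
        transformer.setParameter("orderlinesURI", orderlinesURI);
        transformer.transform(input, output);
    }

    // Arguments: <stylesheet> <stock.xml> <newstock.xml> <orderlinesURI>
    public static void main(String[] args) throws TransformerException {
        run(new StreamSource(new File(args[0])),
            new StreamSource(new File(args[1])),
            new StreamResult(new File(args[2])),
            args[3]);
    }
}
```

With the paths from the command line above, the arguments would be the stylesheet, stock.xml, newstock.xml and file:/C:/tmp/keydemo/input/orderlines.xml, in that order.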

Below is a simplified stylesheet that passes $orderlines as the third argument of key(), so the lookup searches the orderlines document regardless of the context node. It's based on a tip from @grtjn.
<?xml version="1.0" encoding="UTF-8"?>
<!--
Author: Robby Pelssers
-->

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:pelssers="http://robbypelssers.blogspot.com"
  exclude-result-prefixes="pelssers xs">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  
  <xsl:param name="orderlinesURI" />
  <xsl:variable name="orderlines" select="document($orderlinesURI)/orderlines"/>
  <xsl:key name="orderline-lookup" match="orderline" use="@itemId"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="stock/item/quantity">
    <xsl:variable name="orderline" select="key('orderline-lookup', parent::item/@id, $orderlines)"/>
    <quantity><xsl:sequence select="if (exists($orderline)) then . - $orderline/quantity else xs:double(.)"/></quantity>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Friday, October 26, 2012

Indenting your XSLT output

Ever wondered why the output from your XSLT is not indented even if you use @indent="yes"?
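How much to indent is left to the processor (Xalan, for example, defaults to an indent amount of 0), so you need processor-specific extension attributes on xsl:output. Below are examples for Saxon and Xalan. Note: for this to work with Saxon you need the Professional Edition (Saxon-PE).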

Saxon:
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:saxon="http://saxon.sf.net/"
                extension-element-prefixes="saxon">

    <xsl:output method="xml" saxon:indent-spaces="4" indent="yes"/>

    <!-- your templates go here -->
</xsl:stylesheet>


Xalan:
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xslt="http://xml.apache.org/xslt">

    <xsl:output method="xml" xslt:indent-amount="4" indent="yes"/>

    <!-- your templates go here -->
</xsl:stylesheet>
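The Xalan attribute also has a JAXP counterpart: the JDK's built-in transformer derives from Xalan and accepts indent-amount as an output property. A minimal sketch, assuming Java 9 or later for TransformerFactory.newDefaultInstance(), which returns the JDK's own implementation even when Saxon is on the classpath; IndentXml is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IndentXml {
    // Re-serializes an XML string with 4-space indentation using the JDK's
    // built-in (Xalan-derived) identity transformer.
    static String indent(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newDefaultInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Same setting as xslt:indent-amount on xsl:output in the stylesheet above
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(indent("<stock><item id=\"BAS16\"><quantity>7</quantity></item></stock>"));
    }
}
```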


Friday, October 19, 2012

Creating UNIX timestamp with XSLT2.0 (Saxon)

Creating timestamps is a fairly common requirement. If you google how to create one in XSLT, you will find some rather exotic solutions. Today I set out to find an elegant one using XSLT extension functions.
If you take a look at the Java API, and in particular java.util.Date, you will see a method getTime() which returns exactly what I need.
long getTime()
Returns the number of milliseconds since January 1, 1970, 00:00:00 GMT represented by this Date object.

Now let's look at a simple input XML containing products. For each product we want to generate a timestamp while processing its node.
<products>
  <product>
    This is a complex node
  </product>
  <product>
    This is a complex node
  </product>  
</products>

To understand how extension functions can be used with Saxon, take a look at the Saxon documentation on Java extension functions. In this case we need to construct new Date objects and invoke the method getTime() on them. We bind the prefix date to the namespace java:java.util.Date, after which we can construct a new Date object with date:new(). To invoke an instance method, you pass the target object as the first argument, so date:getTime(date:new()) is the equivalent of the Java expression new java.util.Date().getTime(). Note that from Saxon 9.2 onwards these reflexive Java calls are not available in Saxon-HE; you need Saxon-PE or Saxon-EE.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:nxp="http://www.nxp.com">
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <xsl:function name="nxp:getTimestamp">
    <xsl:value-of select="date:getTime(date:new())"  xmlns:date="java:java.util.Date"/>
  </xsl:function>

  <xsl:template match="product">
   <product processedTimestamp="{nxp:getTimestamp()}">
     <xsl:apply-templates/>
   </product>
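  </xsl:template>

  <!-- identity template: copies everything else, including the products root element -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>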

</xsl:stylesheet>

So when you execute that stylesheet you will end up with product tags having a new attribute like below:
<product processedTimestamp="1350635976117">
 ...
</product>
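Note that getTime() returns milliseconds, while a classic UNIX timestamp counts seconds since the epoch. The Java sketch below shows the call that date:getTime(date:new()) makes and the conversion to seconds, in case a consumer expects that form; the class and method names are illustrative only.

```java
import java.util.Date;

public class Timestamp {
    // The Java call behind date:getTime(date:new()): milliseconds since
    // January 1, 1970, 00:00:00 GMT.
    static long millisNow() {
        return new Date().getTime();
    }

    // Classic UNIX timestamps count whole seconds; floorDiv also rounds
    // correctly for instants before 1970.
    static long toUnixSeconds(long millis) {
        return Math.floorDiv(millis, 1000L);
    }

    public static void main(String[] args) {
        long millis = millisNow();
        System.out.println(millis + " ms = " + toUnixSeconds(millis) + " s since the epoch");
    }
}
```

For the attribute value shown above, toUnixSeconds(1350635976117L) gives 1350635976.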

Wednesday, October 17, 2012

Cocoon flowscript gotcha

Today I had a pretty hard-to-debug issue on my plate. To give some insight into its complexity, let me explain the chain of events:
  • client-side JavaScript doing a POST to a Cocoon pipeline
  • flowscript calling another pipeline to fetch JSON from XMLDB
  • sending the response to the zipserializer
So there is a lot going on in those three simple steps. The easy part is that we POST some values from a form to a server-side action.

POST parameters:
id=PH3030AL

But the id PH3030AL is the identifier of a chemical content XML file. What we really need are the identifiers of the sales items it relates to (a one-to-many relationship). We can retrieve those by executing an XQuery. We wrote a custom XQuery generator that takes any request parameters and injects them dynamically into the XQuery as parameters.

So let's take a look at an action mapped to an XQuery:

<map:match pattern="xquery/getSalesItems">
  <map:generate src="xquery/chemicalcontent/getSalesItems.xquery" type="queryStringXquery"/>
  <map:serialize type="xml"/>
</map:match>

All that's needed is to map a match pattern to the XQuery; the XQuery generator then executes it after injecting any request parameters provided.

So far so good. If we now invoke the following URL:
http://localhost:8888/search/xquery/getSalesItems?id=PH3030AL
we get back the following JSON response:

[{"id": "PH3030AL","nc12s": ["934063085115"]}]

But we needed to call this pipeline from flowscript, and this is where the problem occurred.

var output = new Packages.java.io.ByteArrayOutputStream();
var requestString = new Collection(cocoon.request.getParameterValues("id")).stringJoin(function(id) {return "id=" + id}, "&");
var uri = "xquery/getSalesItems?" + requestString;
cocoon.processPipelineTo(uri, null, output);

When we printed the output to the console, we got the following JSON:
[{"id": "PH3030AL","nc12s": ["934063085115"]},{"id": "PH3030AL","nc12s": ["934063085115"]}]
To make a long story short: we should not append the request string again when invoking another pipeline within the same request. The POST parameters were already present, so we ended up with duplicate request parameters. Changing the URI to the one below fixed our bug.
var uri = "xquery/getSalesItems";
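The mechanism behind the duplicate can be modelled in a few lines. This is a hypothetical sketch, not Cocoon's actual implementation: it assumes, as we observed, that an internal pipeline call sees the current request's parameters plus whatever query string is on the called URI. InternalRequestParams and effectiveParams are made-up names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InternalRequestParams {
    // Hypothetical model: parameters of the current request, plus any
    // name=value pairs from the query string of the internally called URI.
    static Map<String, List<String>> effectiveParams(Map<String, List<String>> current, String uri) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        current.forEach((name, values) -> params.put(name, new ArrayList<>(values)));
        int q = uri.indexOf('?');
        if (q >= 0) {
            for (String pair : uri.substring(q + 1).split("&")) {
                if (pair.isEmpty()) continue;
                String[] kv = pair.split("=", 2);
                // Appends to the existing values: this is where the duplicate comes from
                params.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv.length > 1 ? kv[1] : "");
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> post = Map.of("id", List.of("PH3030AL"));
        System.out.println(effectiveParams(post, "xquery/getSalesItems?id=PH3030AL")); // {id=[PH3030AL, PH3030AL]}
        System.out.println(effectiveParams(post, "xquery/getSalesItems"));             // {id=[PH3030AL]}
    }
}
```

Under this model, the fixed URI leaves exactly one id value, which is why the XQuery then returned a single result.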
 

Tuesday, October 16, 2012

Using SVG in modern browsers

SVG has come a long way, and I can still clearly remember the days when you had to install Adobe's SVG Viewer plugin to display SVG in the browser. I was pretty excited about this technology because it let me dynamically generate images from the domain model. Nowadays you can also use the HTML5 canvas element, but back to SVG. Not so long ago we switched from .eps files to SVG; you can't render EPS in the browser, which is one big drawback. The good news is that recent browsers (IE9, Chrome, Firefox, ...) support SVG natively. Forget about the old-school <object> and <embed> tags for embedding SVG ;-)
<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
    <title>SVG test </title>
  </head>
  <body>
    <div>
      <div>embedding SVG with img tag</div>
      <div><img src="test.svg"/></div>
    </div>
    <div>
      <div>embedding inline SVG</div>
      <div>
        <svg  xmlns="http://www.w3.org/2000/svg" xml:space="preserve" width="40mm" height="28.63mm" viewBox="0 0 90.00 81.14">
          <rect x="1" y="1" width="60" height="60" fill="none" stroke="red" stroke-width="2"/>
          <circle cx="40" cy="44" r="30" fill="none" stroke="yellow" stroke-width="10"  />
          <text x="24" y="48">SVG</text>
        </svg>
      </div>
    </div>
  </body>
</html>

test.svg
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" width="40mm" height="28.63mm" viewBox="0 0 90.00 81.14">
  <rect x="1" y="1" width="60" height="60" fill="none" stroke="purple" stroke-width="2"/>
  <circle cx="40" cy="44" r="30" fill="red" stroke="blue" stroke-width="10"  />
  <text x="24" y="48">SVG</text>
</svg>

The result looks like the screenshot below.

Friday, October 5, 2012

Using Play JSON library to expose JSON services

I'm currently experimenting with the Play Framework. Today we'll see how easy it is to generate a JSON response. Let's first take a look at our domain model:
package models

case class Contact(firstName: String, lastName: String, age: Int)

object Contact {
  val contacts = Set(
      Contact("Robby", "Pelssers", 35),
      Contact("Davy", "Pelssers", 35),
      Contact("Lindsey", "Pelssers", 9)
  )
  
  def findAll = this.contacts.toList.sortBy(_.firstName)

}
Next, let's look at a simple controller that returns the contacts in JSON format:
package controllers

import play.api.mvc.{ Action, Controller }
import play.api.libs.json.Json._
import models.Contact

object Contacts extends Controller {

  def toJSON = Action { implicit request =>
    val contacts = Contact.findAll   
    val json = 
      toJson(
        contacts.map(
            contact => toJson(
                Map("firstName" -> toJson(contact.firstName), 
                    "lastName" -> toJson(contact.lastName), 
                    "age" -> toJson(contact.age))
            )    
        )
      )
     
    Ok(json)
  }
  
}

Now it's just a matter of mapping a URL to our controller method in conf/routes:
GET     /contacts.json              controllers.Contacts.toJSON
Now let's use curl to check that the response looks OK:
$ curl --request GET --include http://localhost:9000/contacts.json
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Length: 160

[{"firstName":"Davy","lastName":"Pelssers","age":35},{"firstName":"Lindsey","lastName":"Pelssers","age":9},{"firstName":"Robby","lastName":"Pelssers","age":35}]

Wednesday, October 3, 2012

java.io.FileNotFoundException: Too many open files

This morning we noticed that our (S)FTP service had stopped sending files. We do monitor this process, and all error log messages are stored per job. However, a few of those error messages were very misleading:
File[] files = jobIdDirectory.listFiles();
if (null == files || files.length != 1) {
  throw new SharedFileValidationException(String.format(
  "The directory %s is expected to contain a single file inside, actual number of files is %d", jobIdDirectory.getAbsolutePath(), files != null ? files.length : -1));
}

Checking the logs for a job that failed:
2012-10-03 08:40:26,498 INFO  pool-35 com.nxp.spider2.ftpservice.service.impl.FileTransferPostProcessor - <<< Updating job 74058544 status to Sent Failed [item: BUK7540-100A, workflow: SPC-2-PWS, error message: The directory /appl/wpc/5.2/pxprod1/public_html/sharedFS/74058544 is expected to contain a single file inside, actual number of files is -1]
However, checking the actual filesystem shows that this is not the case:
$/appl/../public_html/sharedFS/74058544/TRIMM_PRODUCTS>ls
BUK7540-100A.zip

Luckily, we found another log message that gave us more insight:
Caused by: java.io.FileNotFoundException: /appl/../public_html/sharedFS/73526306/TRIMM_PRODUCTS/LPC1830FET256.zip (Too many open files)

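This also explains the misleading -1: File.listFiles() returns null not only when the path is not a directory but also when an I/O error occurs, and our check reports a null result as -1. My next step was to find out how to resolve this kind of error, and I found a useful StackOverflow question about it. It turned out, however, that I could not use the lsof command: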
lsof -p <pid of jvm>

But I did manage to find the PID of the failing Tomcat instance:
ps -ef | grep tomcat
Next, I counted the open file descriptors in the following folder:
/proc/{pid}/fd>ls | wc -l
   1024
So it seems we hit the common default limit of 1024 open files per process. I shut down the Tomcat instance and the open file handles were closed; in fact, the complete /proc/{pid} folder was removed.
Next I restarted Tomcat and checked the new PID. The number of open file handles grows and shrinks, so it does not look like a file-handle leak in our code.
/proc/11395/fd>ls | wc -l
    147
/proc/11395/fd>ls | wc -l
    142
/proc/11395/fd>ls | wc -l
    142
/proc/11395/fd>ls | wc -l
    142
/proc/11395/fd>ls | wc -l
    143
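Instead of listing /proc by hand, a JVM can also report its own descriptor usage. A sketch, assuming Linux for /proc/self/fd and a HotSpot/OpenJDK runtime for com.sun.management.UnixOperatingSystemMXBean (its getOpenFileDescriptorCount() and getMaxFileDescriptorCount() methods); FdCount is a hypothetical name. Logging these numbers periodically could help spot the limit being approached before jobs start failing.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdCount {
    // Same idea as `ls /proc/{pid}/fd | wc -l`, but for the current JVM
    // (Linux only; returns -1 where /proc/self/fd cannot be listed).
    static int procFdCount() {
        String[] fds = new File("/proc/self/fd").list();
        return fds == null ? -1 : fds.length;
    }

    // {open, max} descriptor counts as reported by HotSpot/OpenJDK on Unix;
    // returns null where UnixOperatingSystemMXBean is not available.
    static long[] openAndMaxFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount() };
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("/proc/self/fd entries: " + procFdCount());
        long[] counts = openAndMaxFds();
        if (counts != null) {
            System.out.println("open: " + counts[0] + ", max: " + counts[1]);
        }
    }
}
```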
I just checked, and we have about 9000 jobs waiting to be processed; a backlog this big could well be what pushed us past the limit.
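Because File.listFiles() hides the I/O error behind null, the validation could use java.nio instead, which throws an IOException (for example a FileSystemException whose reason reads "Too many open files") rather than reporting a bogus count. A sketch, assuming Java 7 or later; SingleFileCheck is a hypothetical name, and IllegalStateException stands in for the post's SharedFileValidationException.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SingleFileCheck {
    // Returns the only entry in jobIdDirectory. Opening the directory stream
    // throws an IOException with the real cause instead of returning null.
    static Path singleFile(Path jobIdDirectory) throws IOException {
        List<Path> entries = new ArrayList<>();
        // try-with-resources releases the directory handle even if iteration fails
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(jobIdDirectory)) {
            for (Path entry : stream) {
                entries.add(entry);
            }
        }
        if (entries.size() != 1) {
            throw new IllegalStateException(String.format(
                "The directory %s is expected to contain a single file inside, actual number of files is %d",
                jobIdDirectory.toAbsolutePath(), entries.size()));
        }
        return entries.get(0);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(singleFile(Paths.get(args[0])));
    }
}
```

With this version, a failing job would have logged the file-descriptor error itself instead of "actual number of files is -1".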