Friday, November 18, 2011

Apache Cocoon: From XML to JSON as datasource for client or server side Javascript

In a Cocoon project I am currently working on, we are using Sedna XMLDB. All the information we use is stored as XML, and I wanted to share some insight into several use cases we
are handling. As a side note, we use a custom XQueryGenerator to fetch all data, but the use cases below apply equally to a file repository.

Use case 1: publish content from a single XML source (HTML/PDF/...)
As we only need a single source file, it is often easier to use XSLT to generate the final HTML, as XSLT currently offers more advanced features than XQuery. To name a few
examples, XQuery lacks good out-of-the-box support for:
- grouping
- formatting

You also need a different mindset when programming with XSLT. In XSLT you declaratively state WHAT to do when the XSLT processor matches a specific
node or attribute, so you don't have to account for every combination that might occur. In XQuery, by contrast, you have to state explicitly what you want to produce.

Use case 2: publish content from multiple XML sources (HTML/PDF/...)
While Cocoon does allow you to get the job done using e.g. the cinclude transformer or the aggregate generator, it becomes rather messy very quickly.

There are also many ways to accomplish the same result. You might choose to write several pipelines, each one generating and extracting (XSLT) the data needed, and then aggregate those results.
Or you could aggregate all sources in one go and write a single XSLT to extract the data.

For this use case, however, it is much more convenient to write a single XQuery which generates exactly what you need.

Use case 3: perform some conditional logic based upon the source XML before processing it

Let's say the XML source describes either a Male or a Female, and you can find out which by fetching the content of the Gender element.

Again, many roads lead to Rome. To name just one: write a Java component which extracts the data you need and call it from your flowscript. But how easily does this generalize? What methods should the interface contain? Different use cases might call for different interfaces, and you don't want to end up creating new Java classes all the time.

Use case 4: you want to use some fancy Javascript widget which needs a JSON datasource
If all your data came straight from a relational database, your entities could easily be mapped to JSON. There are plenty of libraries out there. But how do we generate JSON from XML in particular?

To solve use cases 3 and 4, I decided to come up with an XML dialect which can represent JSON and also do some formatting if needed.

Below is a sample representation of the JSON-XML dialect.

A generic transformer (XSLT 2.0) which generates a JSON string from the input
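
To illustrate what such a transformer has to do, here is a minimal sketch in JavaScript rather than XSLT. The element names (object, member, array, string, number, boolean, null) and the helper functions are assumptions made for this example only; they are not the dialect or the stylesheet used in the project. The XML tree is written as plain objects so the sketch runs without an XML parser.

```javascript
// Hypothetical JSON-XML dialect (assumed for illustration):
//   <object>
//     <member name="id"><number>1234</number></member>
//     <member name="name"><string>Jane "JD" Doe</string></member>
//     <member name="skills">
//       <array><string>xslt</string><string>xquery</string></array>
//     </member>
//   </object>

// Tiny element constructor standing in for a parsed XML node.
const el = (name, attrs, ...children) => ({ name, attrs, children });

const employeeXml = el("object", {},
  el("member", { name: "id" }, el("number", {}, "1234")),
  el("member", { name: "name" }, el("string", {}, 'Jane "JD" Doe')),
  el("member", { name: "skills" },
    el("array", {}, el("string", {}, "xslt"), el("string", {}, "xquery"))));

// Concatenated text content of a node.
const textOf = (node) =>
  node.children.filter((c) => typeof c === "string").join("");

// Child elements of a node (text nodes skipped).
const elementsOf = (node) =>
  node.children.filter((c) => typeof c !== "string");

// Serialize one dialect node to JSON text, one case per element name,
// much like one template per element in a stylesheet.
function toJson(node) {
  switch (node.name) {
    case "object":
      return "{" + elementsOf(node)
        .map((m) => JSON.stringify(m.attrs.name) + ":" + toJson(elementsOf(m)[0]))
        .join(",") + "}";
    case "array":
      return "[" + elementsOf(node).map(toJson).join(",") + "]";
    case "string":
      // JSON.stringify takes care of escaping quotes and backslashes.
      return JSON.stringify(textOf(node));
    case "number": {
      const n = Number(textOf(node));
      if (!Number.isFinite(n)) throw new Error("Not a number: " + textOf(node));
      return String(n);
    }
    case "boolean":
      return textOf(node).trim() === "true" ? "true" : "false";
    case "null":
      return "null";
    default:
      throw new Error("Unknown element: " + node.name);
  }
}

const employeeJson = toJson(employeeXml);
```

The point of the intermediate dialect is that the serializer only needs to know a handful of element names, so any XQuery or XSLT that emits those elements can feed the same generic step.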

A sitemap example generating JSON from XML

Now we can invoke the URI 'data2json/employees/1234', and the corresponding JSON will be generated, which can be used on both the server and the client side.

An example of how we can use this from flowscript:
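
A minimal sketch of such a function, covering the gender check from use case 3. The names runPipeline, loadEmployee and chooseView, the view file names and the JSON fields are all hypothetical. In a real flowscript, runPipeline might wrap cocoon.processPipelineTo(uri, bizData, outputStream) with a java.io.ByteArrayOutputStream; here it returns canned text so the sketch runs standalone. It is written in modern JavaScript, so an older Rhino-based flowscript engine may need `var` and its own JSON parser instead.

```javascript
// Hypothetical stand-in for running a Cocoon pipeline and capturing its
// output as text. The canned response is invented for this example.
function runPipeline(uri) {
  const canned = {
    "data2json/employees/1234":
      '{"id":1234,"name":"Jane Doe","gender":"Female"}',
  };
  if (!(uri in canned)) throw new Error("No such pipeline: " + uri);
  return canned[uri];
}

// Fetch the JSON for one employee and turn it into a plain object.
function loadEmployee(id) {
  return JSON.parse(runPipeline("data2json/employees/" + id));
}

// Use case 3: decide which view to render based on the source data.
// The view names are assumptions.
function chooseView(emp) {
  return emp.gender === "Female" ? "employee-female.html" : "employee-male.html";
}

const emp = loadEmployee(1234);
const view = chooseView(emp);
```

Because the data arrives as an ordinary object, the conditional logic lives in the script itself and no dedicated Java class or interface is needed per use case.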


  1. Very nicely described.

    Just one suggestion from my side. With your approach we still need to generate that JSON-XML dialect first. Although you didn't mention it, it can obviously be created with XQuery from the initial XML source. So why not generate the JSON directly from XQuery and avoid the XSLT processing step completely? It seems to be possible according to this tutorial:

    Well, yes, it uses exist:serialize option there but Sedna XML DB provides a similar one:

    Although I didn't try it myself, it looks even more straightforward.

  2. Hi Ivan,

    I actually tried to do exactly what you propose using Sedna, but at the time I could only serialize to XML. Maybe this has changed in the meantime. As I stated, though, another reason is formatting values: XSLT provides functions for formatting numbers and dates, whereas XQuery does not. I also had a use case where the value was CDATA, so I still had to do some postprocessing with XSLT to be able to use its 'disable-output-escaping' feature.