
Friday, June 29, 2012

Experimenting with Jena SPARQL processor

Today I started playing with Jena ARQ, a SPARQL processor. The first thing I needed to do was produce some RDF data from our (Sedna) XMLDB:
import module namespace basictypes = "http://www.nxp.com/basictypes";
import module namespace packages = "http://www.nxp.com/packages";

declare function local:toRDF() {
  <rdf:RDF 
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"  
    xmlns:bt="http://www.nxp.com/bt"
    xmlns:pkg="http://www.nxp.com/pkg">
    {
       for $product in basictypes:getBasicTypes()[ProductInformation/MagCode = ('R73', 'R01', 'R02')]
       let $prodInfo := $product/ProductInformation
       let $btn := data($prodInfo/Name)
       let $pkgId := data($prodInfo/PackageID)
       return
       <bt:BasicType rdf:about="http://www.nxp.com/bt/{$btn}">
         <bt:name>{$btn}</bt:name>
         <bt:magcode>{data($prodInfo/MagCode)}</bt:magcode>
         <bt:piptype rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">
           {data($prodInfo/PIPType)}</bt:piptype>
         <bt:status>{data($prodInfo/Status)}</bt:status> 
         <bt:maturity>{data($prodInfo/Maturity)}</bt:maturity>
         <bt:package rdf:resource="http://www.nxp.com/pkg/{$pkgId}"/>
       </bt:BasicType>
   }
   {
      for $pkg in packages:getPackages()
      let $pkgInfo := $pkg/PackageInformation
      let $pkgn := data($pkgInfo/Name)
      return
      <pkg:Package rdf:about="http://www.nxp.com/pkg/{$pkgn}">
        <pkg:name>{$pkgn}</pkg:name>
        <pkg:status>{data($pkgInfo/Status)}</pkg:status>
        <pkg:maturity>{data($pkgInfo/Maturity)}</pkg:maturity>
      </pkg:Package>
   }
 </rdf:RDF>
};

local:toRDF()

Below is a short extract from the generated RDF test data:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
  xmlns:bt="http://www.nxp.com/bt" 
  xmlns:pkg="http://www.nxp.com/pkg">
  <bt:BasicType rdf:about="http://www.nxp.com/bt/BGF802-20">
    <bt:name>BGF802-20</bt:name>
    <bt:magcode>R02</bt:magcode>
    <bt:piptype 
      rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">0</bt:piptype>
    <bt:status>OBS</bt:status>
    <bt:maturity>Product</bt:maturity>
    <bt:package rdf:resource="http://www.nxp.com/pkg/SOT365C"/>
  </bt:BasicType>
  <bt:BasicType rdf:about="http://www.nxp.com/bt/BLC6G10LS-160RN">
    <bt:name>BLC6G10LS-160RN</bt:name>
    <bt:magcode>R02</bt:magcode>
    <bt:piptype rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</bt:piptype>
    <bt:status>ACT</bt:status>
    <bt:maturity/>
    <bt:package rdf:resource="http://www.nxp.com/pkg/SOT896B"/>
  </bt:BasicType>
  <pkg:Package rdf:about="http://www.nxp.com/pkg/SOT365C">
    <pkg:name>SOT365C</pkg:name>
    <pkg:status>DEV</pkg:status>
    <pkg:maturity>Product</pkg:maturity>
  </pkg:Package>
  <pkg:Package rdf:about="http://www.nxp.com/pkg/SOT896B">
    <pkg:name>SOT896B</pkg:name>
    <pkg:status>DEV</pkg:status>
    <pkg:maturity>Product</pkg:maturity>
  </pkg:Package>
</rdf:RDF>  

Next I saved the file to disk in order to write a unit test querying this data.
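// Imports assume Jena 2.x, current when this was written; from Jena 3 on the packages are org.apache.jena.*
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import org.junit.Test;

@Test
public void executeQuery() throws IOException {
    // Load the generated RDF/XML file into an in-memory model
    InputStream in = new FileInputStream(new File("c:/tmp/rdfdata.xml"));
    Model model = ModelFactory.createMemModelMaker().createModel("TestData");
    model.read(in, null);
    in.close();
    String sQuery =
            "PREFIX bt: <http://www.nxp.com/bt>\n" +
            "SELECT ?s \n" +
            "WHERE\n" +
            "{\n" +
            "?s bt:package <http://www.nxp.com/pkg/SOT365C>" +
            "}";

    // Run the SELECT query against the model and print the bindings as a text table
    Query query = QueryFactory.create(sQuery);
    QueryExecution qexec = QueryExecutionFactory.create(query, model);
    ResultSet results = qexec.execSelect();
    ResultSetFormatter.out(System.out, results, query);
    qexec.close();
}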

The unit test runs a query listing all basictypes that have a package SOT365C.
-------------------------------------
| s                                 |
=====================================
| <http://www.nxp.com/bt/BGF802-20> |
-------------------------------------
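
One detail worth spelling out: the bt namespace in the data has no trailing '/' or '#', and both RDF/XML and SPARQL build the full IRI by plainly concatenating the namespace IRI and the local name. So bt:package stands for http://www.nxp.com/btpackage in the data and in the query alike, which is why the pattern matches. The sketch below only mimics that concatenation in plain Java; the class PrefixExpansion and its expand method are made up for illustration and are not part of Jena.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: mimics how a prefixed name is expanded to a full IRI. */
public class PrefixExpansion {

    /** Expands "prefix:local" by plain concatenation of namespace IRI and local name. */
    public static String expand(Map<String, String> prefixes, String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String namespace = prefixes.get(prefix);
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
        // No separator is inserted between the namespace and the local name
        return namespace + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("bt", "http://www.nxp.com/bt");    // namespaces as declared in the post
        prefixes.put("pkg", "http://www.nxp.com/pkg");

        System.out.println(expand(prefixes, "bt:package")); // http://www.nxp.com/btpackage
        System.out.println(expand(prefixes, "pkg:name"));   // http://www.nxp.com/pkgname
    }
}
```

Declaring the namespaces with a trailing '/' would give more readable property IRIs such as http://www.nxp.com/bt/package, but the data and the query prefix would then have to change together.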

Friday, June 15, 2012

XSLT puzzler: removing preceding deep equal elements

Goal: remove duplicate (deep-equal) items from the tree. I made a small modification: instead of using xsl:value-of I switched to xsl:sequence inside the functions, after being told off by Andrew Welch. I really like the XSLT mailing list, though. To be honest, it's one of the most responsive communities I've seen. As long as you isolate your problem and formulate the desired result clearly, they come up with working solutions within a day. Here is Andrew's tip:
A tip here - when the sequence type (as attribute) of the function is
an atomic, always use xsl:sequence instead of xsl:value-of.

When you use value-of you create a text node, which then gets atomized
to the sequence type, so you can avoid that unnecessary step by using
xsl:sequence which will return an atomic.

The general rule is 'always use xsl:sequence in xsl:functions', as you
pretty much always return atomics from functions.

(that's also a good interview question "what's the difference between
xsl:sequence and xsl:value-of)


--
Andrew Welch
http://andrewjwelch.com


Input XML
<?xml version="1.0" encoding="UTF-8"?>
<myroot>
    <RNC>
        <nodeA id="a">
            <section id="1">
                <item1 id="0" method="delete"/>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="2">
                <!-- second consecutive delete we remove this -->
                <item1 id="0" method="delete"/>
                <!-- third consecutive delete BUT children have different value , so we don't remove this -->
                <item1 id="0" method="delete">
                    <somechild>bbb</somechild>
                </item1>
                <item1 id="3" method="create">
                    <other>xx</other>
                </item1>
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
                <!-- second consecutive create, we remove this -->
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="3">
                <!-- third consecutive create, we remove this -->
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
        </nodeA>
    </RNC>
</myroot>

Desired output XML
<?xml version="1.0" encoding="UTF-8"?>
<myroot>
    <RNC>
        <nodeA id="a">
            <section id="1">
                <item1 id="0" method="delete"/>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="2">
                <!-- third consecutive delete BUT children have different value , so we don't remove this -->
                <item1 id="0" method="delete">
                    <somechild>bbb</somechild>
                </item1>
                <item1 id="3" method="create">
                    <other>xx</other>
                </item1>
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="3"/>
        </nodeA>
    </RNC>
</myroot>

My solution using functions:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:custom="www.company.com">
    
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:function name="custom:equalPrecedingItemCount" as="xs:integer">
      <xsl:param name="preceding_items"/>
      <xsl:param name="this_item"/> 
      <xsl:sequence select="sum(for $item in $preceding_items return custom:getEqualityValue($item, $this_item))"/>
    </xsl:function>
    
    <xsl:function name="custom:getEqualityValue" as="xs:integer">
      <xsl:param name="item1"/>
      <xsl:param name="item2"/> 
      <xsl:sequence select="if (deep-equal($item1, $item2)) then 1 else 0"/>
    </xsl:function>

    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>

    <!-- we skip the item if it has preceding items which are equal -->
    <xsl:template match="item1[custom:equalPrecedingItemCount(preceding::item1, .) > 0]"/>

</xsl:stylesheet>
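
For comparison, the same rule (drop an item1 when an earlier item1 in the input is deep-equal to it) can be sketched outside XSLT. The Java below uses the JDK's DOM API with Node.isEqualNode as a stand-in for deep-equal. The two are not identical (isEqualNode also compares comments and whitespace text nodes, for instance), so the sample input is kept compact. The class and method names are made up for this sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: remove elements that equal an earlier element of the same name. */
public class DeepEqualDedup {

    public static String removeDuplicates(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list in document order; copy it first
            NodeList live = doc.getElementsByTagName(elementName);
            List<Node> items = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                items.add(live.item(i));
            }

            // Mark every item that equals some earlier item of the (unmodified) input
            List<Node> duplicates = new ArrayList<>();
            for (int i = 0; i < items.size(); i++) {
                for (int j = 0; j < i; j++) {
                    if (items.get(i).isEqualNode(items.get(j))) {
                        duplicates.add(items.get(i));
                        break;
                    }
                }
            }
            // Only now change the tree, so all comparisons saw the original input
            for (Node duplicate : duplicates) {
                duplicate.getParentNode().removeChild(duplicate);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<section>"
                + "<item1 id=\"0\" method=\"delete\"/>"
                + "<item1 id=\"0\" method=\"delete\"/>"
                + "<item1 id=\"0\" method=\"delete\"><somechild>bbb</somechild></item1>"
                + "</section>";
        // The second item1 is dropped; the third stays because its children differ
        System.out.println(removeDuplicates(xml, "item1"));
    }
}
```

As in the stylesheet, every item is compared with the preceding items of the input rather than of the output, so a third or fourth copy is removed as well.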

Thursday, June 14, 2012

Regular xquery versus xquery using index [Sedna XMLDB]

This article shows two different ways of querying for data: accessing the collection directly versus using an index to fetch the data.
declare function local:getBasicTypes($productIds as xs:string*) as element(Product)* {
    (: remark: @identifier = $productIds acts like a SQL @identifier in $productIds :)
    collection("basicTypes/released")/Product[@identifier = $productIds]
};

let $ids := ('PH3330L', 'PH3030CL')
return
  <result>
  {
    for $bt in local:getBasicTypes($ids)
    return $bt/ProductInformation/Description
  }
  </result>


Now let us create an index on @identifier and retrieve the data using this index:
create index "basictype_id"
  on fn:collection("basicTypes/released")/Product
  by @identifier
  as xs:string

declare function local:getBasicTypesByIndex($productIds as xs:string*) as element(Product)* {
    for $id in $productIds return index-scan('basictype_id', $id, 'EQ')
};

let $ids := ('PH3330L', 'PH3030CL')
return
  <result>
  {
    for $bt in local:getBasicTypesByIndex($ids)
    return $bt/ProductInformation/Description
  }
  </result>


The result is the same for both:
<result>
  <Description>N-channel TrenchMOS logic level FET</Description>
  <Description>9657 Trench 7 (IMPULSE)</Description>
</result>
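
The two queries differ only in how the matching Product elements are found. As a rough analogy (not a description of Sedna's internals), the first query behaves like filtering a whole list, with `@identifier = $productIds` acting as a membership test, while index-scan behaves like one keyed lookup per id. The Java sketch below uses invented class and method names and placeholder descriptions, and it assumes identifiers are unique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Analogy only: a list stands in for the collection, a map for the index. */
public class ScanVersusIndex {

    /** Hypothetical stand-in for a Product element. */
    public static final class Product {
        final String identifier;
        final String description;

        public Product(String identifier, String description) {
            this.identifier = identifier;
            this.description = description;
        }
    }

    /** Like Product[@identifier = $productIds]: test every product against the ids. */
    public static List<String> byScan(List<Product> collection, List<String> ids) {
        List<String> descriptions = new ArrayList<>();
        for (Product product : collection) {
            if (ids.contains(product.identifier)) { // "= sequence" acts like SQL IN
                descriptions.add(product.description);
            }
        }
        return descriptions;
    }

    /** Builds the lookup structure once, comparable to creating the index. */
    public static Map<String, Product> buildIndex(List<Product> collection) {
        Map<String, Product> index = new HashMap<>();
        for (Product product : collection) {
            index.put(product.identifier, product); // assumes unique identifiers
        }
        return index;
    }

    /** Like calling index-scan with 'EQ' for each id: one keyed lookup per id. */
    public static List<String> byIndex(Map<String, Product> index, List<String> ids) {
        List<String> descriptions = new ArrayList<>();
        for (String id : ids) {
            Product product = index.get(id);
            if (product != null) {
                descriptions.add(product.description);
            }
        }
        return descriptions;
    }

    public static void main(String[] args) {
        List<Product> collection = Arrays.asList(
                new Product("PH3330L", "placeholder description 1"),
                new Product("OTHER", "placeholder description 2"),
                new Product("PH3030CL", "placeholder description 3"));
        List<String> ids = Arrays.asList("PH3330L", "PH3030CL");

        System.out.println(byScan(collection, ids));
        System.out.println(byIndex(buildIndex(collection), ids));
    }
}
```

In this sketch the scan returns hits in collection order and the lookup returns them in the order of the ids, so the two lists match only when those orders agree.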

Here is an example of creating an index using namespaces:
declare namespace cat = "urn:iso:std:iso:ts:29002:-10:ed-1:tech:xml-schema:catalogue";
create index "legacy_id"
  on fn:collection("legacyBasicTypes")/cat:catalogue
  by cat:item/cat:reference/@reference_number
  as xs:string

Wednesday, June 13, 2012

XSLT not powerful, really??


<?xml version="1.0"?>
<sparql>
  <head>
    <variable name="subnode"/>
  </head>
  <results>
    <result>
      <binding name="subnode">
        <uri>http://data.kasabi.com/dataset/nxp-products/basicType/LPC1114FA44</uri>
      </binding>
    </result>
    <result>
      <binding name="subnode">
        <uri>http://data.kasabi.com/dataset/nxp-products/productTree/1498</uri>
      </binding>
    </result>
  </results>
</sparql>

We want to extract the identifier from each URI and save the result as text, one identifier per line. The expected output for this sample is:
LPC1114FA44
1498

There are multiple ways to solve this, but the solution below is a pretty easy one. It takes the last part of each tokenized URI and string-joins the parts with a newline character.
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output  method="text" encoding="UTF-8" media-type="text/plain"/>
  
  <xsl:template match="/">
    <xsl:variable name="uris" select="/sparql/results/result/binding/uri"/>
    <xsl:value-of select="string-join(for $uri in $uris return tokenize($uri, '/')[last()], '&#xa;')"/>
  </xsl:template>
  
</xsl:stylesheet>
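
The same idea carries over to other languages: select the uri elements, keep what follows the last '/', and join the pieces with newlines. As a sketch in Java using only the JDK (the class UriIdentifiers and its methods are made up for illustration), reusing the XPath from the stylesheet:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: extract the last path segment of every uri in a SPARQL XML result. */
public class UriIdentifiers {

    /** Returns the text after the last '/' (the whole string if there is none). */
    public static String lastSegment(String uri) {
        return uri.substring(uri.lastIndexOf('/') + 1);
    }

    /** Returns one identifier per line, in document order. */
    public static String identifiers(String sparqlResultsXml) {
        try {
            // Same path as the $uris variable in the stylesheet
            NodeList uris = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "/sparql/results/result/binding/uri",
                    new InputSource(new StringReader(sparqlResultsXml)),
                    XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < uris.getLength(); i++) {
                ids.add(lastSegment(uris.item(i).getTextContent().trim()));
            }
            return String.join("\n", ids);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read SPARQL results", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sparql><results>"
                + "<result><binding name=\"subnode\"><uri>"
                + "http://data.kasabi.com/dataset/nxp-products/basicType/LPC1114FA44"
                + "</uri></binding></result>"
                + "<result><binding name=\"subnode\"><uri>"
                + "http://data.kasabi.com/dataset/nxp-products/productTree/1498"
                + "</uri></binding></result>"
                + "</results></sparql>";
        System.out.println(identifiers(xml)); // prints LPC1114FA44 and 1498 on separate lines
    }
}
```

It works, but it needs noticeably more ceremony than the single string-join expression in the stylesheet.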

Thursday, June 7, 2012

Managing XML Schemas and Modules with Sedna

At my customer's site we use the Sedna XMLDB. In the beginning we stored all XML schemas used for validation on insertion in a custom XMLDB wrapper on top of the Sedna driver for Java. That turned out to be tedious: we regularly have schema changes, and each time we had to redeploy all projects using the XMLDB wrapper. We also stored some schemas redundantly in a Cocoon application which generates some data on the fly using transformations. We solved this problem by storing the XML schemas in the XMLDB itself as well.

I also installed 2 admin clients; the one I use most is SDBAdmin for Windows, which is available for download. It's pretty easy to connect to your database on localhost (default port is 5050). You can also use PuTTY to set up a tunnel to your QA or PROD server.

Loading or replacing a module can be done with the following command in the Query panel:
LOAD OR REPLACE MODULE "C:\workspaces\nxp\xmldbClient\src\main\resources\xquery\basictype.xqlib"

Replacing an XML schema takes two steps. If you are uploading a schema for the first time, only the second command needs to be executed:
DROP DOCUMENT "ChemicalContent.xsd" IN COLLECTION "xmlSchemas"

LOAD "C:/development/workspaces/intellij11/CTPI-PX/spider2/schemas/ChemicalContent.xsd" "ChemicalContent.xsd" "xmlSchemas"

You can also execute commands from the terminal:
pxqa1@nlscli72:/appl/spider_qa/sedna/pxqa1/sedna/bin>./se_term nxp
Welcome to term, the SEDNA Interactive Terminal. Type \? for help.
nxp> LOAD "/home/pxqa1/deployment/iso29002-10xml_V099.xml" "legacy.xml" "legacy" &
Bulk load succeeded
nxp>