Suppose we have a folder containing a manifest.xml and the files it references (basictypes.xml and packages.xml). Each referenced file contains multiple objects of a specific type, and we want to split those objects into separate files.
There are some hurdles to overcome:
- Some objects are logical duplicates (same identifier) and would therefore be written to the same URI, which results in an exception:
SystemID: C:\pelssers\demo\manifest_transformer.xsl
Engine name: Saxon-HE 9.3.0.5
Severity: fatal
Description: Cannot write more than one result document to the same URI: file:/c:/pelssers/demo/export/basictypes/PH3330L.xml
Start location: 27:0
URL: http://www.w3.org/TR/xslt20/#err-XTDE1490
- The second difficulty is that the objects cannot all be identified with the same XPath expression, so using one single group-by declaration for this heterogeneous bunch of elements needed a bit of thinking. I had to resort to a "generic" function that delegates to matching templates for the specific type of element.
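To make the first hurdle concrete, here is a minimal sketch of a naive splitter. It is not the stylesheet I ended up with, and it assumes it is run directly against a file like basictypes.xml (shown further down) and that the relative path export/basictypes/ is an acceptable output location. Because two basictype elements share the identifier PH3330L, both iterations resolve to the same href and an XSLT 2.0 processor should raise XTDE1490.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <!-- Naive approach: one result document per basictype, no duplicate handling -->
  <xsl:template match="/basictypes">
    <xsl:for-each select="basictype">
      <!-- Duplicate @identifier values produce the same href, which triggers XTDE1490 -->
      <xsl:result-document method="xml" href="export/basictypes/{@identifier}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```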
manifest.xml
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <file href="basictypes.xml"/>
  <file href="packages.xml"/>
</manifest>
basictypes.xml
<?xml version="1.0" encoding="UTF-8"?>
<basictypes>
  <basictype identifier="PH3330L">
    <description>N-channel TrenchMOS logic level FET</description>
    <magcode>R73</magcode>
  </basictype>
  <basictype identifier="BUK3F00-50WDFE">
    <description>9675 AUTO IC (IMPULSE)</description>
    <magcode>R73</magcode>
  </basictype>
  <basictype identifier="PH3330L">
    <description>this is a duplicate of PH3330L</description>
    <magcode>R73</magcode>
  </basictype>
</basictypes>
packages.xml
<?xml version="1.0" encoding="UTF-8"?>
<packages>
  <package id="SOT669">
    <description>plastic single-ended surface-mounted package; 4 leads</description>
    <name>LFPAK; Power-SO8</name>
  </package>
  <package id="SOT600-1">
    <description>plastic thin fine-pitch ball grid array package;</description>
    <name>TFBGA208</name>
  </package>
</packages>
In the XSLT below I first chose a grouping strategy to avoid the error of writing duplicate items to the same URI. Next I needed an abstract function getURI that works for all element types (basictype and package); it delegates the call to the matching templates with @mode="getURI". I only use @mode="write" for the first element in each group and @mode="skip" for all subsequent elements of that group. Here the skip handler only logs a message that I'm skipping them, but I could also have implemented it differently, for example by writing the duplicates to another folder. The only thing I would have to make sure of is that the URI includes some uniquely identifying part. I could e.g. use generate-id().
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:pelssers="http://robbypelssers.blogspot.com"
    version="2.0">

  <xsl:param name="sourceFolder" select="xs:anyURI('file:///c:/pelssers/demo/')"/>
  <xsl:param name="destinationFolder" select="xs:anyURI('file:///c:/pelssers/demo/export/')"/>

  <!-- Generic entry point: delegates to the mode="getURI" template that matches the element type -->
  <xsl:function name="pelssers:getURI" as="xs:anyURI">
    <xsl:param name="element" as="element()"/>
    <xsl:apply-templates select="$element" mode="getURI"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Collect the objects (grandchildren of the document node) from every file listed in the manifest -->
    <xsl:variable name="elements" select="for $doc in (for $href in manifest/file/@href return document(xs:anyURI(concat($sourceFolder, $href))) ) return $doc/*/*"/>
    <!-- Elements that map to the same target URI end up in the same group -->
    <xsl:for-each-group select="$elements" group-by="pelssers:getURI(.)">
      <!-- Write the first element of each group, skip the rest -->
      <xsl:apply-templates select="current-group()[1]" mode="write"/>
      <xsl:apply-templates select="subsequence(current-group(), 2)" mode="skip"/>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="basictype | package" mode="write">
    <xsl:variable name="uri" select="pelssers:getURI(.)"/>
    <xsl:message>Processing <xsl:value-of select="local-name()"/> to URI <xsl:value-of select="$uri"/> </xsl:message>
    <xsl:result-document method="xml" href="{$uri}">
      <!-- Recreate the original wrapper element (basictypes or packages) around the single object -->
      <xsl:element name="{../local-name()}">
        <!-- copy-of keeps the wrapper's attributes; the built-in template rule would only emit their string values -->
        <xsl:copy-of select="../@*"/>
        <xsl:copy-of select="."/>
      </xsl:element>
    </xsl:result-document>
  </xsl:template>

  <xsl:template match="basictype | package" mode="skip">
    <xsl:variable name="uri" select="pelssers:getURI(.)"/>
    <xsl:message>Warning !! Skipping duplicate <xsl:value-of select="local-name()"/> with URI <xsl:value-of select="$uri"/> </xsl:message>
  </xsl:template>

  <!-- Type-specific URI rules: one template per element type -->
  <xsl:template match="basictype" as="xs:anyURI" mode="getURI">
    <xsl:sequence select="xs:anyURI(concat($destinationFolder, 'basictypes/', @identifier, '.xml'))"/>
  </xsl:template>

  <xsl:template match="package" as="xs:anyURI" mode="getURI">
    <xsl:sequence select="xs:anyURI(concat($destinationFolder, 'packages/', @id, '.xml'))"/>
  </xsl:template>

</xsl:stylesheet>
The output of running this transformation nicely reports what's happening.
[Saxon-HE] Processing basictype to URI file:///c:/pelssers/demo/export/basictypes/PH3330L.xml
[Saxon-HE] Warning !! Skipping duplicate basictype with URI file:///c:/pelssers/demo/export/basictypes/PH3330L.xml
[Saxon-HE] Processing basictype to URI file:///c:/pelssers/demo/export/basictypes/BUK3F00-50WDFE.xml
[Saxon-HE] Processing package to URI file:///c:/pelssers/demo/export/packages/SOT669.xml
[Saxon-HE] Processing package to URI file:///c:/pelssers/demo/export/packages/SOT600-1.xml
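The alternative skip handler mentioned earlier (writing duplicates to another folder instead of only logging them) could look roughly like the sketch below. It rests on a few assumptions that are not part of the original setup: the main stylesheet is saved as manifest_transformer.xsl next to this one, the folder name duplicates/ is made up for illustration, and the file name pattern is my own choice. The importing stylesheet has higher import precedence, so its mode="skip" template should override the one above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0">

  <!-- Assumption: the stylesheet shown above is stored under this name in the same folder -->
  <xsl:import href="manifest_transformer.xsl"/>

  <!-- Overrides the imported mode="skip" template: duplicates are written out instead of dropped -->
  <xsl:template match="basictype | package" mode="skip">
    <!-- 'duplicates/' is a hypothetical folder; generate-id() keeps each URI unique within one run -->
    <xsl:variable name="uri" select="xs:anyURI(concat($destinationFolder, 'duplicates/', local-name(), '-', generate-id(), '.xml'))"/>
    <xsl:message>Writing duplicate <xsl:value-of select="local-name()"/> to URI <xsl:value-of select="$uri"/> </xsl:message>
    <xsl:result-document method="xml" href="{$uri}">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

Keep in mind that the values returned by generate-id() are only guaranteed to be unique within a single transformation and are not guaranteed to be stable between runs, so the resulting file names should not be treated as permanent identifiers.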