Friday, June 15, 2012

XSLT puzzler: removing preceding deep equal elements

Goal: remove duplicate (deep-equal) items from the tree. I made a small modification to my solution: instead of xsl:value-of I now use xsl:sequence inside the functions, after getting spanked on the butt by Andrew Welch. But I really like the XSLT mailing list. To be honest, it's one of the most responsive communities I've seen. As long as you isolate your problem and describe the desired result clearly, they come up with a working solution within a day. Here is Andrew's tip from the list:
A tip here - when the sequence type (as attribute) of the function is
an atomic, always use xsl:sequence instead of xsl:value-of.

When you use value-of you create a text node, which then gets atomized
to the sequence type, so you can avoid that unnecessary step by using
xsl:sequence which will return an atomic.

The general rule is 'always use xsl:sequence in xsl:functions', as you
pretty much always return atomics from functions.

(that's also a good interview question: "what's the difference between
xsl:sequence and xsl:value-of?")


--
Andrew Welch
http://andrewjwelch.com
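
To make the difference Andrew describes visible, here is a minimal XSLT 2.0 sketch. The demo prefix, its urn:example:demo namespace and both function names are made up for illustration. The functions deliberately leave out the as attribute, so the value is returned exactly as the instruction produced it, without any conversion to a declared type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:demo="urn:example:demo">

    <xsl:output method="text"/>

    <!-- hypothetical helper: xsl:value-of constructs a text node holding "2" -->
    <xsl:function name="demo:viaValueOf">
      <xsl:param name="n" as="xs:integer"/>
      <xsl:value-of select="$n + 1"/>
    </xsl:function>

    <!-- hypothetical helper: xsl:sequence returns the xs:integer itself -->
    <xsl:function name="demo:viaSequence">
      <xsl:param name="n" as="xs:integer"/>
      <xsl:sequence select="$n + 1"/>
    </xsl:function>

    <!-- runs against any input document and prints the two type tests -->
    <xsl:template match="/">
      <xsl:value-of select="demo:viaValueOf(1) instance of text()"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="demo:viaSequence(1) instance of xs:integer"/>
    </xsl:template>

</xsl:stylesheet>
```

Both tests should come out true: the first function hands back a text node, the second an integer. Once you add as="xs:integer" to the first function, the processor has to atomize that text node and cast it back to an integer, which is the unnecessary step Andrew mentions.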


Input XML
<?xml version="1.0" encoding="UTF-8"?>
<myroot>
    <RNC>
        <nodeA id="a">
            <section id="1">
                <item1 id="0" method="delete"/>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="2">
                <!-- second consecutive delete, we remove this -->
                <item1 id="0" method="delete"/>
                <!-- third consecutive delete BUT children have a different value, so we don't remove this -->
                <item1 id="0" method="delete">
                    <somechild>bbb</somechild>
                </item1>
                <item1 id="3" method="create">
                    <other>xx</other>
                </item1>
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
                <!-- second consecutive create, we remove this -->
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="3">
                <!-- third consecutive create, we remove this -->
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
        </nodeA>
    </RNC>
</myroot>

Desired output XML
<?xml version="1.0" encoding="UTF-8"?>
<myroot>
    <RNC>
        <nodeA id="a">
            <section id="1">
                <item1 id="0" method="delete"/>
                <item1 id="1" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="2">
                <!-- third consecutive delete BUT children have a different value, so we don't remove this -->
                <item1 id="0" method="delete">
                    <somechild>bbb</somechild>
                </item1>
                <item1 id="3" method="create">
                    <other>xx</other>
                </item1>
                <item1 id="0" method="create">
                    <otherchild>a</otherchild>
                </item1>
            </section>
            <section id="3"/>
        </nodeA>
    </RNC>
</myroot>

My solution using functions:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:custom="www.company.com">
    
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <!-- counts how many of the preceding items are deep-equal to this item -->
    <xsl:function name="custom:equalPrecedingItemCount" as="xs:integer">
      <xsl:param name="preceding_items" as="element()*"/>
      <xsl:param name="this_item" as="element()"/>
      <xsl:sequence select="sum(for $item in $preceding_items return custom:getEqualityValue($item, $this_item))"/>
    </xsl:function>

    <!-- returns 1 if the two items are deep-equal, 0 otherwise -->
    <xsl:function name="custom:getEqualityValue" as="xs:integer">
      <xsl:param name="item1" as="element()"/>
      <xsl:param name="item2" as="element()"/>
      <xsl:sequence select="if (deep-equal($item1, $item2)) then 1 else 0"/>
    </xsl:function>

    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>

    <!-- we skip the item if it has preceding items which are equal -->
    <xsl:template match="item1[custom:equalPrecedingItemCount(preceding::item1, .) > 0]"/>

</xsl:stylesheet>
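
The counting functions are not strictly needed, since the template only cares whether at least one equal preceding item exists. As a sketch of an alternative (not the version discussed on the list), the same check can be written as an XPath 2.0 quantified expression directly in the match pattern. It is meant to behave the same as the stylesheet above, and a processor is free to stop at the first match.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- identity transform: copy everything that is not matched below -->
    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>

    <!-- skip an item1 if some earlier item1 in the document is deep-equal to it -->
    <xsl:template match="item1[some $p in preceding::item1 satisfies deep-equal($p, .)]"/>

</xsl:stylesheet>
```

Inside the quantified expression the context item is still the item1 being matched, so deep-equal($p, .) compares each preceding item with the current one.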

3 comments:

  1. This comment has been removed by the author.
  2. Well, it's still unclear why we should keep item1 with id="1" in section 3. We already have a duplicate in section 1 (with the same method and child node)!

    BTW, your solution works in exactly the same way as the one suggested as incorrect by the author:

    <xsl:template match="RNC/*/*/*
    [deep-equal(.,
    preceding::*[name()=current()/name()]
    [@id = current()/@id]
    [../../@id = current()/../../@id]
    [1])]" >
    </xsl:template>

    Both this and your solution remove all items from section 3.
  3. Hi Ivan,

    the desired output example was wrong. I copied it verbatim from the XSLT mailing list, but apparently it was incorrect. The template you posted may accomplish the same thing, but to me it is pretty unclear what it does. I guess it's a matter of taste ;-)