Converting XML Schemas to Schematron: (#10) Required pairs in a sequence

This article first appeared in a blog on O'Reilly on January 30, 2008.

What we want is a Schematron pattern that checks one very specific thing: cases where the use of one element in a document requires that another element immediately follows it.

Actually, I am skipping over a stage here, because this code is quite small, fun and instructive. Which is perhaps another way of saying that the code we are skipping over (for now) is quite complex. That stage has assertions to test partial order (like Topologi’s feasible validation mode and that of James Clark’s RELAX NG validator, Jing): it passes any element which could go after the current element (in its parent), not just the element that can immediately follow it. Having the test for partial order is useful for progressive validation (for example, for feasible validation where we have a document that we know is incomplete, but we just want to know if it is OK as far as it goes), but more importantly it lets us divide and conquer our task.

Back to our simple case… In XML Schemas, this case arises when an xs:sequence element contains two consecutive xs:element particles, with occurrence constraints set so that the first cannot repeat and the second is required.
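For concreteness, a schema fragment of this shape might look like the sketch below. The element names are borrowed from the Schematron example later in this article; the use of xs:string for the content and the reliance on the default occurrence values are assumptions made purely for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <!-- First particle: maxOccurs defaults to 1, so it cannot repeat -->
        <xs:element name="StreetOrPOBox" type="xs:string"/>
        <!-- Second particle: minOccurs defaults to 1, so it is required -->
        <xs:element name="Suburb" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because both particles rely on the defaults (minOccurs="1", maxOccurs="1"), any StreetOrPOBox in a valid Address must have a Suburb as its very next sibling.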

First, here is the kind of code we will have in our Schematron schema:

<sch:pattern id="Required_Immediate_Followers">
      <sch:title>Required Immediate Followers (Simple)</sch:title>

      <sch:rule context="Address/StreetOrPOBox">
         <sch:assert test="following-sibling::*[1][self::Suburb]">
                When in a "Address" element, the element "StreetOrPOBox" should be immediately followed by
                 the element "Suburb". </sch:assert>
      </sch:rule>
</sch:pattern>
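
To see what this assertion accepts and rejects, here is a small instance sketch. The wrapper element Addresses, the extra element Postcode and all the text values are hypothetical and only there to make a single well-formed document.

```xml
<Addresses>
  <!-- Passes: Suburb is the first element after StreetOrPOBox -->
  <Address>
    <StreetOrPOBox>PO Box 12</StreetOrPOBox>
    <Suburb>Newtown</Suburb>
  </Address>

  <!-- Fails: the first following sibling is Postcode, not Suburb -->
  <Address>
    <StreetOrPOBox>PO Box 12</StreetOrPOBox>
    <Postcode>2042</Postcode>
    <Suburb>Newtown</Suburb>
  </Address>

  <!-- Fails: StreetOrPOBox has no following sibling at all,
       so following-sibling::*[1] selects nothing -->
  <Address>
    <StreetOrPOBox>PO Box 12</StreetOrPOBox>
  </Address>
</Addresses>
```

Note that the rule only fires where a StreetOrPOBox is actually present inside an Address; it says nothing about documents that omit StreetOrPOBox altogether.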

And here is the beta XSLT code to generate it from our (expanded and munged) XML schema:

<xsl:template name="generate-immediate-following-elements-checking-rule">

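        <!-- Select each element particle that cannot repeat and whose next sibling
             is a required element particle. The following-sibling step is an
             assumption inferred from the nesting of the predicates. -->
        <xsl:for-each select="//xs:element
                        [not(@maxOccurs='unbounded') and not(@maxOccurs > 1) and not(@maxOccurs=0)]
                        [@minOccurs='unbounded' or not(@minOccurs=0)]
                        [following-sibling::*[1][self::xs:element
                                        [@maxOccurs='unbounded' or not(@maxOccurs=0)]
                                        [@minOccurs='unbounded' or not(@minOccurs=0)]]]">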
                <!--  Store the name of the parent element -->
                <xsl:variable name="parent-element-name" select="ancestor::xs:element[1]/@name"/>
                <xsl:variable name="parent-element" select="ancestor::xs:element[1]"/>
                <!--  Store the context path -->
                <xsl:variable name="path-to-parent">
                        <xsl:for-each select="ancestor::xs:element"><xsl:value-of select="@name"/>/</xsl:for-each>
                </xsl:variable>

                <sch:rule context="{concat($path-to-parent, (@name | @ref))}">
                        <sch:assert diagnostics="unexpected-immediate-follower">
                                <xsl:attribute name="test">following-sibling::*[1][self::<xsl:value-of
                                select="concat(following-sibling::*[1]/@name, following-sibling::*[1]/@ref)"/>]</xsl:attribute>
                                When in a "<xsl:value-of select="$parent-element-name"/>" element,
                                the element "<xsl:value-of select="concat(@ref, @name)"/>" should be
                                immediately followed by the element "<xsl:value-of
                                select="concat(following-sibling::*[1]/@name, following-sibling::*[1]/@ref)"/>".
                        </sch:assert>
                </sch:rule>
        </xsl:for-each>
</xsl:template>



One thing to note is the variable path-to-parent: we will see this used again later. It allows us to have local declarations as deep as we need. Another thing to note is that whenever we test the XML Schemas occurrence attributes we first have to do a string test for “unbounded” (or a test using number()), because maxOccurs has a union data type allowing numbers and “unbounded”. (Strictly, only maxOccurs can be “unbounded”; minOccurs is always a non-negative integer, so the string test on it is harmless but redundant.)
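To illustrate how path-to-parent copes with nested local declarations, consider the sketch below. The outer Customer element is a hypothetical wrapper added for this example, and the schema is assumed to already be in the expanded form the generator works on.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Address">
          <xs:complexType>
            <xs:sequence>
              <!-- For this particle, ancestor::xs:element selects Customer and
                   Address in document order, so path-to-parent is
                   "Customer/Address/" and the rule context becomes
                   "Customer/Address/StreetOrPOBox" -->
              <xs:element name="StreetOrPOBox" type="xs:string"/>
              <xs:element name="Suburb" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The message text, by contrast, uses only the nearest ancestor (parent-element-name), so it would still read: When in a "Address" element.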

Looking at this code I see an immediate potential flaw. In XPath 1.0 you would only need to check the maxOccurs and minOccurs attributes for numeric values: the tests would gracefully fail if “unbounded” was used in the original schema. However, XPath 2.0 will raise an error when it tries to treat “unbounded” as a number, so we put the test for the string first (the attribute value will be first tested as a string, then as a number). This relies on short-circuiting: the success of the first test means the second test is not evaluated. But, oh dear, short-circuiting is not guaranteed in XPath 2.0 (it is XPath 1.0 behaviour). So I will have to make these tests into little if ... then ... else expressions. This is one place XSLT 2.0 really gets it wrong: it should add the short-circuiting constraint, because it makes life sooo much easier for programmers. I am enjoying exploring XSLT 2, but this thing is just dumb and un-idiomatic. If it ain’t broke don’t fix it, and so on. (Having said all that, SAXON acts the way I want here, and short-circuits or at least does not freak out. Keen readers: please let me know if my understanding is wrong here!)
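As a rough sketch of what such a conditional test could look like, here is a minimal standalone XSLT 2.0 stylesheet that rewrites just the "cannot repeat" predicate. It is not the generator itself: the text output and the choice to list particle names are assumptions made to keep the example small.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Lists, one per line, the element particles that cannot repeat -->
  <xsl:template match="/">
    <xsl:for-each select="//xs:element
        [if (@maxOccurs = 'unbounded')
         then false()
         else not(@maxOccurs > 1) and not(@maxOccurs = 0)]">
      <!-- A particle carries either @name or @ref; take whichever is present -->
      <xsl:value-of select="(@name, @ref)[1]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The numeric comparisons now sit in the else branch, which is only evaluated when the value is not “unbounded”, so the result no longer depends on the order in which the processor evaluates the operands of and. When maxOccurs is absent, both comparisons are false and the particle is kept, which matches the schema default of 1.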

This simple test actually handles a lot of the required constraints in content models, and obviously it can be improved on: for example, when the first element can repeat, the assertion needs to be broadened to allow it to follow itself. Or what if the second particle is another sequence, or a choice? Or what if the second particle is optional? And what if the same particle appears several times in the content model? (See my initial article on this, Converting Content Models to Schematron, for some ideas.)
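For the first of those improvements, a broadened rule might look something like the following sketch. It assumes, hypothetically, that StreetOrPOBox were declared with maxOccurs="unbounded"; the pattern id and message wording are made up for this example.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="Required_Immediate_Followers_Repeatable">
    <sch:title>Required Immediate Followers (repeatable first element)</sch:title>

    <sch:rule context="Address/StreetOrPOBox">
      <!-- The next element may be another StreetOrPOBox or the required Suburb -->
      <sch:assert test="following-sibling::*[1][self::StreetOrPOBox or self::Suburb]">
        When in a "Address" element, the element "StreetOrPOBox" should be immediately
        followed by another "StreetOrPOBox" or by the element "Suburb".
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The last StreetOrPOBox in a run still has to be followed by Suburb, so the requirement is preserved; what this sketch does not check is any upper bound on how many times the first element repeats.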

However, it does not generate false negatives, which is what we want as we create our finer sieve.