This article first appeared in a blog on O'Reilly on January 30, 2008.
What we want is a Schematron pattern that checks one very specific thing: the case where the use of one element in a document requires that another element immediately follows it.
Actually, I am skipping over a stage here, because this code is quite small, fun and instructive, which is perhaps another way of saying that the code we are skipping over (for now) is quite complex. That stage has assertions to test partial order, like Topologi’s feasible validation mode and that of Jing, James Clark’s RELAX NG validator: it passes any element which could go after the current element (in its parent), not just the element that can immediately follow it. Having the test for partial order is useful for progressive validation (for example, feasible validation, where we have a document that we know is incomplete but just want to know whether it is OK as far as it goes), but more importantly it lets us divide and conquer our task.
Back to our simple case… In XML Schemas, this corresponds to an xs:sequence element that contains two consecutive xs:element particles, with occurrence constraints set so that the first cannot repeat and the second is required.
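As a concrete illustration, a schema fragment of that shape might look like the following sketch. The element names are borrowed from the Schematron example in this article; the xs:string types and the surrounding declarations are assumptions made only to keep the fragment complete.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <!-- First particle: maxOccurs defaults to 1, so it cannot repeat -->
        <xs:element name="StreetOrPOBox" type="xs:string"/>
        <!-- Second particle: minOccurs defaults to 1, so it is required -->
        <xs:element name="Suburb" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because both occurrence attributes default to 1, neither needs to be written out for the pair to qualify.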
First here is the kind of code we will have in our Schematron schema:
<sch:pattern id="Required_Immediate_Followers">
  <sch:title>Required Immediate Followers (Simple)</sch:title>
  <sch:rule context="Address/StreetOrPOBox">
    <sch:assert test="following-sibling::*[1][self::Suburb]">
      When in a "Address" element, the element "StreetOrPOBox" should be immediately followed by
      the element "Suburb". </sch:assert>
  </sch:rule>
  ...
</sch:pattern>
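To see what the assertion accepts and rejects, consider two hypothetical instance fragments. The text values, and the Postcode element in the second fragment, are invented purely for illustration.

```xml
<!-- Passes: Suburb is the first following sibling of StreetOrPOBox -->
<Address>
  <StreetOrPOBox>PO Box 123</StreetOrPOBox>
  <Suburb>Newtown</Suburb>
</Address>

<!-- Fails: the first following sibling of StreetOrPOBox is Postcode, not Suburb -->
<Address>
  <StreetOrPOBox>PO Box 123</StreetOrPOBox>
  <Postcode>2042</Postcode>
  <Suburb>Newtown</Suburb>
</Address>
```

A StreetOrPOBox with no following sibling at all would also fail, because following-sibling::*[1] then selects nothing.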
And here is the beta XSLT code to generate that pattern from our (expanded and munged) XML schema:
<xsl:template name="generate-immediate-following-elements-checking-rule">
  <!-- Select every required, non-repeating xs:element in an xs:sequence
       whose immediately following sibling is a required xs:element -->
  <xsl:for-each select="//xs:element
      [not(@maxOccurs='unbounded') and not(@maxOccurs > 1) and not(@maxOccurs=0)]
      [@minOccurs='unbounded' or not(@minOccurs=0)]
      [parent::xs:sequence]
      [following-sibling::*[1]
        [self::xs:element
          [@maxOccurs='unbounded' or not(@maxOccurs=0)]
          [@minOccurs='unbounded' or not(@minOccurs=0)]]]">
    <!-- Store the name of the parent element -->
    <xsl:variable name="parent-element-name" select="ancestor::xs:element[1]/@name"/>
    <xsl:variable name="parent-element" select="ancestor::xs:element[1]"/>
    <!-- Store the context path -->
    <xsl:variable name="path-to-parent">
      <xsl:for-each select="ancestor::xs:element"><xsl:value-of select="@name"/>/</xsl:for-each>
    </xsl:variable>
    <sch:rule context="{concat($path-to-parent, (@name | @ref))}">
      <sch:assert diagnostics="unexpected-immediate-follower">
        <xsl:attribute name="test">following-sibling::*[1][self::<xsl:value-of
          select="concat(following-sibling::*[1]/@name, following-sibling::*[1]/@ref)"/>]
        </xsl:attribute>
        When in a "<xsl:value-of select="$parent-element-name"/>" element,
        the element "<xsl:value-of select="concat(@ref, @name)"/>" should be
        immediately followed by the element "<xsl:value-of
          select="concat(following-sibling::*[1]/@name, following-sibling::*[1]/@ref)"/>".
      </sch:assert>
    </sch:rule>
  </xsl:for-each>
</xsl:template>
One thing to note is the variable path-to-parent: we will see this used again later. It allows us to have local declarations as deep as we need. Another thing to note is that whenever we test the XML Schemas attributes maxOccurs and minOccurs we first do a string test for “unbounded” (or a test using number()), because maxOccurs has a union data type allowing numbers and “unbounded”. (Strictly, minOccurs is always a non-negative integer, so the string test on it is redundant but harmless.)
Looking at this code I see an immediate potential flaw. In XPath 1.0 you would only need to check the maxOccurs and minOccurs attributes for numeric values: the tests would gracefully fail if “unbounded” was used in the original schema. However, XPath 2.0 raises an error when it tries to convert “unbounded” to a number, so we put the string test first (the attribute value is tested first as a string, then as a number). This relies on short-circuiting: once the first test decides the result, the second test is not evaluated. But, oh dear, short-circuiting is not guaranteed in XPath 2.0 (it is XPath 1.0 behaviour). So I will have to make these tests into little if ... then ... else expressions. This is one place XSLT 2.0 really gets it wrong: it should add the short-circuiting constraint, because it makes life sooo much easier for programmers. I am enjoying exploring XSLT 2, but this thing is just dumb and un-idiomatic. If it ain’t broke don’t fix it, and so on. (Having said all that, SAXON acts the way I want here, and short-circuits or at least does not freak out. Keen readers: please let me know if my understanding is wrong here!)
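A minimal sketch of what such guarded tests might look like is given here. It is one possible rewrite, not the code the converter ends up using: the stylesheet wrapper, the match="/" template and the text output are assumptions added so the fragment can be run on its own against a schema with an XSLT 2.0 processor. The first predicate relies on the XPath 2.0 rule that a conditional expression evaluates only the branch it selects; the second uses number(), which returns NaN rather than raising an error for a value it cannot convert.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>
  <!-- Illustrative only: lists the names of required, non-repeating particles -->
  <xsl:template match="/">
    <xsl:for-each select="//xs:element
        [if (@maxOccurs = 'unbounded')
         then false()
         else not(@maxOccurs > 1) and not(@maxOccurs = 0)]
        [not(number(@minOccurs) = 0)]">
      <!-- A particle carries either @name or @ref -->
      <xsl:value-of select="(@name, @ref)[1]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Since any comparison with NaN is false, the number() form gives the same answer whichever order the processor evaluates things in, and a missing attribute is treated like the default of 1.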
This simple test actually handles a lot of the required constraints in content models, and obviously it can be improved on: for example, when the first element can repeat, the assertion needs to be broadened to allow it to follow itself. Or what if the second particle is another sequence, or a choice? Or what if the second particle is optional? And what if the same particle appears several times in the content model? (See my initial article on this, Converting Content Models to Schematron, for some ideas.)
However, it does not generate false negatives, which is what we want as we create our finer sieve.
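As a sketch of the first improvement mentioned above: if StreetOrPOBox were allowed to repeat, the generated assertion might be broadened along these lines. This is an assumption about what a later version of the generator could emit, not output from the XSLT shown in this article; the pattern id is hypothetical, and the namespace shown is the ISO Schematron one.

```xml
<sch:pattern id="Required_Immediate_Followers_Repeating"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="Address/StreetOrPOBox">
    <!-- A repeatable element may be followed by another copy of itself
         or by the required next element -->
    <sch:assert test="following-sibling::*[1][self::StreetOrPOBox or self::Suburb]">
      When in a "Address" element, the element "StreetOrPOBox" should be immediately followed by
      another "StreetOrPOBox" or by the element "Suburb". </sch:assert>
  </sch:rule>
</sch:pattern>
```

The last StreetOrPOBox in a run is still required to be followed by Suburb, so the original constraint is preserved.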