Converting XML Schemas to Schematron: (#5) Validating your own derived simple types

This article appeared in the O'Reilly blog on October 30, 2007 

XSD allows you to derive your own simple datatypes by restricting the lexical space or the value space of the type. The rule about derivation by restriction is that everything that is valid against the derived type is also valid against the base type.

And this gives us our method. Remember from the previous post in this series that we implement a built-in datatype like this:

<sch:rule context="imametadataman">
     <sch:extends rule="xsd-byte-datatype"/>
   </sch:rule>

If we want to say that imametadataman should have a facet of minExclusive of 32, we just implement the facet restriction by adding an assertion:

<sch:rule context="imametadataman">
     <sch:extends rule="xsd-byte-datatype"/>
     <sch:assert test=". > 32">The value for <sch:name/> should be greater than 32</sch:assert>
   </sch:rule>

Type derivation by restriction can be directly implemented by Schematron abstract rules. There is a mismatch in terminology: we restrict the type (in the XSD) by extending the constraints (in Schematron).
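
To make the mapping concrete, here is a minimal sketch of a complete schema that holds both the abstract rule and the rule that extends it. The body of the xsd-byte-datatype rule is my assumption for illustration (the real one comes from the previous post in the series), and it uses the XPath 2 "castable as" operator, so the schema declares queryBinding="xslt2" and the xs namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>
  <sch:pattern>
    <!-- Assumed stand-in for the built-in xs:byte datatype rule -->
    <sch:rule abstract="true" id="xsd-byte-datatype">
      <sch:assert test=". castable as xs:integer and number(.) ge -128 and number(.) le 127">The value for <sch:name/> should be a byte (-128 to 127)</sch:assert>
    </sch:rule>
    <!-- Restriction in XSD terms: extend the base rule, then add the facet -->
    <sch:rule context="imametadataman">
      <sch:extends rule="xsd-byte-datatype"/>
      <sch:assert test="number(.) gt 32">The value for <sch:name/> should be greater than 32</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The facet test uses number(.) rather than a bare comparison so that a non-numeric value simply fails the assertion instead of raising a cast error.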

Here is some code to give the flavour of how easy it is to handle each type. (The assertion text needs work, and there is lots of scope for beautification, but you should get the idea.)

<xsl:when test="xs:simpleType/xs:restriction[@base]">
                        <xsl:variable name="baseon" select="xs:simpleType/xs:restriction/@base"/>
                        <sch:rule>
                                <xsl:choose>
                                        <xsl:when test="self::xs:attribute[parent::xs:schema]">
                                                <xsl:attribute name="abstract">true</xsl:attribute>
                                                <xsl:attribute name="id">
                                                        <xsl:choose>
                                                                <!-- attribute has no namespace -->
                                                                <xsl:when test="ancestor::namespace/@uri=''">
                                                                        <xsl:value-of select="concat('global_', @name)"/>
                                                                </xsl:when>
                                                                <!-- attribute has namespace (normal case) -->
                                                                <xsl:otherwise>
                                                                        <xsl:value-of select="concat('global_', ancestor::namespace/@prefix, '_', @name)"/>
                                                                </xsl:otherwise>
                                                        </xsl:choose>
                                                </xsl:attribute>
                                        </xsl:when>
                                        <xsl:otherwise>
                                                <xsl:choose>
                                                        <xsl:when test="self::xs:element">
                                                                <xsl:call-template name="generate-element-context"/>
                                                        </xsl:when>
                                                        <xsl:otherwise>
                                                                <xsl:call-template name="generate-attribute-context"/>
                                                        </xsl:otherwise>
                                                </xsl:choose>
                                        </xsl:otherwise>
                                </xsl:choose>
                                <!-- get base value -->
                                <!--  FIX THIS: should use namespace URI not prefix! -->
                                <xsl:choose>
                                        <xsl:when test="starts-with($baseon,'xs:') or
                                                                        starts-with($baseon,'xsd:') or
                                                                        starts-with($baseon,'xsi:')">
                                                <sch:extends rule="{concat(ancestor::namespace/@prefix, '-xsd-datatype-', substring-after($baseon, ':'))}"/>
                                        </xsl:when>
                                        <xsl:when test="contains($baseon,':')">
                                                <xsl:variable name="prefix"
                                                        select="substring-before($baseon, ':')"/>
                                                <xsl:variable name="typename"
                                                        select="substring-after($baseon, ':')"/>
                                                <sch:extends rule="{concat($prefix, '_', $typename)}"/>
                                        </xsl:when>
                                        <xsl:otherwise>
                                                <sch:extends rule="{concat(ancestor::namespace/@prefix, '_', $baseon)}"/>
                                        </xsl:otherwise>
                                </xsl:choose>
                                <!-- check the underneath of restriction -->
                                <xsl:if test="xs:simpleType/xs:restriction/xs:enumeration">
                                        <sch:assert>
                                                <xsl:attribute name="test">
                                                        <xsl:for-each select="xs:simpleType/xs:restriction/xs:enumeration">
                                                                <xsl:text>(. = "</xsl:text>
                                                                <xsl:value-of select="normalize-space(@value)"/>
                                                                <xsl:text>")</xsl:text>
                                                                <xsl:if test="following-sibling::xs:enumeration">
                                                                        <xsl:text> or </xsl:text>
                                                                </xsl:if>
                                                        </xsl:for-each>
                                                </xsl:attribute> The value of <sch:name/> should be one of
                                                <xsl:for-each select="xs:simpleType/xs:restriction/xs:enumeration">
                                                        <xsl:value-of select="@value"/>
                                                        <xsl:if test="following-sibling::xs:enumeration">
                                                                <xsl:text>, </xsl:text>
                                                        </xsl:if>
                                                </xsl:for-each>. (It is of type "
                                                <xsl:value-of select="normalize-space(@name)"/>".)
                                        </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:minLength">
                                        <!-- The braces are an attribute value template: the facet value is copied into the generated test -->
                                        <sch:assert test="string-length(.) &gt;= {xs:simpleType/xs:restriction/xs:minLength/@value}"> A
                                                simpleType(
                                                <xsl:value-of select="@name"/>)'s value must be at least
                                                <xsl:value-of select="xs:simpleType/xs:restriction/xs:minLength/@value"/> characters long </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:maxLength">
                                        <sch:assert test="string-length(.) &lt;= {xs:simpleType/xs:restriction/xs:maxLength/@value}"> A
                                                simpleType(
                                                <xsl:value-of select="@name"/>)'s value must be at most
                                                <xsl:value-of select="xs:simpleType/xs:restriction/xs:maxLength/@value"/> characters long </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:length">
                                        <sch:assert test="string-length(.) = {xs:simpleType/xs:restriction/xs:length/@value}"> The length of
                                                this simpleType(
                                                <xsl:value-of select="@name"/>)'s value must be
                                                <xsl:value-of select="xs:simpleType/xs:restriction/xs:length/@value"/> </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:whiteSpace">
                                        <sch:assert test="true()"> WhiteSpace would be treated as 'preserve',
                                                'replace' or 'collapse' </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:totalDigits">
                                        <xsl:comment>XSD does not count the dot, the sign, or leading and trailing zeros. This simplified test strips only the non-digit characters.</xsl:comment>
                                        <sch:assert test="string-length(replace(string(.),'[^0-9]','')) &lt;= {xs:simpleType/xs:restriction/xs:totalDigits/@value}"> The number of digits for <sch:name/>
                                                should be no more than <xsl:value-of select="xs:simpleType/xs:restriction/xs:totalDigits/@value"/> </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:minExclusive">
                                        <sch:assert test=". &gt; {xs:simpleType/xs:restriction/xs:minExclusive/@value}"> The value for <sch:name/> should be
                                                bigger than <xsl:value-of select="xs:simpleType/xs:restriction/xs:minExclusive/@value"/> </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:minInclusive">
                                        <sch:assert test=". &gt;= {xs:simpleType/xs:restriction/xs:minInclusive/@value}"> The value for <sch:name/> should be
                                                bigger than or equal to <xsl:value-of select="xs:simpleType/xs:restriction/xs:minInclusive/@value"/> </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:maxExclusive">
                                        <sch:assert test=". &lt; {xs:simpleType/xs:restriction/xs:maxExclusive/@value}"> The value for <sch:name/> should be
                                                smaller than <xsl:value-of select="xs:simpleType/xs:restriction/xs:maxExclusive/@value"/> </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:maxInclusive">
                                        <sch:assert test=". &lt;= {xs:simpleType/xs:restriction/xs:maxInclusive/@value}"> The value for <sch:name/> should be
                                                smaller than or equal to <xsl:value-of select="xs:simpleType/xs:restriction/xs:maxInclusive/@value"/> </sch:assert>
                                </xsl:if>
                                <xsl:if test="xs:simpleType/xs:restriction/xs:pattern">
                                        <xsl:comment>This assertion checks xs:pattern. There may be more than one xs:pattern; the value is valid when any one of them matches.</xsl:comment>
                                        <xsl:variable name="testString">
                                                <xsl:for-each select="xs:simpleType/xs:restriction/xs:pattern">
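                                                        <xsl:variable name="apost" select='"&apos;"'/>
                                                        <!-- XSD patterns are implicitly anchored but matches() is not, so wrap the pattern in ^( )$ -->
                                                        <xsl:value-of select="concat('matches(.,', $apost, '^(', @value, ')$', $apost, ')')"/>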
                                                        <xsl:if test="position() != last()"> or </xsl:if>
                                                </xsl:for-each>
                                        </xsl:variable>
                                        <sch:assert>
                                                <xsl:attribute name="test">
                                                        <xsl:value-of select="$testString"/>
                                                </xsl:attribute> The value for <sch:name/> should match
                                                <xsl:choose>
                                                        <xsl:when test="count(xs:simpleType/xs:restriction/xs:pattern) = 1">
                                                                the pattern:
                                                        </xsl:when>
                                                        <xsl:otherwise>
                                                                one of patterns:
                                                        </xsl:otherwise>
                                                </xsl:choose>
                                                <xsl:for-each select="xs:simpleType/xs:restriction/xs:pattern">
                                                        <!-- HACK: This is strange to make span into a list value, but better than nothing -->
                                                        <sch:span class="li"><xsl:value-of select="@value"/></sch:span></xsl:for-each>
                                        </sch:assert>
                                </xsl:if>
                        </sch:rule>
                </xsl:when>
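
As a rough illustration of what this branch is aiming for, here is a small XSD declaration and the kind of rule we would want to come out the other end. This is a sketch under assumptions: the prefix "ex" stands in for whatever ancestor::namespace/@prefix holds in the preprocessed schema, and the context value depends on the generate-element-context template, which is not shown here.

```xml
<!-- Input: an element with an anonymous simple type restricting xs:byte -->
<xs:element name="imametadataman" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType>
    <xs:restriction base="xs:byte">
      <xs:minExclusive value="32"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- Intended output (context and rule id are assumed, see above) -->
<sch:rule context="ex:imametadataman" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:extends rule="ex-xsd-datatype-byte"/>
  <sch:assert test=". &gt; 32"> The value for <sch:name/> should be
    bigger than 32 </sch:assert>
</sch:rule>
```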

We are not implementing simple type derivation by union or list at the moment, because it is outside our primary requirements. I expect derivation by list would benefit from XPath2’s extra power. Derivation by union needs more thought.
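
To show why I expect XPath2 to help with lists, here is a minimal sketch of how a list of bytes might be checked. It is not part of the current scripts. The element name bytelist is hypothetical, and the rule assumes a schema with queryBinding="xslt2" and the xs prefix declared with sch:ns.

```xml
<sch:rule context="bytelist" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Split on whitespace, then require every item to satisfy the item type -->
  <sch:assert test="every $item in tokenize(normalize-space(.), ' ')
                    satisfies ($item castable as xs:integer
                               and number($item) ge -128
                               and number($item) le 127)">Every item in <sch:name/> should be a byte (-128 to 127)</sch:assert>
</sch:rule>
```

The quantified expression ("every ... satisfies") is the piece that XPath 1 lacks, which is what makes list types awkward there.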

But at least this puts us in the position where I think (have I missed something? never impossible!) we can say that Schematron’s power to validate datatypes is strictly greater than XSD’s for datatypes derived by restriction: Schematron (i.e. using XPath2) can express all the XSD constraints and more.

But is Schematron more powerful to model type derivation? We want to be able to draw pretty diagrams of type derivation. Well, actually because derivation by restriction is simply implemented by abstract rules, in fact Schematron is equally capable of modeling the derivation structure. And, if we add @role attributes to the assertions with the name of the facet being restricted, actually Schematron models the facet system too: to the extent that (if you know the particular conventions used) you could re-generate versions of the original XSD datatype declarations.
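
Here is a sketch of that @role convention. The rule id ex_smallcount and the facet values are made up for illustration. The point is only that each assertion names the facet it implements, so a tool that knows the convention could read the derivation (the extends) and the facets (the roles) back out.

```xml
<sch:rule abstract="true" id="ex_smallcount" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Derivation step: the base type is xs:byte -->
  <sch:extends rule="xsd-byte-datatype"/>
  <!-- One assertion per restricted facet, labelled with the facet name -->
  <sch:assert role="minExclusive" test="number(.) gt 32">The value for <sch:name/> should be greater than 32</sch:assert>
  <sch:assert role="maxInclusive" test="number(.) le 100">The value for <sch:name/> should be at most 100</sch:assert>
</sch:rule>
```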

But is Schematron better for diagnostics? Well, here comes the rub. In fact, for the datatypes Schematron does not bring any great improvement, in itself, in the kinds of diagnostics that can be generated by an XSD system that was targeted at humans (does any exist?). It does potentially bring a lot more ease of customization (compared to compiled XSD validators, but this is a benefit of scripting), but basically it is just working with a fairly well-enumerated set of properties, in the facets. We will see that it has a lot more scope for smarter diagnostics when validating so-called complex content.

And, we are not necessarily restricted to even XPath2’s power. It is possible to use an extended version of the query language that invokes functions from the Java (or Eiffel or whatever) platform. But this goes beyond our modest scope of a fairly complete implementation of XSD in a handful of XSLT scripts!

Finally, there is a little potential wrinkle here that needs to be worked out. What if our value for imametadataman is -333? We will get an assertion failure both for the byte constraint and for the >32 constraint. There is a danger that a multiply derived datatype will generate a flood of redundant error messages. There are two answers. One is to say “we already treat the built-in derived types as single abstract rules, so there won’t really be much multiple derivation with the same facet; it’s not a big problem!” The other is that the assertion for a facet restriction should only test the actual restricted range, and not any range for the base type. So the assertion for >32 also cops out for data outside the byte range (below -128 or above 127) and leaves that assertion failure for the base type’s abstract rule to provide. (I think this second approach is nicer.)
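
A minimal sketch of the second approach, assuming XPath2 and the xs prefix declared with sch:ns: the facet assertion passes trivially whenever the value is not a byte at all, so for -333 only the base rule speaks up.

```xml
<sch:rule context="imametadataman" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:extends rule="xsd-byte-datatype"/>
  <!-- Cop out when the value is outside the base type; otherwise test the facet -->
  <sch:assert test="not(. castable as xs:integer
                        and number(.) ge -128
                        and number(.) le 127)
                    or number(.) gt 32">The value for <sch:name/> should be greater than 32</sch:assert>
</sch:rule>
```

With this test, 20 fails the facet assertion only, -333 fails the byte rule only, and 64 passes both.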