Making a Profile of a Large Schema with Schematron: Sugar-Free XSD

Posted on April 19, 2017 by Rick Jelliffe

What do you do if you have to use an excessively large schema but only need parts of it, and you don’t want to take on the continuing maintenance problem of updating a derived schema every time the base schema changes?

Here is an example of one thing you can do: make a Schematron schema to enforce a profile. You validate your document against the full schema, but also against the Schematron schema to point out if you have used any parts you didn’t want to use.

I thought I should put up an example of this for people to see. In this example, I am creating a fully-resolved subset of W3C XML Schemas I call Sugar-Free XSD: none of the labyrinth of cross references and gotchas. I wanted a stripped-down version of XSD that could (potentially) provide a better pivot format for my old XSD-to-Schematron converter (now on Github). So it is like DTDs with no entities, but with data types, keys, local elements and assertions.

So here goes.

Validator for Sugar-Free XSD

Sugar-Free XSD is a subset of W3C XML Schemas (XSD) that attempts to do for XSD what XML did for SGML: create a distribution-oriented subset that is easier to parse at the expense of features designed to make the document terser, complexified, or easier to maintain. The Sugar-Free XSD schema may provide an easier target for XSD implementers, and to some extent corresponds to an entity-free DTD with richer typing and keys. This Schematron schema checks that an XSD document is limited to Sugar-Free XSD. In particular:

• All Sugar-Free XSD schema documents are valid XSD schema documents. Sugar-Free XSD is a profile of XSD.

• All Sugar-Free XSD schemas can be automatically generated from an XSD Schema document, and this conversion may suppress unwanted XSD 1.1 features at user option.

• All documents invalid against a Sugar-Free XSD schema are also invalid against the original XSD schema. Sugar-Free XSD should never generate false negatives, but the original full XSD may reject documents for example for assertion violations.

• It should be a single file. Multiple files cause deployment problems.

• It should use a standard namespacing convention, with a single prefix. Variation is unnecessarily confusing and a fixed prefix makes processing easier.

• Everything that would cause cross-referencing should be resolved out. So no top-level type or attribute declarations. The sole exception is top-level element declarations.

• No assumption of a Post-Validation Schema Infoset. However, the Sugar-Free XSD schema may still carry around default value information to inform post-processors.
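
To make "resolved out" concrete, here is a minimal sketch of a schema in the Sugar-Free style. The element and type names (person, PersonType, and so on) are invented for illustration and do not come from any real schema. The comment shows the ordinary XSD idiom of a named top-level type; the schema itself shows the same declaration with the type inlined at its point of use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example; all names are invented.
     An ordinary XSD might declare a named top-level type and refer to it:
       <xs:complexType name="PersonType"> ... </xs:complexType>
       <xs:element name="person" type="PersonType"/>
     In the Sugar-Free form there is no cross reference to follow:
     the type is written inline inside the element declaration. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- Built-in datatypes are referenced with the fixed xs prefix. -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="born" type="xs:date" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The price of this style is repetition: a type used in ten places is written out ten times, which is the trade-off the profile accepts in exchange for easier processing.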

There are two ways Sugar-Free XSD could be represented: by creating a subset schema, or by first validating the document against standard XSD and then validating against this schema.

As a speed-up technique, an optional attribute, sfx:sameAs, carrying an identifier may be used on SimpleTypes. A Sugar-Free XSD schema validator can assume that the contents of each element with the same sfx:sameAs identifier are the same, and potentially memoize the implementation. This will not reduce file size, but may reduce compilation or interpretation time.
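
As a sketch of how the attribute might look in practice, the fragment below repeats one inlined simple type under two elements and marks both copies with the same identifier. The element names and the identifier "code5" are made up for illustration; the sfx namespace URI is the one declared in the Schematron schema in this post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example; element names and the "code5" identifier are invented. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sfx="http://www.schematron.com/ns/SugarFreeXSD"
           elementFormDefault="qualified">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="billingCode">
          <!-- First copy of the inlined type. -->
          <xs:simpleType sfx:sameAs="code5">
            <xs:restriction base="xs:string">
              <xs:length value="5"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="shippingCode">
          <!-- Second copy: same identifier, same contents, so a processor
               may reuse whatever it built for the first copy. -->
          <xs:simpleType sfx:sameAs="code5">
            <xs:restriction base="xs:string">
              <xs:length value="5"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because XSD permits attributes from other namespaces on its elements, a document like this should remain a valid XSD schema; a processor that does not know sfx:sameAs can simply ignore it.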

  1. In a Sugar-Free XSD schema document, the top-level element must have the name “xs:schema”.
  2. In a Sugar-Free XSD schema document, only “xs:element” or “xs:annotation” elements are allowed under the top “xs:schema” element.
  3. In a Sugar-Free XSD schema, the prefix ‘xs:’ must be used for declarations.
  4. Any namespace except xs can be used under an annotation.
  5. The xs:restriction element may be used for simple types, not for complex content.
  6. In a Sugar-Free XSD schema document, all elements must be in the standard XSD namespace.
  7. In a Sugar-Free XSD schema document, there should be no attributes in the “vc” version control namespace.
  8. In a Sugar-Free XSD schema document, only the built-in datatypes of XSD can be used as types and must use the “xs” prefix.
  9. In a Sugar-Free XSD schema document, only the built-in datatypes of XSD can be used as base types and must use the “xs” prefix.
  1. In a Sugar-Free XSD schema document, there should be none of the following elements in any position: xs:extension, xs:import, xs:include, xs:override, xs:redefine, xs:defaultOpenContent
  2. In a Sugar-Free XSD schema document, the following attributes should never be used: @abstract, @block, @final, @fixed, @itemType, @memberType, @nillable, @ref, @substitutionGroup except for xs:keyref/@ref and xs:element/@ref
  1. The sfx:sameAs attribute can only appear on simpleType or complexType elements.
  2. The non-annotation contents of all elements with sfx:sameAs must be equivalent.
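
A small schema that is intended to satisfy the rules listed above might look like the following sketch. The names, the pattern facet and the documentation text are invented for illustration. It exercises the annotation rule (content in a non-xs namespace), the restriction rule (xs:restriction only inside a simple type) and the built-in datatype rule.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example; element names and documentation text are invented. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <!-- xs:annotation is allowed directly under xs:schema. -->
  <xs:annotation>
    <xs:documentation>
      <!-- Elements under an annotation must not use the xs prefix. -->
      <h:p xmlns:h="http://www.w3.org/1999/xhtml">A tiny catalogue schema.</h:p>
    </xs:documentation>
  </xs:annotation>

  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="sku">
          <!-- xs:restriction appears only inside a simple type,
               and its base is a built-in datatype. -->
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="[A-Z]{3}-[0-9]{4}"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="price" type="xs:decimal"/>
      </xs:sequence>
      <!-- Default values may still be carried for post-processors. -->
      <xs:attribute name="currency" type="xs:string" default="USD"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```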

And here is the Schematron schema. Actually, the text above was just created by trivially formatting the Schematron schema. And the Schematron schema was created by first listing the assertion texts, then marking them up in XML and adding the XPaths.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2" schemaVersion="0.1"> 

<!-- Rick Jelliffe 2017 Public Domain -->

<ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>
<ns prefix="vc" uri="http://www.w3.org/2007/XMLSchema-versioning"/>
<ns prefix="sfx" uri="http://www.schematron.com/ns/SugarFreeXSD"/>

 <title>Validator for Sugar-Free XSD</title>

    <p><emph>Sugar-Free XSD</emph> is a subset of W3C XML Schemas (XSD) that attempts to do for XSD
        what XML did for SGML: create a distribution-oriented subset that is easier to parse at the
        expense of features designed to make the document terser, complexified, or easier to
        maintain. The Sugar-Free XSD schema may provide an easier target for XSD implementers, and
        to some extent corresponds to an entity-free DTD with richer typing and keys. This
        Schematron schema checks that an XSD document is limited to Sugar-Free XSD. In
        particular:</p>

    <p>• All Sugar-Free XSD schema documents are valid XSD schema documents. Sugar-Free XSD is a
        profile of XSD.</p>
    <p>• All Sugar-Free XSD schemas can be automatically generated from an XSD Schema document, and
        this conversion may suppress unwanted XSD 1.1 features at user option.</p>
    <p>• All documents invalid against a Sugar-Free XSD schema are also invalid against the original
        XSD schema. Sugar-Free XSD should never generate false negatives, but the original full XSD
        may reject documents for example for assertion violations.</p>

    <p>• It should be a single file. Multiple files cause deployment problems. </p>
    <p>• It should use a standard namespacing convention, with a single prefix. Variation is
        unnecessarily confusing and a fixed prefix makes processing easier.</p>
    <p>• Everything that would cause cross-referencing should be resolved out. So no top-level type
        or attribute declarations. The sole exception is top-level element declarations.</p>
    <p>• No assumption of a Post-Validation Schema Infoset. However, the Sugar-Free XSD schema may
        still carry around default value information to inform post-processors.</p>


    <p>There are two ways Sugar-Free XSD could be represented: by creating a subset schema, or by
        first validating the document against standard XSD and then validating against this schema. </p>

    <p>As a speed-up technique, an optional attribute, sfx:sameAs, carrying an identifier may be
        used on SimpleTypes. A Sugar-Free XSD schema validator can assume that the contents of
        each element with the same sfx:sameAs identifier are the same, and potentially memoize the
        implementation. This will not reduce file size, but may reduce compilation or
        interpretation time. </p>

<pattern id="sugar-free">
<rule context="/">
<assert test="xs:schema"> In a Sugar-Free XSD schema document, the top-level element
must have the name "xs:schema". </assert>
</rule>
<rule context="/xs:schema/*[not(self::xs:element or self::xs:annotation or self::xs:notation)]">
<assert test="false()" diagnostics="report-element"> In a Sugar-Free XSD schema document,
only "xs:element" or "xs:annotation" elements are allowed under the top "xs:schema"
element. </assert>
</rule>
<rule context="/xs:schema">
<assert test="@elementFormDefault='qualified'"> In a Sugar-Free XSD schema, the prefix
'xs:' must be used for declarations. </assert>
</rule>
<rule context="xs:appinfo//* | xs:documentation//*">
<assert test="not(starts-with( name(), 'xs:'))" diagnostics="report-element"> Any
namespace except xs can be used under an annotation.</assert>
</rule>

<rule context="xs:restriction">
<assert test="ancestor::xs:simpleType"> The xs:restriction element may be used for
simple types, not for complex content. </assert>
</rule>
<rule context="*">
<assert test="starts-with( name(), 'xs:')" diagnostics="report-element"> In a Sugar-Free
XSD schema document, all elements must be in the standard XSD namespace. </assert>

<report test="@vc:*" diagnostics="report-element"> In a Sugar-Free XSD schema document,
there should be no attributes in the "vc" version control namespace. </report>

<assert test="not(@type) or starts-with( @type, 'xs:') " diagnostics="report-element">
In a Sugar-Free XSD schema document, only the built-in datatypes of XSD can be used
as types and must use the "xs" prefix. </assert>
<assert test="not(@base) or starts-with( @base, 'xs:')" diagnostics="report-element"> In
a Sugar-Free XSD schema document, only the built-in datatypes of XSD can be used as
base types and must use the "xs" prefix. </assert>
</rule>

</pattern>

<pattern id="exclusions">
<rule context="xs:extension | xs:import | xs:include | xs:override
| xs:redefine | xs:defaultOpenContent">
<report test="true()" diagnostics="report-element"> In a Sugar-Free XSD schema document,
there should be none of the following elements in any position: xs:extension,
xs:import, xs:include, xs:override, xs:redefine, xs:defaultOpenContent </report>

</rule>
<rule context="*">
<report test="@abstract or @block or @final or @fixed or @itemType or @memberType or @nillable or
@ref[not(parent::xs:keyref)][not(parent::xs:element)] or @substitutionGroup" diagnostics="report-element">
In a Sugar-Free XSD schema document, the following attributes should never be used:
@abstract, @block, @final, @fixed, @itemType, @memberType, @nillable, @ref,
@substitutionGroup except for xs:keyref/@ref and xs:element/@ref </report>

</rule>

</pattern>

<pattern id="sameAs">

<rule context="*[@sfx:sameAs]">
<assert test="self::xs:simpleType or self::xs:complexType"> The sfx:sameAs attribute can
only appear on simpleType or complexType elements. </assert>
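<!-- Placeholder: test="true()" never fails, so this requirement is documented here but not checked. -->
<assert test="true()"> The non-annotation contents of all elements with sfx:sameAs must
be equivalent. </assert>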
</rule>

</pattern>

<diagnostics>
<diagnostic id="report-element"> Problem at element "<name/>" <value-of select="@name"/> <value-of select="@id"/>
</diagnostic>

</diagnostics>

</schema>
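
To see how the schema above might respond to input outside the profile, here is a deliberately non-conforming XSD sketch. All names, the imported namespace and the schemaLocation are invented. The comments say which checks I would expect to fire, going by the rule contexts and tests as written; treat them as a reading of the schema rather than recorded validator output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical, deliberately non-conforming schema; all names are invented. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <!-- Expected to be flagged twice: as a disallowed child of xs:schema
       (pattern "sugar-free") and as an excluded element (pattern "exclusions"). -->
  <xs:import namespace="http://example.com/other" schemaLocation="other.xsd"/>

  <!-- Expected to be flagged: only xs:element and xs:annotation are
       meant to appear directly under xs:schema. -->
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Expected to be flagged twice: @type does not start with "xs:"
       (pattern "sugar-free") and @nillable is on the forbidden list
       (pattern "exclusions"). -->
  <xs:element name="person" type="PersonType" nillable="true"/>
</xs:schema>
```

Note that within a single pattern a node is only tested by the first rule whose context matches it, which is why the catch-all rules with context="*" come last in each pattern.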