An Overview of Schematron

Power

Schematron provides unprecedented power to tie together information from anywhere inside an XML document and to make sure that the patterns you need are present. It supports:

  • Natural Language Assertions: express constraints in terms that domain experts or users will understand, not necessarily in the particular names of element markup or the cryptic messages of developers.
  • Graph Constraints: any addressable structure can be tested from any addressable location in the document.
  • Web Technology: Schematron can pull in documents or information available from web servers.
  • Progressive Validation: divide the validation into phases to support particular constraints, workflows or document variants, and then run these phases in the order you select.
  • Validation as Transformation: using standard XPath and XSLT tools, Schematron provides more useful power than any other standard schema language, while only requiring a simple codebase.
  • XML Tool Chain: because Schematron generates an XML document in the Schematron Validation Reporting Language (SVRL), it integrates readily into most existing XML tool chains or pipelines.
  • Summarizing, Grading and Feature Extraction: reporting is the flipside of validation. Validation tells you whether expected patterns in the document are missing or incomplete; reporting tells you about the patterns that are detected in the document.
  • Consumer-Driven Contracts: clear, stable expressions of service requirements from providers in response to consumers.
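
As a rough illustration of two of these capabilities, the sketch below combines a graph constraint with progressive validation. The vocabulary (order, item, customer, the custref and id attributes) and the phase and pattern names are hypothetical, chosen only for this example.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <title>Sketch: phases and a cross-reference check (hypothetical vocabulary)</title>

  <!-- Progressive validation: each phase activates a subset of the patterns -->
  <phase id="draft">
    <active pattern="structure"/>
  </phase>
  <phase id="final">
    <active pattern="structure"/>
    <active pattern="references"/>
  </phase>

  <pattern id="structure">
    <rule context="order">
      <assert test="item">An order should list at least one item.</assert>
    </rule>
  </pattern>

  <!-- Graph constraint: the test reaches from an order to customers anywhere in the document -->
  <pattern id="references">
    <rule context="order[@custref]">
      <assert test="@custref = //customer/@id">An order should refer to a customer that is defined in the same document.</assert>
    </rule>
  </pattern>
</schema>
```

Running the "draft" phase would check only the structural pattern, while the "final" phase would also check the cross-references. How a phase is selected depends on the validator you use.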

Concept

Schematron differs in basic concept from other schema languages in that it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented that are inconvenient or difficult to express in grammar-based schema languages. If you know XPath or the XSLT expression language, you can start to use Schematron immediately.

Free and open source implementations are available. Schematron is trivially simple to implement on top of XSLT and to customize. (There are also implementations in Python and Perl.)

Schematron is a feather duster to reach the corners that other schema languages cannot reach.

Operation

Schematron allows you to develop and mix two kinds of constraints:

  • Report elements allow you to diagnose which variant of a language you are dealing with.
  • Assert elements allow you to confirm that the document conforms to a particular schema.
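
A minimal sketch of the two kinds of constraint side by side. The invoice vocabulary (invoice, total, the version attribute) is hypothetical, and the fragment would sit inside a schema element.

```xml
<pattern xmlns="http://purl.oclc.org/dsdl/schematron">
  <rule context="invoice">
    <!-- report: the message is issued when the test is TRUE,
         here to flag which variant of the language was found -->
    <report test="@version = '2'">This invoice uses version 2 of the format.</report>
    <!-- assert: the message is issued when the test is FALSE,
         that is, when the document does not conform -->
    <assert test="total">An invoice should have a total.</assert>
  </rule>
</pattern>
```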

Schematron is based on a simple action:

  • First, find useful context nodes in the document (typically elements) based on XPath path criteria.
  • Then, check whether some other XPath expressions are true for each of those context nodes.
  • Finally, report which assertions have failed (and which reports have succeeded) to give users the targeted information they need.
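
The three steps above can be sketched as follows. The chapter, title and section names are hypothetical; the name and value-of elements are among the extra Schematron elements that help make messages more specific.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- Step 1: select the context nodes, here every chapter element -->
    <rule context="chapter">
      <!-- Step 2: evaluate the test relative to each chapter -->
      <!-- Step 3: the message is issued only for chapters where the test is false -->
      <assert test="title">A <name/> should have a title.</assert>
      <!-- A report message is issued for chapters where its test is true -->
      <report test="count(section) &gt; 10">This chapter has <value-of select="count(section)"/> sections; consider splitting it.</report>
    </rule>
  </pattern>
</schema>
```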

Schematron can be useful in conjunction with many grammar-based structure-validation languages: DTDs, XML Schemas, RELAX, TREX, etc. Indeed, Schematron is part of an ISO standard (DSDL: Document Schema Description Languages) designed to allow multiple, well-focussed XML validation languages to work together. You can even embed a Schematron schema inside an XML Schema <appinfo> element or inside a RELAX NG schema!
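
As a hedged sketch of the embedding idea, the XML Schema below carries a Schematron pattern inside an appinfo element. The range element and its min and max attributes are hypothetical. The grammar declares the attributes, while the embedded rule adds a check across the two of them.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <xs:element name="range">
    <xs:annotation>
      <xs:appinfo>
        <!-- Schematron pattern carried inside the grammar-based schema -->
        <sch:pattern>
          <sch:rule context="range">
            <sch:assert test="number(@min) &lt;= number(@max)">The min value should not exceed the max value.</sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:attribute name="min" type="xs:decimal" use="required"/>
      <xs:attribute name="max" type="xs:decimal" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An XML Schema processor ignores the content of appinfo, so the embedded rules take effect only when a tool extracts and runs them. How that is done depends on your tool chain.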

6 Main Elements

There are only 6 basic elements in ISO Schematron, which makes it very easy to learn, especially if you already know XPath. (There are others, but these mainly help construct nice user interfaces for validators.)

Here is the basic structure:

  • <schema xmlns="http://purl.oclc.org/dsdl/schematron"> contains
    • optional <title> then
    • zero or more <ns prefix="PPP" uri="UUU" /> giving the namespaces and prefixes used for the XPaths, then
    • one or more <pattern>, which each contain
      • one or more <rule context="CCC"> where the context attribute is an XSLT pattern that selects the nodes to check, and which contain a mix of
        • <assert test="TTT"> where the test attribute is an XPath expression evaluated as true or false, and which contains rich text expressing the statement being asserted in plain language, and
        • <report test="TTT"> where the test attribute is an XPath expression evaluated as true or false, and which contains rich text expressing the fact to be reported in plain language.

Example

So here is a very small example. It is a mini-schema for Schematron.

<schema xmlns="http://purl.oclc.org/dsdl/schematron">
 <title>A Schematron Mini-Schema for Schematron</title>
 <!-- Bind the sch prefix used in the XPaths below -->
 <ns prefix="sch" uri="http://purl.oclc.org/dsdl/schematron"/>
 <pattern>
   <!-- The context: every sch:schema element -->
   <rule context="sch:schema">
     <assert test="sch:pattern"
     >A schema contains patterns.</assert>
     <assert test="sch:pattern/sch:rule[@context]"
     >A pattern is composed of rules.
     These rules should have context attributes.</assert>
     <assert test="sch:pattern/sch:rule/sch:assert[@test]
                   or sch:pattern/sch:rule/sch:report[@test]"
     >A rule is composed of assert and report statements.
     These statements should have a test attribute.</assert>
   </rule>
 </pattern>
</schema>

In that mini-schema, the rule element sets the context: the rule applies to any sch:schema element in a document. The three assertions say that the sch:schema must have at least one child element sch:pattern; at least one child element sch:pattern with a child sch:rule that has a context attribute; and at least one child element sch:pattern with a child sch:rule that contains a sch:assert or sch:report with a test attribute.
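
To see the assertions at work, consider this deliberately incomplete document, made up for illustration. It satisfies the first assertion, because a pattern is present. It fails the second, because no rule has a context attribute, and the third, because no assert or report has a test attribute.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- No context attribute here, so the second assertion fails -->
    <rule>
      <!-- No test attribute here, so the third assertion fails -->
      <assert>Some statement.</assert>
    </rule>
  </pattern>
</schema>
```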