Wrap Up

So what were the lessons learned during this project?

  • Schematron could test all the constraints we attempted to solve.  There was no lack of power.
  • However, it was extremely difficult to capture the semantics of XSD in order to transform it to Schematron.  XSLT 2 was not really up to the job: many times decorating a parse tree was the most obvious way to do something, which XSLT 2's functional model (no side-effects) did not support.  XSLT 3 may have something better with maps etc. 
  • This dual difficulty of interpreting the XSD specification or requirements, then implementing, was extremely unpleasant for the long-suffering developer who worked with me on this project.
  • Some things that I expected to be difficult, in particular validating content models, turned out to be trivially simple.  The trick was to turn the content model into a regular expression, then generate a string of the child element names (the ./*/name() values) for the element in question, and match this string against the regular expression.  However, it was not possible to get good specific diagnostics this way.  So a few alternative methods were used: an implementation could pick the most specific diagnostics from the SVRL, I suppose.
  • Indeed, this was a general problem.  XSD, like all the grammar schema languages, is very keen on letting you put constraints in, and to a lesser extent on documenting what an element does, but utterly uninterested in capturing information about why some constraint exists: why you cannot have element A after element B, and so on.  Without this, even the good, clear, highly specific diagnostic messages Schematron could generate were impoverished compared to usual Schematron messages.
  • The validation was very slow.  Fatally slow?  The code would need to be checked for optimizations: for example, more use of xsl:key and the key() function.  XSDs can be very large, and the Schematron schemas can also be large.  Consequently, I suspected that a real implementation would, with the Schematron techniques in mind, be better off converting directly from XSD to XSLT.
  • Some of the features we did not need to implement would have been no problem for Schematron: the keyref mechanism uses XPaths.  And the XSD 1.1 provision of an assertion mechanism with XPaths would be relatively trivial too.
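
The content model trick mentioned above can be sketched in a few lines.  This is only an illustration in Python, not the XSLT 2 code from the project: the content model (title, para+, (note | warning)?), the element names and the function names are all made up for the example, and the regular expression is written by hand rather than generated from an XSD.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model: (title, para+, (note | warning)?)
# Each element name is followed by a space in the regex, so that
# adjacent names cannot run together into a different name.
CONTENT_MODEL = re.compile(r"(title )(para )+(note |warning )?")


def child_name_string(element):
    """Join the names of the element's children, each followed by a space."""
    return "".join(child.tag + " " for child in element)


def content_is_valid(element):
    """True if the whole sequence of child names matches the content model."""
    return CONTENT_MODEL.fullmatch(child_name_string(element)) is not None


good = ET.fromstring("<section><title/><para/><para/><note/></section>")
bad = ET.fromstring("<section><para/><title/></section>")

print(content_is_valid(good))  # True
print(content_is_valid(bad))   # False
```

Note that the match only answers yes or no: it does not say which child was out of place, which is the diagnostics weakness described in the list above.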

All in all, I left the project with several new techniques, and a better sense (with code that demonstrated it) that few of the constraints that XSD validates are out of Schematron's reach.