So what were the lessons learned during this project?
- Schematron could express every constraint we attempted. There was no lack of power.
- However, it was extremely difficult to capture the semantics of XSD in order to transform it to Schematron. XSLT 2 was not really up to the job: many times decorating a parse tree was the most obvious way to do something, which XSLT 2's functional model (no side-effects) did not support. XSLT 3 may have something better with maps etc.
- This dual difficulty of interpreting the XSD specification or requirements, then implementing, was extremely unpleasant for the long-suffering developer who worked with me on this project.
- Some things that I expected to be difficult, in particular validating content models, turned out to be trivially simple. The trick was to turn the content model into a regular expression, then generate a list of the ./*/name() in question, and match this string against the regular expression. However, it was not possible to get good specific diagnostics this way. So a few alternative methods were used: an implementation could pick the most specific diagnostics from the SVRL, I suppose.
- Indeed, this was a general problem. XSD, like all the grammar schema languages, is very keen on letting you put constraints in, and to a lesser extent on documenting what an element does, but utterly uninterested in capturing information about why some constraint exists: why you cannot have element A after element B, and so on. Without this, even the good, clear, highly specific diagnostic messages Schematron could generate were impoverished compared to usual Schematron messages.
- The validation was very slow. Fatally slow? The code would need to be checked for optimizations: for example, more use of xsl:key. XSDs can be very large, and the Schematron schemas can also be large. Consequently, I suspected that a real implementation would, with the Schematron techniques in mind, be better off converting directly from XSD to XSLT.
- Some of the features we did not need to implement would have been no problem for Schematron: the keyref mechanism uses XPaths. And the XSD 1.1 provision of an assertion mechanism with XPaths would be relatively trivial too.
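The content-model trick from the list above is easy to sketch outside XSLT. What follows is only an illustration in Python, not the code from the project: the tuple notation for content models, the function names and the SECTION_MODEL example are all made up for the purpose. It shows the general idea: build a regular expression from the content model, build a string from the child element names, and match one against the other.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-notation for a content model (an assumption, not XSD syntax):
#   ("elem", name)              one element with that name
#   ("seq", [parts])            the parts in order
#   ("choice", [parts])         exactly one of the parts
#   ("occurs", part, lo, hi)    part repeated lo..hi times; hi=None is unbounded


def to_regex(model):
    """Turn a content model into a regular expression over child names."""
    kind = model[0]
    if kind == "elem":
        # Each name is followed by one space so adjacent names cannot run together.
        return "(?:" + re.escape(model[1]) + " )"
    if kind == "seq":
        return "(?:" + "".join(to_regex(p) for p in model[1]) + ")"
    if kind == "choice":
        return "(?:" + "|".join(to_regex(p) for p in model[1]) + ")"
    if kind == "occurs":
        _, part, lo, hi = model
        upper = "" if hi is None else str(hi)
        # The outer group keeps nested occurrence ranges from colliding.
        return "(?:" + to_regex(part) + "{" + str(lo) + "," + upper + "})"
    raise ValueError("unknown content model node: " + repr(kind))


def child_names(element):
    """The rough equivalent of joining ./*/name(): child names, space-terminated."""
    return "".join(child.tag + " " for child in element)


def is_valid(element, model):
    """True when the whole sequence of child names matches the content model."""
    return re.fullmatch(to_regex(model), child_names(element)) is not None


# Illustrative model: a title, any number of para, then an optional note or warning.
SECTION_MODEL = ("seq", [
    ("elem", "title"),
    ("occurs", ("elem", "para"), 0, None),
    ("occurs", ("choice", [("elem", "note"), ("elem", "warning")]), 0, 1),
])

good = ET.fromstring("<section><title/><para/><para/><note/></section>")
bad = ET.fromstring("<section><para/><title/></section>")
print(is_valid(good, SECTION_MODEL), is_valid(bad, SECTION_MODEL))
```

Note that is_valid can only answer yes or no. It cannot say which child was unexpected or what was expected in its place, which is the diagnostics limitation mentioned in the list.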
All in all, I left the project with several new techniques, and a better sense (with code that demonstrated it) that fewer of the constraints that XSD validates are out of Schematron's reach.