Book Review of Schematron: A Language for Validating XML by Erik Siegel

A lot has happened since Eric van der Vlist’s 2007 book Schematron (O'Reilly): the ISO standard has been through two upgrades as XSLT 2 and XSLT 3 were released, new elements for properties were added, the use of SVRL took off, XProc integration improved, the unit-testing framework XTest added Schematron support, automated document-repair techniques (notably XML QuickFix) have appeared, XML Schemas added Schematron-like XPath assertions (without the natural-language aspect), which took the pressure off Schematron embedded in XSD, and David Maus released his alternate Schematron engine.

Schematron continues to find strong adoption firstly in public sectors needing to exchange rich, high-value or high-risk text, notably health, homeland, and financial sectors, where it is a project-required choice; and secondly, as a secret weapon for successful implementations of large XML projects, where it is a developer-directed choice. Schematron’s approach of using XPaths to specify constraints has become ubiquitous, in such technologies as Cucumber, Selenium and PMD. Recently, developers have worked on applying it to JSON data exchange.

So Erik Siegel’s new (2022) book Schematron: A Language for Validating XML (XML Press) fills a vacuum, and fills it well. At 260 pages, it aims to be complete and contemporary. Erik uses the 3rd Edition of the ISO Standard for Schematron as his basis. The book is solidly aimed at developers setting up Schematron validation systems, who will use XPath 3 (i.e. Saxon) and probably the Oxygen XML IDE: these are sensible calls in 2022. If I were a developer starting off with Schematron, I would have a copy of this book open next to my PC for reference. Every element and attribute is explained with clear text, and the examples are straightforward and well-focussed.

Especially welcome is the primer on XPath. By design, Schematron forces a discipline or order onto constraints (as a hierarchy of schema, phase, pattern, rule, assertion, diagnostic and property) that fights the tendency of programmers to prioritize functionality over discussability and maintainability. Any leftover complexity in expressing constraints is the province of XPath, so a good degree of comfort in XPath is the best background knowledge for developing good Schematron schemas, even for quite complex constraints.
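
For readers who have not seen that hierarchy laid out, here is a minimal sketch of how the pieces nest. It is my own illustration, not an example from the book: the invoice and line elements, the ids and the property value are all hypothetical, and I have assumed the xslt2 query binding.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:title>Hypothetical invoice constraints</sch:title>

  <!-- Phase: a named selection of patterns to run together -->
  <sch:phase id="quick">
    <sch:active pattern="invoice-structure"/>
  </sch:phase>

  <!-- Pattern: a group of related rules -->
  <sch:pattern id="invoice-structure">
    <!-- Rule: selects the context nodes to be checked -->
    <sch:rule context="invoice">
      <!-- Assertion: an XPath test plus a natural-language statement -->
      <sch:assert test="line" diagnostics="d-lines" properties="p-severity">
        An invoice should have at least one line.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- Diagnostic: specifics of what was actually found -->
  <sch:diagnostics>
    <sch:diagnostic id="d-lines">Found invoice <sch:value-of select="@id"/> with <sch:value-of select="count(line)"/> lines.</sch:diagnostic>
  </sch:diagnostics>

  <!-- Property: extra machine-oriented information attached to the result -->
  <sch:properties>
    <sch:property id="p-severity">error</sch:property>
  </sch:properties>
</sch:schema>
```

Note how little of this is XPath: only the context, test and select attributes. Everything else is organisation.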

What you get in the box is what is on the label: this is a book about validating XML and is solidly pragmatic rather than theoretical. So you will not find any real treatment of the software engineering aspects of Schematron: where it can fit in the SDLC, using Schematron for feature extraction or for uses other than validation, best practice for phrasing assertions, or why Schematron prioritizes natural language. The book on those aspects, which are surely the most distinctive or novel parts of Schematron, remains to be written.

Erik does not neglect the developer-directed scenario of Schematron, a scenario which benefits from flexible, ad hoc and expeditious tools: for example, Erik’s treatment of <xsl:function> will be welcome to developers in a pinch. But the book will also be useful for those in the project-required scenario, where you may have several teams working on different aspects of the schemas: subject matter experts deciding the assertion texts, developers working out the respective XPaths, and integrators working out how to run the schemas and make use of the information provided from validation.  

Pedantic and fusty readers may note with surprise the bad example set by some of the assertion texts in the book, which do not conform to the ISO Standard. The Standard specifies: "The natural-language assertion shall be a positive statement of a constraint"; some assertion texts instead are merely error messages like "The X is too big". I recommend readers really take in Erik’s good discussion of assertion texts on pages 57 to 60: yes, a Schematron script that only has error messages (or no assertion text at all!) is simply not a “schema”; but no, sometimes you do not need a schema, you just want to generate errors!

Indeed, maybe it is a strength of Schematron that it can work in that quick-and-dirty scenario. Consider a project with continuous process improvement, where we put a Schematron validator in the pipeline as part of the furniture, even ahead of time, to allow constraints emerging from use and experience to be added immediately in an agile fashion by DevOps staff (who may not have good confidence in writing in the natural language of the main schema). We perhaps should consider the capture of the constraint as a win even with horrible assertion text (but schedule some review to improve the user-reporting aspect).

However, if you are working in the project-required scenario, where you may have multiple parties and the rationale for adopting Schematron was to allow clear knowledge-capture, a better separation of concerns and support of the SDLC, then rigour will make your lives easier.

Aside: I cannot say that the occasional poor Schematron schemas I have seen in the field were the result of developers being over-explanatory of the constraint, or of using Schematron features such as variables (<sch:let>) instead of complex, XQuery-like XPaths. Instead, poor schemas tend towards under-using the pattern-rule distinction, and flipping out to foreign functions (defined in XSLT or even Java) unnecessarily.
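
As a rough illustration of the kind of schema I would rather see, here is a sketch that uses <sch:let> to name intermediate values instead of packing everything into one XPath. The order and line elements and the line-count attribute are hypothetical names of my own, not taken from the book.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron" id="order-lines">
  <!-- The rule picks the context; the variables name what is being compared -->
  <sch:rule context="order">
    <sch:let name="declared" value="number(@line-count)"/>
    <sch:let name="actual" value="count(line)"/>
    <sch:assert test="$declared = $actual">
      An order should contain as many lines as its line-count attribute declares:
      found <sch:value-of select="$actual"/> lines but line-count is <sch:value-of select="$declared"/>.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

The variables cost two lines and make both the test and the message easier to discuss with someone who does not read XPath.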

Hint: Developers who are used to XP techniques such as TDD and BDD may find Schematron a congenial framework for expressing their constraints in a declarative/intentional fashion: I tend to recommend this template for assertion texts: "An X should have/be Y because Z:  found unexpected X with value ABC"  where X is the context as understood by the notional user of the information (not necessarily the developer, and perhaps having a view of the data different to the XML’s names and structure), Y is the range of allowed values (again from the user’s POV), Z is the (optional) business motivation or requirement; the second part can have dynamic data giving the specifics of what was actually found. 
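
To make the template concrete, here is a sketch of one assertion written that way. The domain is invented for illustration: the prescription and dose elements, the mg attribute, the 500 mg limit and the stated business reason are all hypothetical.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron" id="dosage">
  <sch:rule context="prescription/dose">
    <!-- X = "a prescribed dose", Y = "at most 500 mg", Z = the business reason -->
    <sch:assert test="number(@mg) &lt;= 500">
      A prescribed dose should be at most 500 mg because higher doses require specialist approval:
      found unexpected dose with value <sch:value-of select="@mg"/> mg.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

Compare that with the bare error message "The dose is too big", which tells the reader neither the limit nor the reason for it.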

Congratulations to Erik on this excellent book, which fills a real gap well!

- Rick Jelliffe 

(Note: Erik kindly sent me a late draft for comments, which he received graciously and patiently, and we had some interesting discussions. As the drafter of the first two versions of ISO Schematron, there are passages I regard as clear and adequate that fresh readers may regard as excessively minimal: so it is great that Erik is also involved in the efforts to improve and progress the ISO Standard further. He also has several videos introducing Schematron on YouTube.)