Using XPath to make Assertions is now a common technique

Posted on March 23, 2018 by Rick Jelliffe

The idea of using XPath in a schema language for structured data probably first came up with Dave Raggett’s Assertion Grammars. This was a recasting of DTDs that allowed (I don’t know if this part was ever implemented) the context element to be specified using an XPath: these were called Conditions. I think of this as Left Hand Side XPath, e.g.  XPath = CONTENT_MODEL

In Schematron, I took this up and used XPaths both for the LHS and RHS. You use XPath both to specify the context and the assertion tests.
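
To make the LHS/RHS split concrete, here is a minimal sketch in Python. It is not Schematron itself: it uses the small XPath subset in the standard library's ElementTree, and the document, the check function and the message are all invented for illustration. The context path picks the nodes (LHS), and the test path is evaluated relative to each of them (RHS).

```python
import xml.etree.ElementTree as ET

# Invented sample document: the second item has no price.
DOC = """
<order>
  <item><name>Widget</name><price>4.50</price></item>
  <item><name>Gadget</name></item>
</order>
"""

def check(root, context, test, message):
    """Return one message per context node for which the test path selects nothing."""
    failures = []
    for node in root.findall(context):   # LHS: choose the context nodes
        if node.find(test) is None:      # RHS: test relative to each context node
            failures.append(message)
    return failures

root = ET.fromstring(DOC)
failures = check(root, ".//item", "price", "An item should have a price.")
print(failures)
```

Here "selects at least one node" stands in for a true boolean test, because ElementTree cannot evaluate general XPath expressions; a full XPath engine would let the RHS be any boolean expression.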

The idea caught on.  For example, the popular Java static validator PMD allows you to make your own rules using XPath. In that case, it is just an absolute XPath that evaluates to boolean, so it has no equivalent of Schematron’s rule, pattern, and phase grouping structures. (I would call this a Right Hand Side use of XPath, since we always start from the root of the document.)
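
As a rough illustration of that RHS-only style, the sketch below treats a rule as nothing more than a name plus one path evaluated from the document root. This is not PMD's API or its rule format: the rule table, the violations function and the sample document are assumptions made up for the example, and it again uses ElementTree's XPath subset from the Python standard library.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for a parsed source file.
DOC = "<class><method name='a'/><method name='b'><todo/></method></class>"

# Hypothetical rule table: rule name -> one path evaluated from the document root.
RULES = {
    "no-todo-markers": ".//method/todo",
}

def violations(root, rules):
    """Return the names of rules whose path selects at least one node."""
    return [name for name, path in rules.items() if root.findall(path)]

root = ET.fromstring(DOC)
print(violations(root, RULES))
```

There is no separate context step here: each rule stands alone and always starts from the root, which is why there is nothing to group.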

The OASIS Content Assembly Mechanism is an XML language for feature extraction from documents, to allow integration with processing systems: is this document an urgent invoice? And much more.  My impression is that CAM uses Right Hand Side XPaths, as arguments to functions.

The XMLUnit unit test library for Java and .NET allows XPaths to be used in the LHS and as part of the RHS, as well as many other approaches.

Finally, XML Schema 1.1 added XPath assertions to its grammars.  It definitely is Right Hand Side: the contexts are found using the grammar rules.

There are numerous other examples of languages that do this. Using XPath to make assertions is now entirely mainstream and boring, as a proven approach.  So why don’t the major platform vendors support it directly? (I suppose that if Java still does not have a simple one-line method to copy directory trees, our standard platforms are a lot less advanced than we think…)

What Schematron has, but all these other languages lack in general (and this is not a criticism, they have different use-cases) is 1) a primacy of human language assertions, 2) a way of efficiently grouping assertions to reduce the number of passes needed (in Schematron, you use rules, patterns, variables, keys for this), and 3) a sense of workflow or progressive processing (in Schematron, you use phases for this).
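
The grouping and phase ideas can be sketched in a few lines of Python. This is only a toy under stated assumptions: the pattern and phase names, the invoice document and the validate function are invented, the XPath is ElementTree's subset, and it leaves out real Schematron behaviour such as a node matching only the first applicable rule in a pattern, variables, and keys.

```python
import xml.etree.ElementTree as ET

# Invented sample document: the second line has no quantity.
DOC = "<invoice><customer/><line><qty>2</qty></line><line/></invoice>"

# Hypothetical patterns: pattern id -> list of (context, test, human-language message).
PATTERNS = {
    "structure": [
        (".", "customer", "An invoice should name a customer."),
        (".", "line", "An invoice should have at least one line."),
    ],
    "details": [
        (".//line", "qty", "Each line should give a quantity."),
    ],
}

# Hypothetical phases: phase id -> the patterns that are active in that phase.
PHASES = {"intake": ["structure"], "full": ["structure", "details"]}

def validate(root, phase):
    """Run only the patterns active in the given phase and collect failed messages."""
    failures = []
    for pattern_id in PHASES[phase]:
        for context, test, message in PATTERNS[pattern_id]:
            for node in root.findall(context):
                if node.find(test) is None:
                    failures.append(message)
    return failures

root = ET.fromstring(DOC)
print(validate(root, "intake"))
print(validate(root, "full"))
```

The same document passes the early "intake" phase and fails the stricter "full" phase, which is the progressive-processing idea: you choose how much to check at each stage of the workflow.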

Off-topic.  Schematron allows you to specify which query language to use. So it would be entirely possible to make a version of Schematron that used JSONPath instead of XPath, for both the LHS and RHS, as far as I can see. I haven’t checked to see if anyone has done this, and I rather expect that JSON objects in reality are too fixed in structure to make it worthwhile (and that in any case people who use JSON are not “schematic” thinkers and are more likely to put validation into the classes that the JSON serializes to and from).  But it would not surprise me to see something there.