Schematron reimagined for JSON/JSONPath

Posted on November 7, 2018 by Rick Jelliffe

On GitHub you can find jsontron, which is Schematron moved out of the XML/XSLT/XPath ecosystem and applied to the JSON/JavaScript/JSONPath ecosystem. What is particularly pleasing to me is that this seems to be a really full implementation of ISO Schematron, including phases (though not abstract rules and abstract patterns, which is no biggie).

It is written in JavaScript, takes a schema that is the JSON equivalent of a Schematron XML schema, and produces a JSON version of SVRL as output. It looks like something well worth the while for people who need it.

Amir Ali, who wrote it at Pace University as part of his studies, makes the point that systems in the JSON/JavaScript ecosystem need the OVAL (Open Vulnerability and Assessment Language) validation regime as much as XML ecosystems do (perhaps more!). So a Schematron reimagined for JSON with no whiff of XML/XPath might be sweeter for JSON/JavaScript developers.

Of course, not being XML, the schemas are not standard. But Amir Ali seems to have been very faithful to the structures and names of standard Schematron, so I guess a schema could be converted to and from XML pretty trivially (but what would be the point?). Schematron’s effectiveness comes from its implementability: if you have XML data then you have your XML parser and (probably) your XSLT engine, so all you need to do is compile the schema without much fuss and then reprocess the SVRL XML output with the same tools. It looks like the same benefit applies to jsontron: if you have JSON data then you have your JavaScript engine, and you can run the validation and inspect the output with the same tools.

Here is what a schema looks like:

{
  "schema": {
    "id": "Loan Data Rules",
    "title": "Schematron Semantic Validation",
    "schemaVersion": "ISO Schematron 2016",
    "queryBinding": "jsonpath",
    "defaultPhase": "phaseid1",
    "phase": [
      {
        "id": "phaseid1",
        "active": ["patternid1"]
      }
    ],
    "pattern": [
      {
        "id": "patternid1",
        "title": "Interest Rate Pattern",
        "abstract": false,
        "rule": [
          {
            "id": "RateRule1",
            "abstract": false,
            "context": "$.loan_data.loans.*",
            "assert": [
              {
                "id": "assertidINT21",
                "test": "jp.query(contextNode,'$..interest_rate') >= 3.75",
                "message": "Assert 1: Interest Rate cannot be less than 3.75%"
              }
            ]
          }
        ]
      }
    ]
  }
}
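
To make the roles of the context and the assert concrete, here is a minimal sketch in plain JavaScript of how a rule like the one above could be evaluated. It is not jsontron's code. The selectLoans function is a hand-written stand-in for the JSONPath $.loan_data.loans.*, the jp.query call in the test string is replaced by a plain property lookup, the sample data is invented, and the shape of the report object is a guess rather than the actual SVRL-in-JSON format.

```javascript
// Hypothetical sample instance; the field names follow the schema above.
const instance = {
  loan_data: {
    loans: [
      { id: "A1", interest_rate: 4.5 },
      { id: "B2", interest_rate: 3.0 },
    ],
  },
};

// Stand-in for the rule's context "$.loan_data.loans.*":
// returns each child of loan_data.loans as a context node.
function selectLoans(doc) {
  return Object.values(doc.loan_data.loans);
}

// One rule with one assertion, mirroring RateRule1 / assertidINT21.
// The test is a function here (an assumption), not an evaluated string.
const rule = {
  id: "RateRule1",
  context: selectLoans,
  asserts: [
    {
      id: "assertidINT21",
      test: (contextNode) => contextNode.interest_rate >= 3.75,
      message: "Assert 1: Interest Rate cannot be less than 3.75%",
    },
  ],
};

// Run every assertion against every context node and collect the failures.
// The report shape is hypothetical, not the real SVRL-in-JSON layout.
function validate(doc, rule) {
  const failedAsserts = [];
  rule.context(doc).forEach((contextNode, index) => {
    for (const assertion of rule.asserts) {
      if (!assertion.test(contextNode)) {
        failedAsserts.push({
          ruleId: rule.id,
          assertId: assertion.id,
          contextIndex: index,
          message: assertion.message,
        });
      }
    }
  });
  return { failedAsserts };
}

const report = validate(instance, rule);
console.log(JSON.stringify(report, null, 2));
```

With this sample data only the second loan (3.0) fails the assertion, so the report holds a single entry. The point of the sketch is the division of labour: the context picks the nodes, and each assertion is then tested against one node at a time.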

For details, the best place is Ali’s submitted thesis: Schematron Based Semantic Constraints Specification Framework and Validation Rules Engine for JSON, (D.P.I.S. Thesis), School of Computer Science and Information Systems, Pace University, October 2018. (Oops, it has not been published there yet, and WordPress is not letting me host it here due to file size.)

One interesting aspect (to me) of the thesis is the term “Semantic Constraints”, which Ali uses to distinguish them from regular syntactic constraints. In effect, if a constraint requires more than one term to satisfy, he calls it “semantic”. So (using XPath) number(frog/@legs) is a syntactic constraint while number(frog/@legs * frog/@multiplier) is deemed a semantic constraint. This identification of semantic constraint with non-regular constraint seems useful, though inexact. But I think it is particularly appropriate for a JSON library, because the JSON mindset is, I think, pretty much anti-schema: JSON developers know they need to avoid the heavyweight kind of XML that is not suitable for their data, and they don’t need a big fat schema to tell them that a field is a number, for example. So drawing attention to a kind of validation that is not what DTD/XSD provided is smart marketing.
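
The distinction can be sketched in JavaScript. The frog object and both check functions below are invented for illustration and loosely mirror the XPath expressions above: the first check looks at one term in isolation, while the second can only be decided by combining two terms. The limit of 8 is an arbitrary assumption.

```javascript
// Hypothetical data, echoing frog/@legs and frog/@multiplier.
const frog = { legs: 4, multiplier: 3 };

// "Syntactic" in the thesis's sense: a single term, checkable on its own.
const legsIsNumber = (f) =>
  typeof f.legs === "number" && !Number.isNaN(f.legs);

// "Semantic" in the thesis's sense: the result depends on two terms together.
const totalLegsWithinLimit = (f, limit) => f.legs * f.multiplier <= limit;

const syntacticOk = legsIsNumber(frog);
const semanticOk = totalLegsWithinLimit(frog, 8);

console.log(syntacticOk); // true: legs is a number
console.log(semanticOk); // false: 4 * 3 = 12 exceeds the assumed limit of 8
```

A type-style schema can express the first check; the second is the kind of thing a Schematron-style assertion is for.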

(Aside: Of course, with computers you only have symbols and patterns, so to whatever extent a computer knows about semantics, it is just semantics as symbols and patterns at a higher level of abstraction. One of the weaknesses of Schematron is that it provides no way for one assertion to make use of the information gained by other assertions. Each assertion is independent of every other assertion, and tying them together is the responsibility of a higher layer (i.e. one using the SVRL or SVRL-in-JSON). So architecturally Schematron can be classified as a feature-extracting expert system. I have often wondered whether it would be useful to add some higher-level constructs to Schematron (some kind of Prolog-style unification, for example), but it seems to me that this is better dealt with as a process working on the SVRL data, so you don’t compromise the efficiency of the Schematron validation/feature-extraction…)
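
As a rough illustration of that higher layer, the sketch below post-processes a list of failed assertions and derives a finding that no single assertion could report on its own. Everything in it is hypothetical: the failedAsserts shape, the assertion ids and the context labels are invented, and it is not the SVRL-in-JSON format that jsontron emits.

```javascript
// Hypothetical validation output: each entry is one independent failure.
const failedAsserts = [
  { assertId: "rate-too-low", context: "loans[1]" },
  { assertId: "term-too-long", context: "loans[1]" },
  { assertId: "rate-too-low", context: "loans[2]" },
];

// Group the failed assertion ids by the context node they were reported on.
function groupByContext(failures) {
  const byContext = new Map();
  for (const { assertId, context } of failures) {
    if (!byContext.has(context)) byContext.set(context, new Set());
    byContext.get(context).add(assertId);
  }
  return byContext;
}

// Return the contexts on which all of the required assertions failed.
function contextsWithAll(failures, requiredIds) {
  const result = [];
  for (const [context, ids] of groupByContext(failures)) {
    if (requiredIds.every((id) => ids.has(id))) result.push(context);
  }
  return result;
}

const risky = contextsWithAll(failedAsserts, ["rate-too-low", "term-too-long"]);
console.log(risky); // [ 'loans[1]' ]
```

The validator stays a simple feature extractor, and the combining logic lives entirely in the layer that reads its output.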