Correct and Robust: Schematron’s assert versus report

Posted on December 31, 2018 by Rick Jelliffe

Schematron patterns contain rules. A node in the document being validated fires at most one rule per pattern, though it may fire a rule in more than one pattern. That node then provides the context in which the rule's assertions are tested. Vitally, these assertions are tied to natural-language statements, allowing specific, custom statements of the requirements and the diagnostics to be generated. Schematron provides two varieties of assertion:

assert statements, which fail if the XPath test evaluates to false (or to a result that XPath treats as false, such as an empty node-set), generating an SVRL failed-assert element, and
report statements, which succeed if the XPath test evaluates to true, generating an SVRL successful-report element.
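
As a minimal sketch of the two varieties side by side, the rule below uses a made-up vocabulary: the element names book and title are assumptions for illustration only and come from no real schema.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- Hypothetical vocabulary: "book" and "title" are illustrative names only. -->
    <sch:rule context="book">
      <!-- assert: produces a failed-assert when the test is false (no title child). -->
      <sch:assert test="title">A book should have a title.</sch:assert>
      <!-- report: produces a successful-report when the test is true (several titles). -->
      <sch:report test="count(title) &gt; 1">This book has more than one title.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A book with exactly one title produces no output from either statement.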

So when should you use one or the other?

Assert is used when you are stating what should be found as part of the pattern. The assertion text might be of the form “An X should have Y because ABC”. The question it answers is: is the document, or set of documents, or input/output what you expect?

Report is used when you find something interesting as part of the pattern. The assertion text might be of the form “This X has a Y, which is interesting because ABC”. So you use this to report things that are not expected in the normal run. Often report is used for feature extraction, but there are other natural uses.

This distinction is what Prof Sophia Drossopoulou calls (in another context) the difference between Correctness and Robustness. Correctness relates to the output or behaviour being correct when things go right; Robustness relates to the behaviour or output when something goes wrong.

Let's take an example: say we have some transformation that converts DocBook into ODF. We might write a Schematron schema to test invariants in the input and the output. As our Correctness check, we might say:

<sch:rule context="/">
     <sch:assert test='count($docbook-input//chapter)
           = count($odf-output//text:section[@text:style-name="Chapter"])' role="ERROR" >The number of sections should be preserved, 
      as an essential requirement</sch:assert>
...

but we might also have as our Robustness check:

<sch:rule context="/">
     <sch:report test='count($docbook-input//chapter) &gt; 100' role="WARN" >A document has been found with over 100 chapters: 
     this is not a credible number which our processing system 
     is required or tested to support.</sch:report>
...

If you adopt this distinction, it might lead you to use report in some situations where you might normally use assert. For example, say you have a chain of transformations of your XML documents, and that each of these processing stages requires the presence of a particular document identifier in the document. You might then say that testing for the presence of this identifier is a Correctness test for the first input and the final output (i.e. a blackbox test) and therefore use assert, but that it is a Robustness test inside the pipeline (i.e. a whitebox test, because a missing identifier is a sign that a previous stage in the same black box has failed) and therefore use report.
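
One way this might be arranged, as a sketch only: the same identifier check is written twice, once as an assert for the pipeline boundary and once as a report for the internal stages, and Schematron phases select which one is active. The attribute name doc-id, the pattern ids and the phase ids are all hypothetical, chosen for illustration.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Hypothetical phase for the first input and final output (blackbox). -->
  <sch:phase id="boundary">
    <sch:active pattern="id-correctness"/>
  </sch:phase>

  <!-- Hypothetical phase for documents between stages (whitebox). -->
  <sch:phase id="internal">
    <sch:active pattern="id-robustness"/>
  </sch:phase>

  <sch:pattern id="id-correctness">
    <sch:rule context="/*">
      <!-- "doc-id" is an assumed attribute name, not taken from any real schema. -->
      <sch:assert test="normalize-space(@doc-id) != ''" role="ERROR">The document
        should carry a document identifier, because every processing stage
        requires it.</sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="id-robustness">
    <sch:rule context="/*">
      <!-- Same condition, inverted and phrased as a finding. -->
      <sch:report test="normalize-space(@doc-id) = ''" role="WARN">This document
        has no document identifier, which suggests that an earlier stage of
        the pipeline has failed.</sch:report>
    </sch:rule>
  </sch:pattern>

</sch:schema>
```

The validator would be run with the boundary phase on the first input and final output, and with the internal phase on documents passed between stages. How a phase is selected depends on the Schematron implementation in use.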

Of course, none of this is set in concrete: you choose pragmatically.