Schematron is a language for making assertions about whole XML documents, that each part of various patterns must be present. Schematon (and the other XML schema languages starting with DTDs) are a little special in that they are geared for assertions to be made on large complex data structure, but the idea of assertions is far from new (and far from strange) going right back to the days of relays and tubes and before there were data structures.
In 1950 Alan Turing wrote a small paper called Checking a large routine (which is available from the Turing Archive.) Of course, the currency of mathematics is what are called assertions. He said the coder should add assertions that will assist the checker: in his day, checking probably involved looking at the actual machine hardware (which I presume is the value of the Williams tube accumlator) and tracing through the various steps. Turing first allocates an artificial name to each variable at significant parts of the computation, and then writes a suitable assertion for these inductive variables.
In 2002, C.A.R. Hoare wrote a nice memo Assertions in Programming: from Scientific Theory to Engineering Practise. He writes:
“An assertion is:
- a Boolean expression
- written in the middle of a program
- which is always true
- whenever control reaches that point.
- At least, that’s the intention.
Turing’s assertions were out-of-band to the program: a separate file or perhaps pencilled annotations. Hoare’s assertions are kinds of program statements inside the program.
Assertions in the middle of a routine
Hoare’s definition here distinguishes it from other approaches, such as Bertrand Meyer’s Design by Contract, which have a different scenario: you are primarily interested in establishing what must be true at program boundaries: the preconditions, post-conditions and invariants in effect subtype the parameters and return values of methods or services. SGML’s notion of validation, where document processing will fail if the incoming document is not valid against the DTD, fits in with the Design by Contract idea. And Schematron is frequently used there too.
But looking at both Turings and Hoare’s formulation, there is a different aspect: they are just as interested in intermediate values in the middle of a calculation as they are in initial and terminal conditions.
Schematron provides a facility specifically to help this: it is called phases. When you make your assertions (which are grouped into patterns for efficient evaluation and structure) you can allocate them to particular phases. This then lets you partially validate a partial document: an XML document that is mid-way through processing, for example. The XML document captured at a particular point is your inductive variable, and the Schematron phase is the things you wish to assert about it. No other schema language provides this kind of support for subsetting constraints.
The phases mechanism in Schematron also allows you to only check some small piece of the XML tree, if you like.
Hoare’s memo, which discusses some of his Microsoft collegues’ work on assertion at that time, also includes another kind of assertion, which he calls an “assumption“. This is code to check a simpifying assumption that a developer has made about the code; by the time the code is finalized, if the assumption never fails, it was not needed and can be removed.
Schematron provides a construct called report that can be used for this: it is merely an up-side-down assertion, and it tells you if it finds something. It lets you do things like feature extraction on your document, to report on whole patterns you have found rather than patterns that are incomplete.
So I think Schematron’s use of “assertion” fits in with Alan Turing’s. Nice.
Compile Time Assumptions
But there is also another use of the term in programming languages. You can see it slightly in the Go programing languages Type Assertions: such as s := i.(string) which does two things:
- a compile-time operation, where s gets allocated the type string;
- a run-time operation, where object i is that it is a string object (and a panic generated if not).
So the second is what we think of as a classical computing assertion. But the first is a different kind of beast. Indeed, the relative weakness of grammar-based languages for run-time validation can be offset because grammars are passably suited for some kinds of compile-time operations, such as data-binding. ( Hoare mentions using assertions to enhance the information known by a compiler when discussing macro PREFIX_ASSUME.)