“Step Out of Time” with Schematron: getting the Plot, not just the Story

Posted on January 23, 2019 by Rick Jelliffe

Schematron was developed out of an attempt to imagine a non-toy schema language for people who thought different.

In particular, the wonderful Professor C. C. Hsieh had explained to me that Chomsky-style grammars, on which DTDs were based, suited Westerners because they matched our simply tokenized languages and clear ideas of words, but were simply not the way Chinese people thought about Chinese languages. “Most Chinese could not recall the Chinese word for ‘word’,” he said; they think first of the character 字, which is not a word per se (indeed, IIRC, “term” is a better analog for the Chinese 詞條). I was working on ideas for schema languages at Academia Sinica in Taiwan when the advent of XPath, and Dave Raggett’s demonstration of his Assertion Grammars, which extended grammar particles with XPath-style location steps on the left-hand side of the grammar rules, showed what assertions could do and prompted the thought: why not just XPath? (Schema language design fans may also enjoy Dave’s XERT idea for extending RELAX NG.)

More recently I have been wondering whether another dynamic of “thinking different” may explain both Schematron’s enduring success in critical applications and, perhaps, its lack of mass acceptance. I think (or hope) Schematron appeals especially to people with what Professor Simon Baron-Cohen calls systemizing minds, which I recognize in myself. The Guardian had a stimulating quote from him on this:

Such people, he remarks, are possessed of “a mind constantly striving to step out of time, to set aside the temporal dimension in order to see… the eternal repeating patterns in nature”.

The quote interests me because I spend a lot of time thinking about the human dimension of document languages, and about ways to characterize Schematron versus other schema languages.

I rather thought Schematron would have been superseded by now by something built on the lessons learned. Instead, its ideas have variously been tacked on to enrich XSD, recast into a single-test, unit-testing approach in OASIS CAM, or escaped into other applications such as Selenium and PDM. But none of these captures the essential point of Schematron: how to have a schema language for humans, and especially for humans who think different. So I think we need to draw inspiration from outside computing, just as Charles Goldfarb did when he drew on language theory for documents at the genesis of what became SGML and XML.

I want to pull out Prof. Baron-Cohen’s distinction between time-based thinking (where things are modeled as a linear sequence of events) and systemizing. Grammars model documents in a linear fashion, imposing a kind of time on the elements: first this, then that.

A DTD models an element’s contents as an ordered sequence of elements, A then B then C. In order to know what child elements are valid at a point, you may need to go back to the start of the element and trace through.
An XSD schema models an element’s contents as an ordered sequence of elements, A then B then C. In order to know what child elements are valid at a point, you may need to go back to the start of the element and up the ancestor tree and trace through.
A RELAX NG grammar models an element’s contents as an ordered sequence of elements, A then B then C. In order to know what child elements are valid at a point, you may need to go back to the start of the document and trace through.
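To make this linear reading concrete, here is a minimal Python sketch (not DTD, XSD or RELAX NG syntax) that treats a content model as a regular expression over child element names. The content model `(title, para+, note?)` and the function name `children_match` are hypothetical, chosen only for illustration; real validators do not work on flattened strings like this, but the shape of the check is the same: validity at any point depends on tracing the sequence from the start.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model for <section>: (title, para+, note?)
# written as a regular expression over space-terminated child names.
CONTENT_MODEL = re.compile(r"(title )(para )+(note )?")


def children_match(elem: ET.Element) -> bool:
    """Grammar-style check: flatten the children into one linear token
    sequence and match it against the content model from the first child."""
    tokens = "".join(child.tag + " " for child in elem)
    return CONTENT_MODEL.fullmatch(tokens) is not None


ok = ET.fromstring("<section><title/><para/><para/><note/></section>")
bad = ET.fromstring("<section><para/><title/><para/></section>")
print(children_match(ok))   # True
print(children_match(bad))  # False
```

The question “may a note appear here?” has no local answer in this model: it depends on everything that came before it in the sequence.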

Schematron actually makes it difficult to represent the long linear chains that are all grammars can consider: each location step and predicate can go along any axis.
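By contrast, a Schematron schema is a set of rules, each with a context and assertions that may look along any axis. The sketch below approximates that idea with Python’s standard library only. It is not Schematron syntax: the rules, messages and helper names (`Rule`, `preceding_siblings`, `validate`) are hypothetical, plain Python predicates stand in for the XPath tests a real Schematron rule would use, and because ElementTree has no parent or preceding-sibling axis, the sketch builds a parent map to emulate them.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, NamedTuple

Parents = Dict[ET.Element, ET.Element]


class Rule(NamedTuple):
    """A hypothetical Schematron-like rule: a context plus one assertion."""
    context: str                                  # element name (stand-in for an XPath pattern)
    test: Callable[[ET.Element, Parents], bool]   # stand-in for an XPath assert test
    message: str                                  # reported when the test is false


def preceding_siblings(node: ET.Element, parents: Parents) -> List[ET.Element]:
    """Emulate the preceding-sibling axis, which ElementTree lacks."""
    parent = parents.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[: siblings.index(node)]


RULES = [
    # Looks backwards along the sibling axis.
    Rule("note",
         lambda n, p: any(s.tag == "para" for s in preceding_siblings(n, p)),
         "A note must follow at least one para."),
    # Looks upwards along the parent axis.
    Rule("title",
         lambda n, p: n in p and p[n].tag == "section",
         "A title must be a child of a section."),
    # Looks downwards along the child axis.
    Rule("section",
         lambda n, p: len(n.findall("title")) == 1,
         "A section must have exactly one title."),
]


def validate(root: ET.Element, rules: List[Rule]) -> List[str]:
    # Map each element to its parent to emulate the parent axis.
    parents: Parents = {child: parent for parent in root.iter() for child in parent}
    failures = []
    for rule in rules:
        # Each rule fires independently wherever its context matches.
        for node in root.iter(rule.context):
            if not rule.test(node, parents):
                failures.append(rule.message)
    return failures


doc = ET.fromstring(
    "<doc><section><title/><note/><para/></section><title/></doc>"
)
for message in validate(doc, RULES):
    print(message)
```

Each rule is evaluated wherever its context matches, and none of them needs to replay the children from the start; the order in which the rules are listed does not change which assertions fail.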

Of course, all actual grammar-based schema languages have additions that violate the strict sequence: DTDs have IDs, XSD has keys, XSD 1.1 adds guards and assertions, and RELAX NG treats attributes as if they had sequence.

I don’t want to completely deny the reverse possibility: that Schematron may actually be a tool that helps linear-thinkers escape their railway-train view of reality, and similarly that grammars help the time-blind manipulate sequence better. It is a lovely thought.

Having made this analogy of element sequence as a virtual timeline, we can borrow from literary theory E. M. Forster’s definition (itself owing to Aristotle): story is the chronological sequence of events, while plot is the causal and logical connection between them. Can we therefore say that conventional schema languages are all and only about story, while Schematron is all and only about plot?