Could Schematron be used for Content Completion in editors?

Posted on May 12, 2017 by Rick Jelliffe

Over at XML.COM, Gerrit Imsieke has a stimulating article Epischemas: Schema Constraints that facilitate Content Completion. He wants to improve content completion in XML editors (where the editor automatically fills in the next step) given that many interesting types of documents have additional constraints to those available in a simple schema.

He takes the old idea of using two levels of RELAX NG schemas (James Clark suggested this method for validating with SGML exclusion exceptions, such as constraining an HTML A element to not appear under the child of another HTML A element): these are a little like Bloom filters—you can avoid a set of complex explosions of constraints by merely overlaying two broader, simpler sets.

The idea of validating with multiple overlapping “schemas” concurrently is of course in the DNA of Schematron: each pattern element is an overlapping “schema”.

Gerrit Imsieke’s method is that when each grammar suggests the element or whatever that can be filled in at a certain point in editing, he takes their union and provides that to the content completion mechanism of the editor.

The foundation of the article is that Schematron does not, and indeed cannot, do content completion. “This is because it is impossible for a content-completing editor to know in advance the finite list of possible completions, even if a Schematron rule permitted only a finite set.” So Schematron is limited to giving custom suggestions at each point, or if you are using the Schematron Quick Fix it can insert some kinds of corrections.

I have no problem with a comment that Schematron does not do this at the moment, meaning Schematron implementations. But I am not at all certain that it cannot.

The issue is in particular about the pre-compilation of rules into finite lists of completions that are suited for content completion.

I see three problems:

First, because RELAX NG is a different class of grammar to DTDs, in fact you may need to parse the document from the beginning in order to find which elements or attributes or enumerations are available at a point. So I don’t see that RELAX NG always allows the creation of simple completion lists: only some RELAX NG schemas.
Second, you can write Schematron schemas that allow simple content completions, cheatingly, because you can in fact put grammars as Schematron constraints if you are using XSLT2 or EXSLT Query Language Bindings. I give the method in Can Schematron use Grammars to test Assertions? So you can feed your content-completion list creator with such a grammar, just as you can with grammar-based schema languages.
Third, we can write constraints and mark them as suitable for feeding content completion mechanisms.

Rules a Content Completion system could extract Following Lists from

For this last one, let's take the following rule (expressed as a DTD):

<!ELEMENT section ( number?, title, p+) >

and a Schematron schema that implements it:

<pattern>
  <rule context="section">
    <assert test="count(number) + count(title) + count(p) = count(*)"
      >Only number, title and p elements are allowed</assert>
    <assert test="*[1][self::title or self::number]"
      >The first element should be a title or a number</assert>
  </rule>

  <rule context="section/number">
    <assert test="position() = 1"
      >A number must be in the first position only</assert>
    <assert test="following-sibling::title"
      >A title must follow a number</assert>
  </rule>

  <rule context="section/title">
    <assert test="following-sibling::p"
      >A p must follow a title</assert>
  </rule>

  <rule context="section/p">
    <assert test="following-sibling::p or not(following-sibling::*)"
      >You can have multiple p elements at the end of the section</assert>
  </rule>
</pattern>

Now each of these assertions, apart from the first one (which is strictly unnecessary in this case), has the information needed for some content completion mechanism: they give a simple context, like a DTD, and they let you know what the first element should be, and what the next element should be.

So we could write a processor that says “If a pattern contains only rules with simple parent/child contexts, then any assertion that follows the following-sibling::X model can be used for content completion”. And so on for the other kinds of rules.
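As a rough sketch of such a processor (not an existing tool), the Python below reads a pattern like the one above and builds a table of "following lists". The function name `following_lists`, the table shape, and the three regular expressions are my own assumptions for illustration. It only recognises the exact assertion shapes used in this post, and it expects the un-namespaced elements used in the listings here rather than the ISO Schematron namespace.

```python
import re
import xml.etree.ElementTree as ET

# The example pattern from this post (un-namespaced, as in the listings).
SCHEMA = """
<pattern>
  <rule context="section">
    <assert test="*[1][self::title or self::number]">The first element should be a title or a number</assert>
  </rule>
  <rule context="section/number">
    <assert test="following-sibling::title">A title must follow a number</assert>
  </rule>
  <rule context="section/title">
    <assert test="following-sibling::p">A p must follow a title</assert>
  </rule>
  <rule context="section/p">
    <assert test="following-sibling::p or not(following-sibling::*)">Multiple p elements may end the section</assert>
  </rule>
</pattern>
"""

NAME = r"[A-Za-z_][\w.-]*"
# "parent" or "parent/child": the only contexts this sketch accepts.
SIMPLE_CONTEXT = re.compile(rf"^{NAME}(/{NAME})?$")
# *[1][self::a or self::b]  -> allowed first children
FIRST_CHILD = re.compile(rf"^\*\[1\]\[(self::{NAME}(?: or self::{NAME})*)\]$")
# following-sibling::X, optionally "or not(following-sibling::*)" (may end here)
FOLLOWING = re.compile(rf"^following-sibling::({NAME})( or not\(following-sibling::\*\))?$")


def following_lists(schema_text):
    """Map (parent, previous sibling or None) -> (names to suggest, may_end)."""
    pattern = ET.fromstring(schema_text)
    rules = pattern.findall("rule")
    # Give up on the whole pattern unless every rule has a simple context.
    if not all(SIMPLE_CONTEXT.match(r.get("context", "")) for r in rules):
        return {}
    table = {}
    for rule in rules:
        steps = rule.get("context").split("/")
        for a in rule.findall("assert"):
            test = " ".join(a.get("test", "").split())  # normalise whitespace
            first = FIRST_CHILD.match(test)
            follow = FOLLOWING.match(test)
            if first and len(steps) == 1:
                names = re.findall(rf"self::({NAME})", first.group(1))
                table[(steps[0], None)] = (names, False)
            elif follow and len(steps) == 2:
                table[(steps[0], steps[1])] = ([follow.group(1)], bool(follow.group(2)))
    return table
```

An editor could then look up `following_lists(SCHEMA)[("section", "title")]` when the cursor sits after a title inside a section. Assertions the sketch does not recognise are skipped, so they remain ordinary validation constraints.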

We could handle exclusion exceptions in the same way. The tendency would be to make an extra assertion on the section/p rule or on the section rule. But let's put it as a guard on the section/title rule:

<rule context="section/title">
  <assert test="following-sibling::p[not(ancestor::p)]"
    >A p must follow a title</assert>
</rule>

Choice groups, stars and pluses can be handled in the same way.
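To make the guard idea concrete, here is a small Python sketch of how a completion mechanism might read the guarded assertion above and apply it at editing time. The function names and the `open_elements` argument (the names of the elements enclosing the insertion point, which I assume the editor can supply) are hypothetical. Only the single guard shape `[not(ancestor::X)]` is recognised.

```python
import re

# following-sibling::X with an optional [not(ancestor::Y)] guard.
GUARDED = re.compile(
    r"^following-sibling::(?P<name>[A-Za-z_][\w.-]*)"
    r"(?:\[not\(ancestor::(?P<banned>[A-Za-z_][\w.-]*)\)\])?$"
)


def parse_guarded(test):
    """Return (element name, excluded ancestor or None), or None if unrecognised."""
    m = GUARDED.match(test.strip())
    return (m.group("name"), m.group("banned")) if m else None


def suggestions(test, open_elements):
    """Offer the element unless the excluded ancestor is currently open.

    open_elements: names of the elements enclosing the insertion point
    (an assumed input from the editor, outermost first).
    """
    parsed = parse_guarded(test)
    if parsed is None:
        return []
    name, banned = parsed
    if banned is not None and banned in open_elements:
        return []  # exclusion exception applies: suppress the suggestion
    return [name]
```

Because siblings share their ancestors, checking the elements open at the insertion point is equivalent to checking the ancestors of the p that would be inserted.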

Abstract Rules

If we wanted to make things a bit more explicit, we could use abstract rules. (In ISO Schematron 2016, the extends element with an href means to include the file, using the contents of the top-level element. This file would have a library of Schematron abstract rules, which implement the patterns above. Abstract rules let you parameterize assertions, a macro facility. Detection is simpler, and the schema creators have only a small vocabulary to use.)

<pattern role="code-completion">
  <extends href="completion-friendly-abstract-rules.sch"/>

  <rule context="section">
    <extends rule="required-first-children">
      <param name="elements" value="self::title or self::number"/>
    </extends>
  </rule>

  <rule context="section/number">
    <extends rule="required-following-element">
      <param name="element" value="title"/>
      <param name="can-end" value="false()"/>
      <param name="recursive" value="false()"/>
    </extends>
  </rule>

  <rule context="section/title">
    <extends rule="required-following-element">
      <param name="element" value="p"/>
      <param name="can-end" value="false()"/>
      <param name="recursive" value="false()"/>
    </extends>
  </rule>

  <rule context="section/p">
    <extends rule="allowed-following-element">
      <param name="element" value="p"/>
      <param name="can-end" value="true()"/>
      <param name="unless-ancestor" value="p"/>
      <param name="recursive" value="false()"/>
    </extends>
  </rule>
</pattern>

A minimal catalog of abstract rules (allowed-following-element, required-following-element, allowed-following-elements, required-following-elements, allowed-first-child, allowed-first-children, required-first-child, empty) would be enough to handle most parts of most simple content models. (In particular, element content models that are regular expressions and in which repeated names only appear in the same context: so (a, (b|c), d, (b|c), d, e) but not (a, (b|c), d, b).)

There is a parameter to handle the exclusion exception, that we don’t allow recursive p elements. The abstract patterns allow us to create almost a DSL for content completion, by hiding patterns and assertions behind parameters.
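With the abstract-rule vocabulary, detection no longer needs to pattern-match XPath at all. The following Python sketch shows one way a tool might read the invocations directly. It assumes the abstract rule is named in a `rule` attribute (as on ISO Schematron's extends element), that parameters are written as nested param elements as in the example above, and that elements are un-namespaced. The function name and the shape of the returned entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two invocations in the style of the example above.
SCHEMA = """
<pattern role="code-completion">
  <rule context="section/title">
    <extends rule="required-following-element">
      <param name="element" value="p"/>
      <param name="can-end" value="false()"/>
    </extends>
  </rule>
  <rule context="section/p">
    <extends rule="allowed-following-element">
      <param name="element" value="p"/>
      <param name="can-end" value="true()"/>
      <param name="unless-ancestor" value="p"/>
    </extends>
  </rule>
</pattern>
"""

# The part of the catalog this sketch understands.
FOLLOWING_RULES = {"required-following-element", "allowed-following-element"}


def completion_entries(schema_text):
    """Turn recognised abstract-rule invocations into completion entries."""
    pattern = ET.fromstring(schema_text)
    if pattern.get("role") != "code-completion":
        return []  # pattern is not marked as suitable for completion
    entries = []
    for rule in pattern.findall("rule"):
        for ext in rule.findall("extends"):
            kind = ext.get("rule")
            if kind not in FOLLOWING_RULES:
                continue  # other catalog entries are not handled here
            params = {p.get("name"): p.get("value") for p in ext.findall("param")}
            entries.append({
                "context": rule.get("context"),
                "next": params.get("element"),
                "required": kind.startswith("required"),
                "can_end": params.get("can-end") == "true()",
                "unless_ancestor": params.get("unless-ancestor"),
            })
    return entries
```

The role check is what lets the schema author mark which patterns feed the completion mechanism. Every other pattern in the schema is left alone for ordinary validation.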

So I think this shows that you could have a system where a Schematron schema could be compiled into simple completion lists for a content completion system.