Fundamental Structural Patterns Bolognese

Posted on December 21, 2018 by Rick Jelliffe

Schematron has a construct called the abstract pattern: a pattern in which all the implementation details (such as the specific element names) are supplied as parameters. In Schematron currently, abstract patterns are just a macro, syntactic sugar that does not pass through to the SVRL output, so they serve convenient expression and conceptual modelling rather than anything operational.
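
To make the "just a macro" point concrete, here is a minimal Python sketch of the idea, not of any real Schematron implementation. It assumes parameters are written as $name references inside attribute values and are replaced textually. The rule text, the function name instantiate and the parameter names table and row are all made up for illustration.

```python
import re

# Hypothetical abstract rule: $table and $row are parameters, not element names.
ABSTRACT_PATTERN = '''<rule context="$table">
  <assert test="$row">A table should contain at least one row.</assert>
</rule>'''


def instantiate(abstract, params):
    """Replace each $name reference with its value, as a macro expansion would."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown references untouched rather than guessing a value.
        return params.get(name, match.group(0))

    return re.sub(r"\$(\w+)", substitute, abstract)


# Two concrete implementations of the same abstract pattern.
html_table = instantiate(ABSTRACT_PATTERN, {"table": "table", "row": "tr"})
cals_table = instantiate(ABSTRACT_PATTERN, {"table": "tgroup", "row": "tbody/row"})
```

After expansion only the concrete rules remain, which is why nothing of the abstract pattern would be visible downstream in a sketch like this.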

One of the reasons for these, and indeed for the grouping element “pattern” itself, comes from my experience 20 years ago writing my book, The SGML & XML Cookbook. What I needed for that book was a way to express abstract patterns, for which I could then present different concrete implementations. But DTDs (and the other schema languages of the day, such as FrameMaker’s EDD) were not up to it. Even now, I think Schematron is the only mainstream schema language to take this as a primary focus.

So I am always interested to see what kinds of patterns other people detect in documents. Dealing with Structural Patterns in Documents (Di Iorio, Peroni, Poggi, Vitali) from the excellent Professor Vitali’s group at the University of Bologna is stimulating in this regard. It posits eight objective classes of elements (the permutations of whether an element can have text nodes, whether it can have element nodes, and whether it can have text siblings in its parent). It calls these:

Milestone
Meta
Atom
Field
Popup
Container, with non-exclusive subtypes:
    HeaderContainer
    Table
    Record
Inline
Block

Documents conform to this architecture if each element belongs to only one of these objective classes. But if the same element can appear in both mixed content and element content, say, that is a shift, and to that extent the analysis does not fit.
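
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Bologna group's method: it profiles each element instance by the three yes/no properties (contains text, contains elements, has text siblings in its parent), ignores whitespace-only text, and reports any element name seen with more than one profile as a shift. The names profile and find_shifts are made up here, and because it looks at instances rather than at a schema it can report more shifts than a schema-level analysis would.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def has_text(s):
    """True if the string holds something other than whitespace."""
    return bool(s and s.strip())


def profile(elem, parent):
    """Return (contains text, contains elements, has text siblings)."""
    contains_text = has_text(elem.text) or any(has_text(c.tail) for c in elem)
    contains_elements = len(elem) > 0
    if parent is None:
        text_siblings = False  # the root has no siblings at all
    else:
        # Text directly inside the parent means the parent has mixed content.
        text_siblings = has_text(parent.text) or any(
            has_text(c.tail) for c in parent
        )
    return (contains_text, contains_elements, text_siblings)


def find_shifts(xml_text):
    """Map each element name seen with more than one profile to its profiles."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(set)
    seen[root.tag].add(profile(root, None))
    for parent in root.iter():
        for child in parent:
            seen[child.tag].add(profile(child, parent))
    return {tag: profiles for tag, profiles in seen.items() if len(profiles) > 1}


# Made-up sample: <b> appears both inside mixed content and inside element content.
sample = (
    "<doc><p>Some <b>bold</b> text</p>"
    "<sec><b>Heading</b><p>Plain <b>x</b></p></sec></doc>"
)
shifts = find_shifts(sample)
```

In this sample only b is reported, because it turns up once with text siblings (inside p) and once without (inside sec).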

The Bolognese validate their theory against a corpus: data-oriented documents usually fit this analysis completely, and the more freeform ones mostly fit, with some shifting.