Schematron for Workflow

Posted on June 8, 2022 by Rick Jelliffe

I just saw a very interesting use of Schematron: as the configuration language for a workflow and repair system on the PageSeeder editorial/CMS/publishing platform. I have followed PageSeeder over the years, as friends are involved with it, but I have not really mentioned its Schematron capabilities, which are well integrated.

Basically, when you validate a document, the system looks at the SVRL output and populates an error pane with the assertion messages and diagnostics, with an indication of the flag or role. Click on an issue, and it pops up a dialog box that gives you a choice of actions to deal with that issue.

What is cool is that this list is determined by information in the SVRL carried over from sch:property elements in the Schematron schema. This gives a declarative way to classify assertion failures for the major choices (of this system):

to run some XSLT on the document (e.g., to fix some issue: the XSLT gets a parameter which is an XPath with the location of the element to be fixed); this is a quite similar use to SQF, but simpler, and it is easily applied to bulk documents;
to change the status property of the document (e.g. from In Progress to Ready to Check or whatever); this status property can be used to change the visibility of the document to different groups, which is one kind of workflow;
to put the document onto some task lists, so that it is queued for people with the corresponding role to attend to. (This queueing needs to be first OKed by the general project expeditor, who needs to be on top of such things.)

It was implemented for a particularly high-pressure use: several departments of a very large organization need to produce a massive ULTRA-high-value document (think billions of dollars) each year, with about a month of preparation time and many last-minute changes. Each section of the document is initially authored by one department, then passed around other departments who make revisions. If some revision is found not to fit the guidelines, it needs to be fixed and checked off by all the departments again.

The approach they take is something like this for their Schematron schema:

...
<!-- An assertion states what must be true; it fails when the test is false -->
<sch:assert test="count(heading) = 1"
   properties="major-editorial fix-multiple-headers"
>There should be one and only one heading here</sch:assert>
...
<!-- Properties are matched by id against the assert's properties attribute -->
<sch:property id="major-editorial" role="task">
    <description>Send back for editorial fix</description>
    <status>error</status>
    <group>editors</group>
</sch:property>
<sch:property id="fix-multiple-headers" role="fix">
    <description>Fix multiple headers</description>
</sch:property>

So in some context, the document is required to have a heading element, but only one. If this assertion fails, then the SVRL will have an svrl:failed-assert element with property elements carried over. (These can have any elements you like under them, whatever is needed to drive the particular system.)
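
As a rough illustration of what gets carried over, here is a sketch of how such a failed assertion might look in the SVRL. It assumes the svrl:property-reference form from the ISO SVRL schema. The location path, the test as echoed, and the exact placement of the custom child elements are assumptions for illustration; the actual output of this system may well differ.

```xml
<svrl:failed-assert xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
    test="count(heading) = 1"
    location="/document/section[2]">
  <!-- location is a hypothetical path to the node where the assertion failed -->
  <svrl:text>There should be one and only one heading here</svrl:text>
  <!-- One reference per name listed in the assert's properties attribute -->
  <svrl:property-reference property="major-editorial" role="task">
    <description>Send back for editorial fix</description>
    <status>error</status>
    <group>editors</group>
  </svrl:property-reference>
  <svrl:property-reference property="fix-multiple-headers" role="fix">
    <description>Fix multiple headers</description>
  </svrl:property-reference>
</svrl:failed-assert>
```

A handler only needs the role, the description and the location from this fragment to offer the choice and act on it.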

The system shows a list with the message “There should be one and only one heading here”. If you click on the appropriate button, up comes a dialog box with a list of choices whose text comes from the description element. Click on one, and the system dispatches all the information in the failed-assert and the properties to whichever handler is in place, here “fix” (to run the XSLT) or “task” (to put it on a task list).

In the case of this system, they can do all the actions manually using the GUI: they are not trapped by the workflow, at a pinch. But the Schematron validation is the usual way to progress.

I was really pleased to see the use of sch:assert/@properties here. It makes it much easier to integrate the results of validation with subsequent automated or UI processes.

When you think about “validation” in the context of a pipeline or collaborative workflow, just having a simple "yes/no" result or static error message creates complexity, because you then need to write code that tries to interpret that result in some actionable way. And then it stops being a declarative or scripting issue and becomes a programming issue, with different SDLC dynamics: often this is too hard, and so a human is put in place to look at log messages and triage.

So Schematron’s properties allow you to attach arbitrary information to validation results, including dynamically generated context-specific values based on the document being validated. The subsequent processes don’t need to open up the document and scrabble around it to get the information they need to progress the document. But, because it is done by properties, the pattern/rule/assert elements continue to act as schemas: we are decorating or annotating the schema, not introducing programming code.
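
To make the "dynamically generated" part concrete, here is a minimal sketch of a property whose value is computed from the document at validation time using sch:value-of. The rule context "section", the property id "heading-count" and the role "info" are hypothetical names chosen for illustration; they are not taken from the system described above.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- "section" is a hypothetical context for this sketch -->
    <sch:rule context="section">
      <sch:assert test="count(heading) = 1"
          properties="heading-count">There should be one and only one heading here</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:properties>
    <!-- Evaluated against the rule's context node when the assertion fails,
         so the SVRL carries the number of headings actually found -->
    <sch:property id="heading-count" role="info"><sch:value-of select="count(heading)"/></sch:property>
  </sch:properties>
</sch:schema>
```

A downstream handler can then read the count straight from the validation report instead of reopening the document to work it out.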