A three-layer model for XML with Schematron

This article originally appeared on an O'Reilly blog on March 23, 2010.

The Analytical, the Practical and the Pragmatic

Recently, on a trip, I talked to some very interesting developers who were quite worried about a large XML implementation they were in the middle of. They were surprised that it is possible to have XML without an XSD schema; I was even more surprised that they were surprised.

They were also worried that they had to do their data modeling in XSD, when they thought it might be better to use, er, a modeling language. Beyond a certain project size, I would agree with them.

But it brought home to me that it might be useful to make some of the other ways of looking at the world through XML glasses better known. In particular, where does Schematron fit in?

So I made up this little diagram, which corresponds more closely to how things are panning out on some projects I have been involved in:

  • First, in the analytical layer, we create a glossary that lists and defines all the objects the system has. (We may use a UML diagram for this, perhaps connected to Use Cases through a Traceability diagram.) Then the business requirements list and define all the relationships between the different objects.
  • Next, in the practical layer, we have XML instances that implement the objects, and a Schematron schema that implements the business-requirement rules.
  • Finally, if needed, we have the pragmatic layer. This takes care of any stray issues that relate to how an XML document is transmitted, stored or displayed. For example, we might want to store some of the data in a DBMS, so we would like to constrain the field lengths of certain information items. These lengths have nothing to do with any specific business requirement, and might only affect some systems. They are merely constraints needed to fit in with some particular extrinsic technology: XML file serialization, DOM object creation, relational data mapping, and so on. The grammar-based schema languages, such as RELAX NG and XSD, fit in here.
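The split between the practical and pragmatic layers can be illustrated with a small Python sketch. It is not Schematron: it only mimics the shape of a Schematron rule (a context that selects nodes, plus an assert test with a message) using the standard library's ElementTree paths and Python predicates. The order document, the ship-date rule and the 40-character column limit are all hypothetical. In a real project the practical layer would be an ISO Schematron schema with XPath tests, for example a rule whose context is order[@status='shipped'] asserting ship-date.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Assertion:
    """One check shaped like a Schematron rule: a context plus an assert test and message."""
    context: str                        # ElementTree path (a small XPath subset) selecting context nodes
    test: Callable[[ET.Element], bool]  # must return True for every selected node
    message: str                        # reported once per selected node where the test fails


def validate(root: ET.Element, assertions: List[Assertion]) -> List[str]:
    """Apply each assertion to every node its context selects; return the failure messages."""
    failures = []
    for a in assertions:
        for node in root.iterfind(a.context):
            if not a.test(node):
                failures.append(a.message)
    return failures


# Practical layer: hypothetical business-requirement rules.
BUSINESS_RULES = [
    # Co-occurrence rule: whether ship-date is required depends on the status attribute.
    Assertion(".",
              lambda order: order.get("status") != "shipped" or order.find("ship-date") is not None,
              "A shipped order must have a ship-date."),
    Assertion(".//item",
              lambda item: item.get("qty", "").isdecimal() and int(item.get("qty")) > 0,
              "Every item must have a positive integer qty."),
]

# Pragmatic layer: a hypothetical storage limit with no business meaning.
DB_CONSTRAINTS = [
    Assertion("customer",
              lambda c: len(c.text or "") <= 40,
              "customer must fit a VARCHAR(40) column."),
]

# Hypothetical instance document.
ORDER_XML = """<order status="shipped">
  <customer>Acme Widgets and Sprockets Incorporated of Greater Springfield</customer>
  <item sku="A-1" qty="2"/>
</order>"""

if __name__ == "__main__":
    root = ET.fromstring(ORDER_XML)
    print("business:", validate(root, BUSINESS_RULES))
    print("pragmatic:", validate(root, DB_CONSTRAINTS))
```

The two rule lists are deliberately kept apart, mirroring the model: every system would apply BUSINESS_RULES, while only a system that stores orders in the DBMS needs DB_CONSTRAINTS.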

The big difference between this model and conventional ways of thinking about schemas is the role of the non-Schematron schema languages: they are limited to providing pragmatic information, and are certainly not used as data models. I often see people wanting to use the schema in an analytical position: it does not have to be there, and it may not be a good fit there.