Chapter 2. Getting Started

In order to start processing files using Schematron, you're going to need a few files on your system. You will find the XSLT files on Schematron.com. The ones you want are:

iso_svrl.xsl
iso_schematron_skeleton.xsl

The remaining files you can type in, adding to them as needed. Our file needing validation starts off very simply. It has no DTD or Schema (we have Schematron!). It represents a book: quite boring and very simple. Example 2.1, “File input.xml, the simplest input document” shows this file. We can add complexity when we need it to show Schematron features. So let's look for the constraints we want to apply.

Example 2.1. File input.xml, the simplest input document



<?xml version="1.0" encoding="utf-8" ?>
<doc>
<chapter id="c1">
  <title>chapter title</title>  1
  <para>Chapter content</para>
</chapter>

<chapter id="c2">
  <title>chapter 2 title</title>
  <para>Content</para>            2
</chapter>

<chapter id="c3">
  <title>Title</title>
  <para>Chapter 3 content</para>
</chapter>
</doc>

1

A chapter has a title

2

A chapter has one or more paragraphs


Now for the constraints. What rules do we want to apply? As you may imagine, I'm going to pick some that may be odd, primarily to demonstrate the functionality of Schematron. I'll try and keep them reasonably sensible.

The first rule is to check that each chapter has a title. Before defining that rule in the Schematron file we need to know something of the outline Schematron file that will be used in all the examples.

Since this file is testing the file input.xml, I'm going to name it input.sch. Example 2.2, “File input.sch, an empty Schematron file.” shows this file. I'm using .sch as the filename extension simply as a reminder that it is a Schematron file.

Example 2.2. File input.sch, an empty Schematron file.

 
<?xml version="1.0" encoding="utf-8"?>
<iso:schema    
  xmlns="http://purl.oclc.org/dsdl/schematron"  1
  xmlns:iso="http://purl.oclc.org/dsdl/schematron" 
  xmlns:dp="http://www.dpawson.co.uk/ns#"
  queryBinding='xslt2'
  schemaVersion='ISO19757-3'>                  2
  <iso:title>Test ISO schematron file. Introduction mode</iso:title>
  <iso:ns prefix='dp' uri='http://www.dpawson.co.uk/ns#'/> 3
  <!-- Your constraints go here -->            4


</iso:schema>

1

The Schematron namespace.

2

The general heading for a Schematron file. Note the namespaces in use. Add them as normal for any XML file.

3

Do you think Schematron knows all about your namespaces? No. For each one specific to you, that you need, add it here as an <iso:ns/> element.

4

The required constraints go in the body of the file.


Not much to look at. The document element is in the Schematron namespace, http://purl.oclc.org/dsdl/schematron, which is associated with the prefix iso, as it is in all these examples. Previous versions of Schematron used the sch prefix. You can choose whatever prefix you want; just make sure it is bound to the right namespace.

The queryBinding attribute specifies the query language binding used to evaluate the rules; here the value xslt2 selects XSLT 2.0. The title is used in the final output as … surprisingly, a document title! The only other content is a foreign namespace definition. I've included it here simply to show how it's done. We'll use it later. If your input document is namespaced, you'll need to add the namespace in two places: as a declaration in the document element, and as an ns element. That's it!
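As a minimal sketch of the "two places" point, assume (hypothetically; the input.xml above is not namespaced) that the chapter and title elements of the input lived in the http://www.dpawson.co.uk/ns# namespace. The prefix is declared once on the document element and once with an ns element, and the XPath expressions then use that prefix. The pattern, rule and assert elements used here are explained later in this chapter.

```xml
<?xml version="1.0" encoding="utf-8"?>
<iso:schema
  xmlns:iso="http://purl.oclc.org/dsdl/schematron"
  xmlns:dp="http://www.dpawson.co.uk/ns#"
  queryBinding='xslt2'
  schemaVersion='ISO19757-3'>
  <iso:title>Namespace sketch</iso:title>
  <!-- First place: the xmlns:dp declaration on the document element above -->
  <!-- Second place: tell Schematron about the dp prefix -->
  <iso:ns prefix='dp' uri='http://www.dpawson.co.uk/ns#'/>
  <iso:pattern>
    <!-- Assumption: the input uses dp:chapter and dp:title -->
    <iso:rule context="dp:chapter">
      <iso:assert test="dp:title">A chapter should have a title</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>
```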

Now to add the constraints at the place marked in Example 2.2, “File input.sch, an empty Schematron file.”

Example 2.3. Check for a chapter title

 
 <iso:pattern>
    <iso:rule context="chapter">                                  1
      <iso:assert 
         test="title">A chapter should have a title</iso:assert>  2
    </iso:rule>
  </iso:pattern>

1

The context of the test is the chapter element

2

Test that each chapter has a title


We start with the pattern element. This is basically a grouping wrapper. For example, we may choose to group all related chapter-level checks within one pattern. Within this pattern element there is one rule element, but there could be many rules within a single pattern. It is good practice to restrict the number of rules so that the group is coherent and can be quickly understood.
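A pattern holding more than one rule might look like the sketch below. The second rule, the id value and its message are hypothetical and serve only to illustrate the grouping; the namespace declaration is repeated on the fragment so that it stands alone, whereas in input.sch it is inherited from the schema element.

```xml
<iso:pattern xmlns:iso="http://purl.oclc.org/dsdl/schematron"
             id="chapter-checks">
  <!-- Rule 1: the check from Example 2.3, applied to each chapter -->
  <iso:rule context="chapter">
    <iso:assert test="title">A chapter should have a title</iso:assert>
  </iso:rule>
  <!-- Rule 2 (hypothetical): a related check, applied to each chapter title -->
  <iso:rule context="chapter/title">
    <iso:assert test="normalize-space(.) != ''">A chapter title should not be empty</iso:assert>
  </iso:rule>
</iso:pattern>
```

Both rules concern chapter titles, which keeps the pattern coherent.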

The rule element is at the heart of Schematron. It expresses a rule that you want to run against the input document. Two points to note here. Firstly, the context attribute. This may be viewed in the same way as the match attribute on the xsl:template element in an XSLT stylesheet. The key point is that it specifies the context (used in just the same way as a context is used in XSLT) in which the rule's tests will be applied. So for this case, the rule will be applied wherever the context is a chapter element in our input.xml document. Secondly, note that the rule element here has just one child, an assert element. As with rules inside a pattern, it may have many child elements, and the context for all of them remains that specified by the context attribute.
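For instance, a rule with two assert children might be sketched as follows. The second assertion is taken from the callout in Example 2.1 (a chapter has one or more paragraphs); its message wording is an assumption. Both tests are evaluated with the chapter element as context.

```xml
<iso:pattern xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <iso:rule context="chapter">
    <!-- Both tests run in the context of the same chapter element -->
    <iso:assert test="title">A chapter should have a title</iso:assert>
    <iso:assert test="para">A chapter should have one or more paragraphs</iso:assert>
  </iso:rule>
</iso:pattern>
```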

A word of caution. Some rules are said to be abstract. This is defined to be the case when the abstract attribute has a value of true. A rule that has a context attribute cannot also have abstract set to true. More on this later, see Chapter 9, The extends element. The grammar for the rule element is, using pseudo DTD syntax:


element rule
     either
     attributes: abstract[true], id
     children:   let*, (assert | report | extends)+

     or

     attributes: abstract[false]?, context, id?
     children:   let*, (assert | report | extends)+


So a rule is either abstract or has a context. The latter use is the more common one.
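The two forms of the grammar can be sketched side by side. The id value has-title is a hypothetical name chosen for illustration, and the details of extends are left to Chapter 9.

```xml
<iso:pattern xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <!-- First form: abstract, with an id and no context -->
  <iso:rule abstract="true" id="has-title">
    <iso:assert test="title">This element should have a title</iso:assert>
  </iso:rule>
  <!-- Second form: has a context; here it pulls in the abstract rule -->
  <iso:rule context="chapter">
    <iso:extends rule="has-title"/>
  </iso:rule>
</iso:pattern>
```

The abstract rule never runs on its own; its assertion is only evaluated where a rule with a context extends it.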

Finally, the assert element. We need a clear understanding of this element, so please slow down a little reading this paragraph! Two aspects are key. Firstly the test attribute, which acts in just the same way as the test attribute on the xsl:when element in XSLT. It's a boolean test returning either true or false. It is executed within the context of the parent rule (the chapter element in this case). So if we look at the input document for which we are writing the rules, for each chapter element, we are making an assertion that the chapter element has a title element as a child. That can only be either true or false. A chapter has a title element as a direct child, or it doesn't. That's the syntax. Now the semantics.

This is an assert statement. See section ¶ 5.4.2 in [2]. An assert statement is (sort of) negative. What I mean by that is that if the test passes, the assertion is said to succeed and nothing is reported. The text content of the assert statement (A chapter should.....) is the message you want to be output if the assertion fails. In other words, if the test passes, the associated message is not output. If the test fails, the message will be output! Now re-read this paragraph. I know it made my head hurt. This is why it matches our test for a title child element. If the title is there, no message is output. If the title is missing, then the test fails and the message is output to the report file.
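Since the context and test attributes have been compared with match and xsl:when, the behaviour of the assert can be pictured as the XSLT sketch below. This is only an analogy under that comparison, not the stylesheet that the Schematron XSLT files actually generate, and xsl:message merely stands in for the report output.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- context="chapter" plays the role of match="chapter" -->
  <xsl:template match="chapter">
    <xsl:choose>
      <!-- test="title" is true when the chapter has a title child: say nothing -->
      <xsl:when test="title"/>
      <!-- Otherwise the assertion has failed: output the message -->
      <xsl:otherwise>
        <xsl:message>A chapter should have a title</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```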

It becomes easier to accept when you see its inverse, the report element.
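A minimal sketch of the two side by side, assuming the same chapter context as before (the wording of the report message is made up for illustration):

```xml
<iso:pattern xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <iso:rule context="chapter">
    <!-- assert: the message is output when the test is FALSE -->
    <iso:assert test="title">A chapter should have a title</iso:assert>
    <!-- report: the message is output when the test is TRUE -->
    <iso:report test="not(title)">This chapter has no title</iso:report>
  </iso:rule>
</iso:pattern>
```

For a chapter without a title, both messages would be output; for a chapter with a title, neither would.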

To recap. We want to check if each chapter element has a title. The context attribute on the rule element is set to chapter. The assert statement uses the test attribute to test that such a child element exists. If the test fails, then the text contained within the assert element is output in the report! That completes the description of the first element. A little tedious, but I hope worthwhile.

Before moving on to other elements, we should check that it all works in practice. If all the tests pass, this is really quite boring. It works on the principle that no news is good news, so a test which passes does nothing. So the assert should not report anything, since our input file complies with our single constraint!
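To see the opposite case, here is a sketch of a deliberately broken variant of input.xml (hypothetical, not one of the example files). With the rule from Example 2.3, the assertion would be expected to fail for chapter c2 only, and the message "A chapter should have a title" would be output for it.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<doc>
<chapter id="c1">
  <title>chapter title</title>
  <para>Chapter content</para>
</chapter>

<!-- Hypothetical change: the title has been removed from this chapter -->
<chapter id="c2">
  <para>Content</para>
</chapter>
</doc>
```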
