Chapter 3. Running Schematron

If you are used to running XSLT transforms, this won't be much of a surprise. If it's new and you're learning, I hope it's enough to get you started. For this sequence I'm using Saxon, an XSLT 2.0 implementation from Mike Kay. Saxon [3] may be downloaded from [4]. Having installed Saxon, I have a couple of scripts to run Schematron. I'll show the Windows version first: see Example 3.1, “Schematron Script, for Windows”.

Example 3.1. Schematron Script, for Windows

Note the use of \ as a continuation character.
Such lines should be put all on one line.


@echo off
cls
echo Usage: build %%1 = iso schematron file, no extension. 
echo  %%2 is the input xml file, with the extension.
echo E.g. build input input.xml will produce input.report.xml as output

if exist tmp.xsl del tmp.xsl
echo Generate the stylesheet from %1

java  -mx250m -ms250m  -cp .;\myjava\saxon8.jar;\myjava\xercesImpl.jar \ 1
  net.sf.saxon.Transform    -x org.apache.xerces.parsers.SAXParser -w1   \
  -o tmp.xsl    %1.sch iso_svrl.xsl

echo Now run the input file %2 against the generated stylesheet \
  tmp.xsl to produce %1.report.xml

java  -mx250m -ms250m  -cp .;\myjava\saxon8.jar;\myjava\xercesImpl.jar \  2
  net.sf.saxon.Transform    -x org.apache.xerces.parsers.SAXParser -w1   \
  -o %1.report.xml    %2 tmp.xsl

type %1.report.xml

1

First stage - generate the stylesheet which runs the checks

2

Then run that stylesheet against the input document


Please note the comment about long lines. Where a line ends in a backslash, please remove it and join it to the following line.

The first point to note is that I have installed saxon8.jar, the Saxon XSLT processor jar file, into a directory called myjava on the root of the current disk. If you installed it elsewhere, please change the classpath to match. I'm assuming you're running Java 1.5. If you're not, you're on your own!

For Linux, Example 3.2, “Schematron Script, for Linux” is suitable, with the same constraints concerning where your XSLT 2.0 engine is installed.

Example 3.2. Schematron Script, for Linux


clear
echo 'Usage: build $1 = iso schematron file, no extension. $2 is the input xml file, with extension.'
echo 'E.g. build input input.xml will produce input.report.xml as output'

# Exactly two arguments are required
if [ $# -ne 2 ]
   then
   echo "Usage: build <filename> <filename.ext> to use filename.sch to validate filename.ext"
   exit 2
fi

# The schema must exist as $1.sch
if [ ! -f "$1.sch" ]
   then
     echo "Schema file $1.sch not found"
     exit 2
fi

# The input document must exist
if [ ! -e "$2" ]
   then
     echo "input file $2 not found"
     exit 2
fi



if [ -e tmp.xsl  ]
  then
    rm -f tmp.xsl
fi

if [ -e $1.report.xml ]
   then
    rm $1.report.xml
fi
echo Validate the schema
cp=/myjava/jing.jar:/myjava/saxon652.jar:/myjava/xercesImpl.jar:/myjava/xml-apis.jar

java -classpath $cp com.thaiopensource.relaxng.util.Driver docs/isoSchematron.rng $1.sch

if [ $? -eq 0 ]
   then
     echo $1.sch is valid
   else
     echo Invalid Schematron file
     exit 2
fi


echo Generate the stylesheet from $1

# Add source document paths with the parameter "generate-paths=yes"
java  -mx250m -ms250m  -cp .:/myjava:/myjava/saxon8.jar:/myjava/xercesImpl.jar \
       net.sf.saxon.Transform    -x org.apache.xerces.parsers.SAXParser -w1   \
       -o tmp.xsl    $1.sch /sgml/schematron/iso/iso_svrl.xsl  "generate-paths=yes"

if [ $? -eq 0 ]
  then 
  echo run the input file $2 against the generated stylesheet tmp.xsl to produce $1.report.xml

  java  -mx250m -ms250m  -cp .:/myjava:/myjava/saxon8.jar:/myjava/xercesImpl.jar \
    net.sf.saxon.Transform    -x org.apache.xerces.parsers.SAXParser -w1   -o $1.report.xml $2 tmp.xsl

  if [ -e $1.report.xml ]
   then
    #cat $1.report.xml
    java -classpath $cp com.thaiopensource.relaxng.util.Driver docs/svrlDP.rng $1.report.xml
    if [ $? -eq 0  ]
      then
      echo $1.report.xml is valid
    else
      echo $1.report.xml is invalid
    fi
  fi

fi
echo Done


Using input.sch and input.xml as the schematron file and input file (just as in the examples above), the first transform generates an XSLT stylesheet called tmp.xsl. The next transform uses this stylesheet, and the input file, input.xml to produce an output file called input.report.xml.

Note

Note that this does not include any include processing, nor any abstract pattern processing. These require a further two stages of processing prior to the above.
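
As a rough sketch of what those two extra stages might look like, the script below chains four transforms. It assumes the include and abstract-pattern stages are handled by stylesheets named iso_dsdl_include.xsl and iso_abstract_expand.xsl, sitting in the same directory as iso_svrl.xsl. Those two names, the intermediate files tmp1.sch and tmp2.sch, and the DRY_RUN switch are my assumptions, so check them against your own installation. The parser and memory options from Example 3.2 are left out for brevity. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: stylesheet names and locations below are assumptions.
SCH_DIR=${SCH_DIR:-/sgml/schematron/iso}   # directory holding the ISO stylesheets
CP=${CP:-/myjava/saxon8.jar}               # Saxon jar, as in Example 3.2
DRY_RUN=${DRY_RUN:-1}                      # 1 = print the commands, 0 = run them

# xform <output> <source> <stylesheet>: run (or print) one transform
xform() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "java -cp $CP net.sf.saxon.Transform -o $1 $2 $3"
  else
    java -cp "$CP" net.sf.saxon.Transform -o "$1" "$2" "$3"
  fi
}

# build_report <schema-basename> <input-file>
build_report() {
  # Stage 1: resolve includes
  xform tmp1.sch "$1.sch" "$SCH_DIR/iso_dsdl_include.xsl" || return 1
  # Stage 2: expand abstract patterns
  xform tmp2.sch tmp1.sch "$SCH_DIR/iso_abstract_expand.xsl" || return 1
  # Stage 3: generate the validating stylesheet
  xform tmp.xsl tmp2.sch "$SCH_DIR/iso_svrl.xsl" || return 1
  # Stage 4: run it against the input document
  xform "$1.report.xml" "$2" tmp.xsl
}

build_report input input.xml
```

Set DRY_RUN=0 once the printed commands look right for your paths. Each stage stops the chain if the one before it fails, so a broken include never reaches the final validation step.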

If you run it, you should see something like the following as the output, which is output to the console as the last action of the script.

Example 3.3. Schematron output


Warning: at xsl:stylesheet on line 89 of file:/C:/sgml/schematron/iso/iso_svrl.xsl:
  Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
run the input file input.xml against the generated stylesheet input1.report.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:sch="http://www.ascc.net/xml/schematron"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        title="Test ISO schematron file. Introduction mode"
                        schemaVersion="">
   <svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
   <svrl:active-pattern/>
   <svrl:fired-rule context="chapter"/> 1
   <svrl:fired-rule context="chapter"/> 2
   <svrl:fired-rule context="chapter"/> 3
</svrl:schematron-output>


1 2 3

Each of these shows a rule that has fired, identified by its context


No, not very interesting, is it! This is the reality of testing. The less output the better! The lines containing fired-rule simply indicate that the rule with the chapter context fired (i.e. it ran) three times. Exactly what we'd expect, with three chapters in our input file! So I'd class that as a success. There is a little more about this language in Chapter 12, Schematron Validation Report Language (SVRL). If you are curious, see annex D of [2].

Just to see what happens, you could remove the title from one of the chapters in input.xml and re-run the script. I removed the title from the second chapter. The output changed to:

Example 3.4. Schematron output


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:sch="http://www.ascc.net/xml/schematron"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        title="Test ISO schematron file. Introduction mode"
                        schemaVersion="ISO19757-3">
   <svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
   <svrl:active-pattern/>
   <svrl:fired-rule context="chapter"/>
   <svrl:failed-assert test="title">                    1
      <svrl:text>Chapter should have  a title</svrl:text>
   </svrl:failed-assert>
   <svrl:fired-rule context="chapter"/>
</svrl:schematron-output>

1

The assert failed, the message is output. The two fired rules either side indicate that it is the second chapter that is at fault.


This output tells us that the second time the rule fired, the assertion failed, hence the message. The message text comes directly from the assert statement in the input Schematron file. XSLT can process the report into any format you might need.
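
XSLT is the robust way to reformat the report, but for a quick look at the console a text filter is often enough. The sketch below is one way to pull out just the failure messages. It assumes each svrl:text element sits on a single line, as it does in Example 3.4, and list_failures is simply a name I have picked.

```shell
# list_failures <report.xml>
# Print the message of every failed assertion in an SVRL report.
# Assumes each svrl:text element sits on one line, as in Example 3.4.
list_failures() {
  sed -n '/<svrl:failed-assert/,/<\/svrl:failed-assert>/s/.*<svrl:text>\(.*\)<\/svrl:text>.*/\1/p' "$1"
}

# Try it on a cut-down copy of the report from Example 3.4.
report=$(mktemp)
cat > "$report" <<'EOF'
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
   <svrl:active-pattern/>
   <svrl:fired-rule context="chapter"/>
   <svrl:failed-assert test="title">
      <svrl:text>Chapter should have  a title</svrl:text>
   </svrl:failed-assert>
   <svrl:fired-rule context="chapter"/>
</svrl:schematron-output>
EOF
list_failures "$report"     # prints: Chapter should have  a title
rm -f "$report"
```

Because the filter only looks inside failed-assert elements, messages from successful-report elements are ignored. A report whose messages span several lines would need a real XML tool instead.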

Moving on from the assert statement.

The second element of prime interest in Schematron is essentially the inverse of the assert element: the report element, as defined in ¶ 5.4.11 of [2]. The syntax is just the same as for assert; the only changes are the element name and the semantics. This can lead to confusion. The standard reads: if the test evaluates positive, the report succeeds. That reads almost identically to the assert semantics! Yet if you play around with it, you'll find that the output message is seen under the inverse conditions: an assert outputs its message when its test is false, a report outputs its message when its test is true. My view is that we should use a report when something is not as it should be, and that the test should make a positive statement about what is being looked for. That way the report element seeks out invalid content and reports it when found, while the assert element states what should be true and reports when it is not. Think on that for a while and the difference should become clear.
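
One way to keep the two straight is to reduce them to a truth table. The function below is only a model of the behaviour just described. It is not part of Schematron, and message_shown is a name I have made up. Given the element name and the result of its test, it says whether the message reaches the output.

```shell
# message_shown <assert|report> <true|false>
# Model only: does the element's message appear in the output?
message_shown() {
  case "$1:$2" in
    assert:false) echo yes ;;   # an assert speaks up when its test fails
    report:true)  echo yes ;;   # a report speaks up when its test succeeds
    *)            echo no  ;;
  esac
}

# Print all four combinations
for element in assert report; do
  for result in true false; do
    echo "$element, test is $result: message shown? $(message_shown "$element" "$result")"
  done
done
```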

Returning to the title element, we could generate a report each time we found a title in a chapter. In my view that's not very useful. What I'm going to suggest is a report element which counts the number of paragraphs within a chapter and reports that number. Feeble, but it shows two aspects of Schematron: firstly the use of report, and secondly the extraction of information from the source document for use in the report. Example 3.5, “Using the report element” shows the updated Schematron file. The input file has also changed, as shown below.

Example 3.5. Using the report element

<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema    xmlns="http://purl.oclc.org/dsdl/schematron" 
	       xmlns:iso="http://purl.oclc.org/dsdl/schematron" 
	       xmlns:sch="http://www.ascc.net/xml/schematron"
	       queryBinding='xslt2'
	       schemaVersion="ISO19757-3">
  <iso:title>Test ISO schematron file. Introduction mode</iso:title>
  <!-- Not used in first run -->
  <iso:ns prefix="dp" uri="http://www.dpawson.co.uk/ns#" />

 <iso:pattern >
    <iso:rule context="chapter">
      <iso:assert test="title">Chapter should have  a title</iso:assert>
      <iso:report test="count(para)">                       1
      <iso:value-of select="count(para)"/> paragraphs</iso:report>  2
    </iso:rule>
  </iso:pattern>

</iso:schema>

  

1

The report element, indicating the number of para elements found in the chapter

2

The value-of element retrieves the actual count of paragraphs


That is the full file used.

The input file has changed insofar as a few more para elements have been added. It now looks like Example 3.6, “The updated input.xml file”. The value-of element in the Schematron namespace is used to obtain information from the source document.

Example 3.6. The updated input.xml file


<?xml version="1.0" encoding="utf-8" ?>
<doc>
<chapter id="c1">                       1
  <title>chapter title</title>
  <para>Chapter content</para>
</chapter>

<chapter id="c2">                       2
<title>chapter title</title>
<para>xx</para>
<para>yy</para>
<para>zz</para>
</chapter>

<chapter id="c3">                       3
  <para>Para in the wrong position</para>
  <title>chapter title</title>
  <para>xx</para>
  <para>yy</para>
  <para>zz</para>
  <para>aa</para>
</chapter>
</doc>

  

1 2 3

This valid file produces the 3 output reports from the 3 chapters


When this file is run, the output should be something like Example 3.7, “Resultant svrl output file”.

Example 3.7. Resultant svrl output file


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#" 
xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:sch="http://www.ascc.net/xml/schematron"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        title="Test ISO schematron file. Introduction mode"
                        schemaVersion="ISO19757-3">
   <svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
   <svrl:active-pattern/>
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">   1
      <svrl:text>1 paragraphs</svrl:text>
   </svrl:successful-report>
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">  2
      <svrl:text>3 paragraphs</svrl:text>
   </svrl:successful-report>
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">  3
      <svrl:text>5 paragraphs</svrl:text>
   </svrl:successful-report>
</svrl:schematron-output>

  

1 2 3

The 3 output statements resulting from the 3 successful-report elements


Note how it has suddenly become far more dense? The level of markup is starting to hide the actual information content, hence the need for a further transform to format it the way you want. The only items of interest are the lines which output (for each chapter) the paragraph count. This tells us the number of paragraphs in the three chapters. Informative? Maybe, but I think the ideas are clear.
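
Short of a full transform, a quick tally can cut through the markup. The sketch below counts the opening tags of the three SVRL elements met so far. It assumes at most one such tag per line, as in the listings in this chapter, and summarize is my own name for it.

```shell
# summarize <report.xml>
# Count the opening tags of the three SVRL elements discussed so far.
# Assumes at most one such tag per line, as in the examples in this chapter.
summarize() {
  for element in fired-rule failed-assert successful-report; do
    printf '%s: %s\n' "$element" "$(grep -c "<svrl:$element[ >/]" "$1")"
  done
}

# Try it on a cut-down report covering the first two chapters only.
report=$(mktemp)
cat > "$report" <<'EOF'
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">
      <svrl:text>1 paragraphs</svrl:text>
   </svrl:successful-report>
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">
      <svrl:text>3 paragraphs</svrl:text>
   </svrl:successful-report>
</svrl:schematron-output>
EOF
summarize "$report"
# prints:
#   fired-rule: 2
#   failed-assert: 0
#   successful-report: 2
rm -f "$report"
```

A failed-assert count of zero is the quickest sign that the document passed.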

Before moving on to other aspects of Schematron, I want to diverge just a little into decorations. Sometimes they are useful; other times you may have no use for them at all. I find them useful for adding such things as versioning information or the date and time of processing to the output. I'll show how and where they are added, then you can use them if you choose. Chapter 10, Decorating the output discusses this further. There is no change to the input file, but the Schematron file input.sch has a few additions. See Example 3.8, “A decorated Schematron file”.

Example 3.8. A decorated Schematron file


<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema    xmlns="http://purl.oclc.org/dsdl/schematron" 
	       xmlns:iso="http://purl.oclc.org/dsdl/schematron" 
	       xmlns:sch="http://www.ascc.net/xml/schematron"
	       queryBinding='xslt2'
	       schemaVersion="ISO19757-3">
  <iso:title>Test ISO schematron file. Introduction mode </iso:title> 1
  <!-- Not used in first run -->
  <iso:ns prefix="dp" uri="http://www.dpawson.co.uk/ns#" />

  <iso:pattern id="doc.checks">
   <iso:title>checking an XXX document</iso:title> 2
   <iso:rule context="doc">
    <iso:report test="chapter">Report date.
    <iso:value-of select="current-dateTime()"/></iso:report>   3
   </iso:rule>
  </iso:pattern>

  <iso:pattern id="chapter.checks">
    <iso:title>Basic Chapter checks</iso:title>            4
    <iso:p>All chapter level checks. </iso:p>              5
    <iso:rule context="chapter">
      <iso:assert test="title">Chapter should have  a title</iso:assert>
      <iso:report test="count(para)"><iso:value-of select="count(para)"/> paragraphs</iso:report>
      <iso:assert test="count(para) >= 1">A chapter must have one or more paragraphs</iso:assert>
      <iso:assert test="*[1][self::title]">Title must be first child of chapter</iso:assert>
      <iso:assert test="@id">All chapters must have an ID attribute</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>

  

1 2 4

The title element as headings

3

The date and time are output as a record of the test time.

5

A p element is used to provide additional information


Going through this, notice the following:

  1. Another pattern has been added for document level checks

  2. The report in that section outputs the current date and time using XSLT functionality.

  3. In that same pattern a title has been added which produces output in the final report. (The first element of decoration)

  4. In the chapter.checks pattern, a title and a p element have been added, which may be useful for your purposes.

  5. An assert statement has been added to check that the title element is the first child of a chapter element.

  6. An assert statement has been added to ensure that a chapter has one or more paragraphs.

  7. An assert statement has been added to check that each chapter has an id attribute.

  8. The report statement counting paragraphs has been left in.

Running this version produces output as shown in Example 3.9, “The output report with decorations”.

Example 3.9. The output report with decorations


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#" 
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:sch="http://www.ascc.net/xml/schematron"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        title="Test ISO schematron file. Introduction mode "
                        schemaVersion="ISO19757-3">
   <svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
   <svrl:active-pattern name="doc.checks" id="doc.checks"/>
   <svrl:fired-rule context="doc"/>
   <svrl:successful-report test="chapter">
      <svrl:text>Report date.2007-01-19T14:33:41.153Z</svrl:text>  1
   </svrl:successful-report>
   <svrl:active-pattern name="chapter.checks" id="chapter.checks">
      <svrl:text>All chapter level checks. </svrl:text>  2
   </svrl:active-pattern>
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">
      <svrl:text>1 paragraphs</svrl:text>  3
   </svrl:successful-report>
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">
      <svrl:text>3 paragraphs</svrl:text>  4
   </svrl:successful-report>
   <svrl:fired-rule context="chapter"/>
   <svrl:successful-report test="count(para)">
      <svrl:text>5 paragraphs</svrl:text>  5
   </svrl:successful-report>
   <svrl:failed-assert test="*[1][self::title]">
      <svrl:text>Title must be first child of chapter</svrl:text> 6
   </svrl:failed-assert>
</svrl:schematron-output>

  

1

The report heading

2

Chapter level checks, from the p element

3 4 5

The 3 report statements as before

6

The failed assert - the chapter title must follow immediately on the chapter element.


You should be able to see where each of the additions arises. The decorations as I've called them are all within the text elements. It's your choice if you use them.

That summarizes the basics of Schematron. The functionality has grown from this basis. You can achieve a great deal with just these two elements, assert and report, which formed the basis of the initial Schematron.

Debugging Schematron

There are some issues when debugging Schematron files. Since the first step simply creates an XSLT file, few errors are likely to be found at that stage as long as the Schematron file is well-formed XML. It is when the generated XSLT file is run against the input file that the XSLT engine will find the errors. The stylesheet causing the errors is generated, so the line numbers in the messages refer to that stylesheet and not to your Schematron file. You need to trace back from the stylesheet to the patterns that generated the XSLT to see where the real problem lies. Not really a problem, just be aware of it.

When using includes or abstract patterns, the errors could well be in the included files, yet the error messages won't refer to those files. Again, just use a little logic to trace the error back to its source.
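
One way to do that tracing is to take the XPath expression quoted in the error message and search for it in both the generated stylesheet and the Schematron sources. The sketch below does this with a fixed-string search. The function name find_test and the stand-in file contents are mine, purely for illustration; the real generated stylesheet will look different.

```shell
# find_test <expression> <file>...
# Print file name and line number of every line containing the expression verbatim.
# The trailing /dev/null makes grep print the file name even for a single file.
find_test() {
  needle=$1
  shift
  grep -nF -- "$needle" "$@" /dev/null
}

# Demonstration on two tiny stand-in files (hypothetical content).
dir=$(mktemp -d)
printf '%s\n' '<iso:rule context="chapter">' \
  '  <iso:assert test="*[1][self::title]">Title must be first child of chapter</iso:assert>' \
  '</iso:rule>' > "$dir/input.sch"
printf '%s\n' '<xsl:template match="chapter">' '<xsl:choose>' \
  '<xsl:when test="*[1][self::title]"/>' > "$dir/tmp.xsl"
find_test '*[1][self::title]' "$dir/tmp.xsl" "$dir/input.sch"
rm -r "$dir"
```

List any included files after the main schema and they are searched too. Be aware that expressions containing characters such as < or & may be escaped differently in the generated stylesheet, so search for a fragment without them.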
