If you are used to running XSLT transforms, this won't be much of a surprise for you.
If it's new and you're learning, I hope it's enough to get you started. For this
sequence I'm using Saxon [3], an XSLT 2.0 implementation from Mike Kay, which may be
downloaded from [4]. Having installed Saxon, I have a couple of scripts to run Schematron. I'll show
the Windows version first;
Example 3.1, “Schematron Script, for windows” shows this script.
Example 3.1. Schematron Script, for windows
Note the use of \ as a continuation character.
Such lines should be put all on one line
@echo off
cls
echo Usage: build %%1 = iso schematron file, no extension.
echo %%2 is the input xml file, with the extension.
echo E.g. build input input.xml will produce input.report.xml as output
if exist tmp.xsl del tmp.xsl
echo Generate the stylesheet from %1
java -mx250m -ms250m -cp .;\myjava\saxon8.jar;\myjava\xercesImpl.jar \
net.sf.saxon.Transform -x org.apache.xerces.parsers.SAXParser -w1 \
-o tmp.xsl %1.sch iso_svrl.xsl
echo Now run the input file %2 against the generated stylesheet \
tmp.xsl to produce %1.report.xml
java -mx250m -ms250m -cp .;\myjava\saxon8.jar;\myjava\xercesImpl.jar \
net.sf.saxon.Transform -x org.apache.xerces.parsers.SAXParser -w1 \
-o %1.report.xml %2 tmp.xsl
type %1.report.xml
Please note the comment about long lines. Where a line ends in a backslash, please
remove it and join it to the following line.
The first point to note is that I have installed saxon8.jar, the Saxon XSLT processor jar file, in a directory called myjava at the root of
the current disk. If you installed it elsewhere, please change the classpath in the script accordingly. I'm assuming
you're running Java 1.5. If you're not, you're on your own!
For Linux,
Example 3.2, “Schematron Script, for Linux” is suitable, with the same constraints concerning where your XSLT 2.0 engine is installed.
Example 3.2. Schematron Script, for Linux
clear
echo 'Usage: build $1 = iso schematron file, no extension. $2 is the input xml file, with extension.'
echo E.g. build input input.xml will produce input.report.xml as output
if [ $# -ne 2 ]
then
echo "Usage: build <schema-name> <input-file> to use schema-name.sch to validate input-file"
exit 2
fi
if [ -f $1.sch ]
then
echo
else
echo Schema file $1.sch not found
exit 2
fi
if [ -e $2 ]
then
echo
else
echo input file $2 not found
exit 2
fi
if [ -e tmp.xsl ]
then
rm -f tmp.xsl
fi
if [ -e $1.report.xml ]
then
rm $1.report.xml
fi
echo Validate the schema
cp=/myjava/jing.jar:/myjava/saxon652.jar:/myjava/xercesImpl.jar:/myjava/xml-apis.jar
java -classpath $cp com.thaiopensource.relaxng.util.Driver docs/isoSchematron.rng $1.sch
if [ $? -eq 0 ]
then
echo $1.sch is valid
else
echo Invalid Schematron file
exit 2
fi
echo Generate the stylesheet from $1
java -mx250m -ms250m -cp .:/myjava:/myjava/saxon8.jar:/myjava/xercesImpl.jar \
net.sf.saxon.Transform -x org.apache.xerces.parsers.SAXParser -w1 \
-o tmp.xsl $1.sch /sgml/schematron/iso/iso_svrl.xsl "generate-paths=yes"
# Add source document paths with the parameter "generate-paths=yes"
if [ $? -eq 0 ]
then
echo run the input file $2 against the generated stylesheet tmp.xsl to produce $1.report.xml
java -mx250m -ms250m -cp .:/myjava:/myjava/saxon8.jar:/myjava/xercesImpl.jar \
net.sf.saxon.Transform -x org.apache.xerces.parsers.SAXParser -w1 -o $1.report.xml $2 tmp.xsl
if [ -e $1.report.xml ]
then
#cat $1.report.xml
java -classpath $cp com.thaiopensource.relaxng.util.Driver docs/svrlDP.rng $1.report.xml
if [ $? -eq 0 ]
then
echo $1.report.xml is valid
else
echo $1.report.xml is invalid
fi
fi
fi
echo Done
Using input.sch and input.xml as the schematron file and input file (just as in the examples above), the first
transform generates an XSLT stylesheet called tmp.xsl. The next transform uses this stylesheet, and the input file, input.xml to produce an output file called input.report.xml.
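The two-stage flow can be sketched as a small shell function that only prints the two commands rather than running them, which makes the data flow easy to see. This is a minimal sketch, not the script above: the function name schematron_plan and the variables SAXON_CP and SVRL_XSL are hypothetical, and their default values are simply copied from the Linux example, so adjust them for your own installation.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the two Saxon invocations.
# SAXON_CP and SVRL_XSL are assumptions; adjust them to your installation.
SAXON_CP="${SAXON_CP:-.:/myjava/saxon8.jar:/myjava/xercesImpl.jar}"
SVRL_XSL="${SVRL_XSL:-/sgml/schematron/iso/iso_svrl.xsl}"

schematron_plan() {
    schema=$1   # Schematron file name without the .sch extension
    input=$2    # XML file to validate, with its extension

    # Stage 1: compile the schema into an XSLT stylesheet (tmp.xsl).
    echo "java -cp $SAXON_CP net.sf.saxon.Transform -o tmp.xsl $schema.sch $SVRL_XSL"
    # Stage 2: apply tmp.xsl to the input file to produce the SVRL report.
    echo "java -cp $SAXON_CP net.sf.saxon.Transform -o $schema.report.xml $input tmp.xsl"
}

schematron_plan input input.xml
```

The point to take away is that tmp.xsl is the output of the first command and the stylesheet argument of the second.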
Note
This script performs neither include processing nor abstract pattern
processing. Those require two further stages of processing before the steps above.
If you run it, you should see something like the following as the output, which is
output to the console as the last action of the script.
Example 3.3. Schematron output
Warning: at xsl:stylesheet on line 89 of file:/C:/sgml/schematron/iso/iso_svrl.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
run the input file input.xml against the generated stylesheet input1.report.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode"
schemaVersion="">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern/>
<svrl:fired-rule context="chapter"/>
<svrl:fired-rule context="chapter"/>
<svrl:fired-rule context="chapter"/>
</svrl:schematron-output>
No, not very interesting, is it? This is the reality of testing: the less output the
better! The lines containing
fired-rule simply indicate that the rule with the chapter context fired (i.e. it ran)
three times. Exactly what we'd expect, with three chapters in our input file! So I'd
class that as a success. There is a little more about this language in
Chapter 12, Schematron Validation Report Language (SVRL). If you are curious, see annex D of
[2].
Just to see what happens, you could remove the title from one of the chapters in input.xml and re-run the script. I removed the title from the second chapter. The output changed
to
Example 3.4. Schematron output
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode"
schemaVersion="ISO19757-3">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern/>
<svrl:fired-rule context="chapter"/>
<svrl:failed-assert test="title">
<svrl:text>Chapter should have a title</svrl:text>
</svrl:failed-assert>
<svrl:fired-rule context="chapter"/>
</svrl:schematron-output>
This output tells us that on one of the occasions the rule fired (for the second chapter, whose title I removed), the assertion
failed, hence the output message. That message is taken directly
from the assert statement in the input Schematron file. XSLT can process the report into any format
you might need.
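XSLT is the robust way to post-process the report. For a quick look from the command line, though, a line-oriented sketch such as the following may be enough. It assumes the report is laid out with each element starting on its own line, as in the listing above; the function name svrl_summary is hypothetical and the sample data is a cut-down copy of that listing without the namespace declarations.

```shell
#!/bin/sh
# Rough sketch: counts SVRL elements by matching the start of their tags.
# Assumes one element per line; use a real XML tool for arbitrary layout.
svrl_summary() {
    awk '
        /<svrl:fired-rule/    { fired++ }
        /<svrl:failed-assert/ { failed++ }
        END { printf "rules fired: %d, assertions failed: %d\n", fired, failed }
    '
}

# Sample input modelled on the report above.
svrl_summary <<'EOF'
<svrl:fired-rule context="chapter"/>
<svrl:failed-assert test="title">
<svrl:text>Chapter should have a title</svrl:text>
</svrl:failed-assert>
<svrl:fired-rule context="chapter"/>
EOF
# prints: rules fired: 2, assertions failed: 1
```

In practice you would feed it the real file, for example svrl_summary < input.report.xml.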
Moving on from the assert statement, the second element of prime interest in Schematron is essentially the inverse of the
assert element. It's the
report element, as defined in ¶ 5.4.11 of
[2]. The syntax is just the same as for assert; the only changes are the element name
and the semantics. This can lead to confusion. The standard reads, “if the test evaluates
positive, the report succeeds”, which reads almost identically to the assert semantics!
Yet if you play around with it, you'll find that an output message is seen under the
inverse conditions: an assert outputs its message when its test is false, whereas a report
outputs its message when its test is true. My view on this is that we should use a report when
something is not as it should be. The logic here is that the
test should make a positive statement. That way the
report element seeks out invalid content and reports it when found, while the
assert statement states what should be true and reports an error when it is not. Yet again, think on that for a while and
the difference should become clear.
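If the difference still feels slippery, a toy model may help. The two shell functions below are not Schematron and are not part of any tool; sch_assert and sch_report are hypothetical names, and the first argument simply stands in for the outcome of the test attribute ("true" or "false").

```shell
#!/bin/sh
# Toy model of the two elements; $1 is the test outcome, $2 the message.
sch_assert() {
    # An assert is silent when its test holds and speaks when it does not.
    if [ "$1" != "true" ]; then echo "failed-assert: $2"; fi
}

sch_report() {
    # A report speaks when its test holds and is silent otherwise.
    if [ "$1" = "true" ]; then echo "successful-report: $2"; fi
}

sch_assert true  "Chapter should have a title"   # prints nothing
sch_assert false "Chapter should have a title"   # prints the message
sch_report true  "3 paragraphs"                  # prints the message
sch_report false "3 paragraphs"                  # prints nothing
```

The same test outcome therefore produces output from exactly one of the two elements, never both.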
Returning to the title element, we
could generate a report each time we found a title in a chapter. In my view that's not
very useful. What I'm going to suggest is a
report element which counts the number of paragraphs within a chapter and reports that number.
Feeble, but it shows two aspects of Schematron. Firstly the use of
report, and secondly the abstraction of information for the report, from the source document.
Example 3.5, “Using the report element” shows the updated schematron file. The input file has changed as shown below.
Example 3.5. Using the report element
<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:sch="http://www.ascc.net/xml/schematron"
queryBinding='xslt2'
schemaVersion="ISO19757-3">
<iso:title>Test ISO schematron file. Introduction mode</iso:title>
<!-- Not used in first run -->
<iso:ns prefix="dp" uri="http://www.dpawson.co.uk/ns#" />
<iso:pattern >
<iso:rule context="chapter">
<iso:assert test="title">Chapter should have a title</iso:assert>
<iso:report test="count(para)">
<iso:value-of select="count(para)"/> paragraphs</iso:report>
</iso:rule>
</iso:pattern>
</iso:schema>
That is the full file used.
The input file has changed insofar as a few more
para elements have been added. It now looks like
Example 3.6, “The updated input.xml file”. The
value-of element in the Schematron namespace is used to obtain information from the source
document.
Example 3.6. The updated input.xml file
<?xml version="1.0" encoding="utf-8" ?>
<doc>
<chapter id="c1">
<title>chapter title</title>
<para>Chapter content</para>
</chapter>
<chapter id="c2">
<title>chapter title</title>
<para>xx</para>
<para>yy</para>
<para>zz</para>
</chapter>
<chapter id="c3">
<para>Para in the wrong position</para>
<title>chapter title</title>
<para>xx</para>
<para>yy</para>
<para>zz</para>
<para>aa</para>
</chapter>
</doc>
When this file is run, the output should be something like
Example 3.7, “Resultant svrl output file”
Example 3.7. Resultant svrl output file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode"
schemaVersion="ISO19757-3">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern/>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>1 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>3 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>4 paragraphs</svrl:text>
</svrl:successful-report>
</svrl:schematron-output>
Note how it has suddenly become far more dense? The level of markup is starting to
hide the actual information content, hence the need for a further transform to format
it the way you want. The only items of interest are the lines which output (for each
chapter) the paragraph count. This tells us the number of paragraphs in the three
chapters. Informative? Maybe, but I think the ideas are clear.
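As a stopgap before writing that further transform, the message lines can be pulled out with sed. This is only a sketch under the assumption that each svrl:text element sits on a single line, as it does in the report above; svrl_messages is a hypothetical name, and the sample input is a short fragment in the same shape as that report.

```shell
#!/bin/sh
# Print the content of every single-line <svrl:text> element read from stdin.
# Assumption: opening and closing tags are on the same line.
svrl_messages() {
    sed -n 's|.*<svrl:text>\(.*\)</svrl:text>.*|\1|p'
}

# Sample fragment in the same shape as the report above.
svrl_messages <<'EOF'
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>1 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>3 paragraphs</svrl:text>
</svrl:successful-report>
EOF
```

For the fragment shown this prints the two message lines and nothing else, which is about as far as line-based tools should be pushed; anything more structured belongs in XSLT.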
Before moving on to other aspects of Schematron, I want to diverge just a little into
decorations. Sometimes they are useful, other times you may have no use for them at
all. I find them useful for adding to the output such things as versioning information,
the date and time processed, etc. I'll show how and where they are added, then you
can use them if you choose.
Chapter 10, Decorating the output discusses this further. There is no change to the input file, but the Schematron file
input.sch has a few additions. See
Example 3.8, “A decorated Schematron file”
Example 3.8. A decorated Schematron file
<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:sch="http://www.ascc.net/xml/schematron"
queryBinding='xslt2'
schemaVersion="ISO19757-3">
<iso:title>Test ISO schematron file. Introduction mode </iso:title>
<!-- Not used in first run -->
<iso:ns prefix="dp" uri="http://www.dpawson.co.uk/ns#" />
<iso:pattern id="doc.checks">
<iso:title>checking an XXX document</iso:title>
<iso:rule context="doc">
<iso:report test="chapter">Report date.
<iso:value-of select="current-dateTime()"/></iso:report>
</iso:rule>
</iso:pattern>
<iso:pattern id="chapter.checks">
<iso:title>Basic Chapter checks</iso:title>
<iso:p>All chapter level checks. </iso:p>
<iso:rule context="chapter">
<iso:assert test="title">Chapter should have a title</iso:assert>
<iso:report test="count(para)"><iso:value-of select="count(para)"/> paragraphs</iso:report>
<iso:assert test="count(para) >= 1">A chapter must have one or more paragraphs</iso:assert>
<iso:assert test="*[1][self::title]">Title must be first child of chapter</iso:assert>
<iso:assert test="@id">All chapters must have an ID attribute</iso:assert>
</iso:rule>
</iso:pattern>
</iso:schema>
Going through this, notice the following:
- Another pattern has been added for document level checks
- The report in that section outputs the current date and time using XSLT functionality.
- In that same pattern a title has been added which produces output in the final report. (The first element of decoration)
- In the chapter.checks pattern, a title and a p element have been added, which may be useful for your purposes.
- An assert statement has been added to check that the title element is the first child of a chapter element.
- An assert statement has been added to ensure that a chapter has one or more paragraphs.
- An assert statement has been added to check that each chapter has an id attribute.
- The report statement, counting paragraphs, has been left in.
Running this version produces output as shown in
Example 3.9, “The output report with decorations”
Example 3.9. The output report with decorations
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:dp="http://www.dpawson.co.uk/ns#"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns:iso="http://purl.oclc.org/dsdl/schematron"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
title="Test ISO schematron file. Introduction mode "
schemaVersion="ISO19757-3">
<svrl:ns uri="http://www.dpawson.co.uk/ns#" prefix="dp"/>
<svrl:active-pattern name="doc.checks" id="doc.checks"/>
<svrl:fired-rule context="doc"/>
<svrl:successful-report test="chapter">
<svrl:text>Report date.2007-01-19T14:33:41.153Z</svrl:text>
</svrl:successful-report>
<svrl:active-pattern name="chapter.checks" id="chapter.checks">
<svrl:text>All chapter level checks. </svrl:text>
</svrl:active-pattern>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>1 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>3 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:fired-rule context="chapter"/>
<svrl:successful-report test="count(para)">
<svrl:text>4 paragraphs</svrl:text>
</svrl:successful-report>
<svrl:failed-assert test="*[1][self::title]">
<svrl:text>Title must be first child of chapter</svrl:text>
</svrl:failed-assert>
</svrl:schematron-output>
You should be able to see where each of the additions arises. The decorations, as I've
called them, are all within the text elements. It's your choice whether you use them.
That summarizes the basics of Schematron. The functionality has grown from this basis.
You can achieve a great deal with these two elements, assert and report, which formed the basis of the initial
Schematron.