Chapter 6. Phases, or validation in parts

For some uses, initial validation checks are required. Unless these checks pass, it is sometimes not worth carrying on with the remainder of the validation. Schematron addresses this requirement with phases. For instance, if our input document has no chapters, there is little value in checking all the (non-existent) chapters. So a simple early check would be to ensure that the root element has at least one chapter. That could be the first phase of validation. For very complex validation processes, this can save time by reducing the amount of output that has to be analysed. The standard describes this usage of phases as progressive validation, which is a good description of what is happening.
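The early check described above could be sketched as follows. This is a minimal illustration, not part of the examples in this chapter: the phase id structure and the pattern id structure.checks are hypothetical names, and the root element is assumed to be doc.

```xml
<?xml version="1.0" encoding="utf-8"?>
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2"
            defaultPhase="#ALL">
  <iso:title>Sketch of an early structural check</iso:title>

  <!-- Hypothetical phase id: this is the name a processor would be asked to run -->
  <iso:phase id="structure">
    <iso:active pattern="structure.checks"/>
  </iso:phase>

  <!-- Hypothetical pattern id: referenced only from the phase above -->
  <iso:pattern id="structure.checks">
    <iso:rule context="/doc">
      <!-- Fails when the root element has no chapter child at all -->
      <iso:assert test="chapter">The document must contain at least one chapter</iso:assert>
    </iso:rule>
  </iso:pattern>
</iso:schema>
```

If this phase reports a failed assertion, the later chapter level phases can be skipped.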

As an example, see Example 6.1, “A Schematron file showing two phases”, which shows a Schematron file with two phases. The first phase carries out document level checks, in this case checking that the document element has title and isbn elements as children.

Example 6.1. A Schematron file showing two phases


<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema    xmlns="http://purl.oclc.org/dsdl/schematron" 
	       xmlns:iso="http://purl.oclc.org/dsdl/schematron" 
	       xmlns:sch="http://www.ascc.net/xml/schematron"
	       queryBinding='xslt2'
	       schemaVersion="ISO19757-3"
	       defaultPhase='#ALL'   1
	       >
  <iso:title>Test ISO schematron file. Introduction mode </iso:title>

<phase id="docs" >                2
<active pattern="doc.checks"/>
</phase>

<phase id="chaps">                3
  <active pattern="chap.checks"/>
</phase>
<iso:pattern id="doc.checks" >    4
  <iso:title>checking an XXX document</iso:title>
  <iso:rule context="doc">
    <iso:report test="chapter">Report date.<iso:value-of 
                      select="current-dateTime()"/></iso:report>
    <iso:report  test="title and isbn"
            >Report for book with ISBN <iso:value-of select="isbn"/></iso:report>
  </iso:rule>
</iso:pattern>

<iso:pattern   id="chap.checks">    5
  <iso:title>Basic Chapter checks</iso:title>
  <iso:p>All chapter level checks. </iso:p>
  <iso:rule context="chapter">
    <iso:assert test="title">Chapter should have  a title</iso:assert>
    <iso:assert test="count(para) >= 1">A chapter must have one or more paragraphs</iso:assert>
    <iso:assert test="*[1][self::title]"><iso:name/> must have title as first child</iso:assert>
    <iso:assert test="@id">All chapters must have an ID attribute</iso:assert>
  </iso:rule>
</iso:pattern>
</iso:schema>

    

1

The default phase to use as fallback

2

The docs phase (referenced from the command line)

3

The chaps phase, which activates the chapter checks

4

The document level checks

5

The chapter checks, unused when the docs phase is selected


In the document element, the attribute defaultPhase is set to the (case sensitive) string #ALL. This ensures that if no phase is specified on the command line, all phases are run. This is generally a sensible fallback position. The overall objective is to enable runtime flexibility. To select a particular phase, pass it as a command line parameter, as follows.


java  -mx250m -ms250m  -cp \
  .;\myjava;\myjava\saxon8.jar;\myjava\xercesImpl.jar net.sf.saxon.Transform  \
  -x org.apache.xerces.parsers.SAXParser -w1   -o tmp.xsl  \
  %1.sch iso_svrl.xsl   "generate-paths=yes"  "phase=docs"

This passes the parameter 'phase' through to iso_svrl.xsl, which uses it to select one of the phases specified in the Schematron file. Just below the title element are the two phases in this example. The first does document level checks, the second does chapter level checks. Take care not to confuse the phase id values with the pattern id values: the phase parameter refers to a phase id, never to a pattern id. In this instance, the possible values are docs and chaps. In the command line above, the docs phase is selected.
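The relationship between the two kinds of id can be sketched in a cut-down schema. The sketch also assumes that defaultPhase may name a phase id instead of #ALL, so that the docs phase runs when no phase parameter is given; treat that as an assumption to confirm against your processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2"
            defaultPhase="docs">
  <iso:title>Sketch: phase ids versus pattern ids</iso:title>

  <!-- "docs" is a phase id: the value used for the phase parameter and for defaultPhase -->
  <iso:phase id="docs">
    <!-- "doc.checks" is a pattern id: it is referenced only from inside a phase -->
    <iso:active pattern="doc.checks"/>
  </iso:phase>

  <iso:pattern id="doc.checks">
    <iso:rule context="doc">
      <iso:report test="title and isbn"
            >Report for book with ISBN <iso:value-of select="isbn"/></iso:report>
    </iso:rule>
  </iso:pattern>
</iso:schema>
```

Passing doc.checks as the phase value would not match any phase, because that id belongs to a pattern.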

To exercise these tests, the input file looks like Example 6.2, “An input file to use. input.phases.xml”.

Example 6.2. An input file to use. input.phases.xml


<?xml version="1.0" encoding="utf-8" ?>
<doc>
  <title>Book title</title>  1
  <isbn>12345678901</isbn>
<chapter id="c1">
  <title>chapter title</title>
  <para>Chapter content</para>
</chapter>

<chapter id="c2">
<title>chapter title</title>
<para>xx</para>
<para>yy</para>
<para>zz</para>
</chapter>

<chapter id="c3">
  <para>Invalid first child of chapter</para>
  <title>chapter title</title>
  <para>xx</para>
  <para>yy</para>
  <para>zz</para>
</chapter>
</doc>


    

1

The document level checks are for a document title and isbn


The only additions are the document title and an ISBN. These are used as part of the document level checks.

Running these together produces an output like Example 6.3, “The output file running the docs phase”.

Example 6.3. The output file running the docs phase

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                        xmlns:xs="http://www.w3.org/2001/XMLSchema"
                        xmlns:sch="http://www.ascc.net/xml/schematron"
                        xmlns:iso="http://purl.oclc.org/dsdl/schematron"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        title="Test ISO schematron file. Introduction mode "
                        schemaVersion="ISO19757-3"
                        phase="docs">
   <svrl:active-pattern name="checking an XXX document" id="doc.checks"/>
   <svrl:fired-rule context="doc"/>
   <svrl:successful-report test="chapter" location="/doc[1]">
      <svrl:text>Report date.2007-01-23T11:36:16.546Z</svrl:text>
   </svrl:successful-report>
   <svrl:successful-report test="title and isbn" location="/doc[1]">  
      <svrl:text>Report for book with ISBN 12345678901</svrl:text>  1
   </svrl:successful-report>
</svrl:schematron-output>

    

1

The report element shows which document is being tested


The output report element indicates the ISBN of the document being tested. Note that the phase in use is reported in the phase attribute of the svrl:schematron-output element. To run all phases, simply omit the phase command line parameter.

So now you can add more phases, select them from the command line … and generally be more selective in your Schematron validation. With additional control (say from a script or a Java program), the phases could be progressively run to fully validate the input document.
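As one way of adding a further phase, the fragment below sketches a phase that activates both patterns from Example 6.1. The phase id full is a hypothetical name. The fragment would sit alongside the existing phase elements, and the namespace declaration is repeated here only so that it stands alone.

```xml
<!-- Hypothetical additional phase: runs the document and chapter checks together -->
<iso:phase id="full" xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <iso:active pattern="doc.checks"/>
  <iso:active pattern="chap.checks"/>
</iso:phase>
```

A controlling script could then run docs first and request full only when the document level checks raise no problems.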
