Islands of Validity appear on the horizon again

I associate the idea of “Islands of Validity” with W3C’s esteemed Dave Raggett: IIRC he proposed it to describe the case where you stick some domain-specific chunks of XML into an HTML document: you don’t need to validate the HTML, because …well… HTML, but you may want to describe your chunks…

...

What is the difference between Schematron’s Role and Flag attributes?

A question recently came up on the difference between the @role attribute and the @flag attribute in Schematron. Both these attributes provide extra information that can be found in the SVRL (Schematron Validation Report Language) result of a validation. The @role attribute lets you provide extra information about the local…

...

Probabilistic Schemas, Hidden Markov Models, Neural Nets for XML

Recently I have been looking at probabilistic schemas. For the sake of helping along the conversation, what untapped possibilities are there? Let's look at some ideas in increasing complexity: Probabilistic Pairs: This approach is just a table of each possible element, and the probability of some relationship between one…

...

How many documents do you need to test?

The scenario: you have a large corpus of traditional SGML/XML type documents, semi-structured text not database dumps, with schema or DTD structures coping with the great variety of structures found in real life: with the position independence, repetition and recursion that makes enumerating all possible documents impractical due to combinatorial…

...

Correct and Robust: Schematron’s assert versus report

Schematron patterns contain rules. A node in the document being validated fires at most one rule per pattern; however, it may fire a rule in more than one pattern. That node then provides the context in which assertions are tested. Vitally, these are tied to natural language statements of the assertion, allowing…

...

Examples of Co-Occurrence Constraints

From the Archives!  I found an interesting old page on the W3C which gives a list of various examples of Co-occurrence constraints. I expect it was created by the W3C XML Schemas Working Group to get Use Cases for the XPath features which were added in XSD 1.1. Some of…

...

Standard Severity Levels with Schematron @role

Each assert or report element (and, in fact, rules, patterns, etc) can have a role attribute. The intention of this attribute is to allow the assertion etc to be categorized, and the prime categorization we might use is severity. So rather than validation being just either too-simple binary YES/NO, or…

...

New XSLT2 implementation of ISO Schematron: SchXslt

There is a promising-looking new implementation of Schematron for XSLT2 up at GitHub, the Open Source SchXslt project. Kudos to David Maus for developing this, and making it available. It is intended as a drop-in replacement for the most common skeleton implementation (also on GitHub at Schematron), so it…

...

Schematron reimagined for JSON/JSONPath

On GitHub you can find jsontron, which is Schematron moved out of the XML/XSLT/XPath ecosystem and applied to the JSON/JavaScript/JSONPath ecosystem. What is particularly pleasing to me is that this seems to be a really full implementation of ISO Schematron, including phases (but not abstract rules or abstract patterns; no biggie). It…

...

Assertions in Java

In Schematron, an assertion is a positive natural language statement about some aspect of a pattern that is expected to be found in an XML document.  It is implemented (to whatever extent possible) with the assert test. However, in programming languages, assertions have a wider range of meaning. (To the…

...

The Most Common Programming Error with Schematron

Schematron is a small, simple language, by design.  The complexity is not in the elements, but sloughed off to the XPaths. But if there is one mistake that I sometimes see developers make, it is this: people think that all the rules in a pattern will be tried.  In fact,…

...

Sorting out Log4J 2.0’s strict schemas

Log4J version 2.0 has two dialects of its XML configuration language: concise and strict. Concise mode is fairly well documented and freeform. It uses the reflection API and plugins, so that if there is a plugin available, you can just call its name directly. So there is no schema: the…

...