Islands of Validity appear on the horizon again

Posted on April 2, 2019 by Rick Jelliffe

I associate the idea of “Islands of Validity” with W3C’s esteemed Dave Raggett: IIRC he proposed it to describe where you stick some domain specific chunks of XML into an HTML document: you don’t need to validate the HTML, because …well… HTML, but you may want to describe your chunks of data in more rigorous ways that allow checking (static by validating the XSD, dynamic by validating the XML, visual audits, whatever).

The idea pretty much went away with AJAX. Schematron supports that kind of thing well (an Island of Validity is pretty much the same thing as a Schematron pattern, it seems to me), and XML Schemas can also do so to some extent, with wildcards and skip validation (for an XSD angle, see Michael Sperberg-McQueen’s How schema-validity is different from marriage).

But in today’s newspaper it came up again. Our prudential regulator (APRA) has finished evaluating data-provider feedback on their XBRL data submission systems, part of the glacial SBR project for single schemas for business reporting. (Disclosure: I worked on a team on part of this, relating to Schematron validation, maybe 11 years ago, so the evaluation is of interest to me.) So this time the data islands are embedded not in HTML but in XBRL superstructures.

So after 10 years of use, one of the major feedback items for the data collection system was that data providers want the system to accept parts of the complex data submissions that are valid, so that they only need to resubmit the parts that are invalid.  Which looks like “islands of validity” to me: identify the sections with broken or missing patterns, mark them as not for submission, and ingest the safe rest.  We are talking giant forms here: detailed summaries of a bank’s activities.  It can take hours to ingest and revalidate from scratch.

Schematron provides, of course, specific facilities to help with this scenario. For a start, you make each different island of validity into a different pattern (or a set of patterns, if that is more convenient to declare). You get a set of XPaths in the SVRL that let you know which islands are broken, and using the role attributes you can even identify which individual subparts of an island have a problem, and whether each problem is a showstopper for ingestion into the recipient database or something that can be accepted but needs to be corrected by the data provider in good time.
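
To make that concrete, here is a minimal sketch in Python of how an ingesting system might read an SVRL report and work out which islands are broken. It relies on SVRL being a flat list in which each active-pattern element is followed by the fired rules and failed assertions belonging to it. The function name, the sample report, the pattern ids ("signoff", "balances") and the role value "warning" are all made up for illustration; nothing here comes from the APRA system.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"


def broken_islands(svrl_text):
    """Group failed assertions by the pattern (island) that was active.

    Returns a dict mapping pattern id to a list of (location XPath, role).
    An island with an empty list had no failed assertions.
    """
    root = ET.fromstring(svrl_text)
    islands = {}
    current = None  # id of the most recent active-pattern seen
    for el in root:
        if el.tag == f"{{{SVRL_NS}}}active-pattern":
            current = el.get("id") or el.get("name") or "(unnamed)"
            islands.setdefault(current, [])
        elif el.tag == f"{{{SVRL_NS}}}failed-assert":
            islands.setdefault(current, []).append(
                (el.get("location"), el.get("role"))
            )
    return islands


# Hypothetical SVRL output: one island with a problem, one without.
SAMPLE = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="signoff"/>
  <svrl:fired-rule context="report/signoff"/>
  <svrl:failed-assert test="phone" location="/report/signoff[1]" role="warning">
    <svrl:text>A phone number is expected.</svrl:text>
  </svrl:failed-assert>
  <svrl:active-pattern id="balances"/>
  <svrl:fired-rule context="report/balances"/>
</svrl:schematron-output>"""

result = broken_islands(SAMPLE)
broken = [name for name, failures in result.items() if failures]
clean = [name for name, failures in result.items() if not failures]
```

In this sketch only failed-assert elements count as problems; a real schema might also use reports (successful-report in SVRL) to flag trouble, in which case those would need the same treatment.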

For example, you might have a requirement that some report was signed off by some responsible authority (person) on some date and with some phone number. But no phone number is provided: do you A) reject the whole document, B) reject that island, C) accept the data, or D) accept it but generate a warning notice? Schematron lets you mark up in the schema which policy is appropriate for each information item, and validation can provide an SVRL result labelling each information item according to the policy. Your ingesting system picks this up and acts on it (generating the error message, stripping out unwanted data, etc.).
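
As a rough sketch of the ingesting side, assume the schema author has put one of four role values on each assertion: "reject-document", "reject-island", "warning" or "info". Schematron does not fix a vocabulary for role, so these names are an assumption of this example, as are the function name and the island ids. The findings are taken to be (island, role, message) tuples already extracted from the SVRL.

```python
# Hypothetical role vocabulary, mapped to increasing severity.
SEVERITY = {
    "info": 0,             # accept the data silently
    "warning": 1,          # accept, but send the provider a notice
    "reject-island": 2,    # drop this island, ingest the rest
    "reject-document": 3,  # refuse the whole submission
}


def plan_ingestion(island_ids, findings):
    """Decide which islands to ingest, given (island, role, message) findings."""
    worst = {island: -1 for island in island_ids}  # -1 means no findings
    notices = []
    for island, role, message in findings:
        # An unrecognised role is treated cautiously, as an island rejection.
        level = SEVERITY.get(role, SEVERITY["reject-island"])
        worst[island] = max(worst.get(island, -1), level)
        if level == SEVERITY["warning"]:
            notices.append((island, message))

    if any(level == SEVERITY["reject-document"] for level in worst.values()):
        return {"accept": [], "reject": list(island_ids), "notices": notices}

    accept = [i for i in island_ids if worst[i] < SEVERITY["reject-island"]]
    reject = [i for i in island_ids if worst[i] >= SEVERITY["reject-island"]]
    return {"accept": accept, "reject": reject, "notices": notices}


plan = plan_ingestion(
    ["signoff", "balances", "loans"],
    [
        ("signoff", "warning", "Phone number missing"),
        ("loans", "reject-island", "Totals do not add up"),
    ],
)
```

Here the sign-off island is ingested with a notice about the missing phone number, the balances island is ingested as is, and only the loans island has to be resubmitted. Keeping the policy in the schema's role attributes, rather than in the ingestion code, means the table above is the only thing the ingesting system needs to know about it.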