Validation result caching using a keystore

Posted on May 14, 2017 by Rick Jelliffe

Scenario: You have a messaging or distributed pipeline architecture for your XML documents. An XML document makes multiple stopovers from beginning to end, and a document may be stored and requested multiple times in its life. Your documents go between different operations or groups under your roof, or come from outside. You want strong gateway-style validation at all service entry points to reject or divert documents that lack required patterns or contain unwanted patterns.

Problem:  Schematron validation (indeed, any non-trivial validation) adds latency at each stage.

Optimization: Create a validation server that caches validation results against a key. The server calculates the key the first time a document is validated and sends it back to the originating process. The next time that process sends the document, it also sends the key. If the key is current, the document does not need to be revalidated. Document invalidity is caught at the earliest stage. Documents can be prevalidated, if appropriate: if documents are created and stored rather than immediately sent, validation could be attached to the editor’s save routine, eliminating the need for subsequent processes to perform it. The same service can also be used in unit, integration or acceptance tests.

Details: The validation server accepts an XML document with a specified Schematron schema name and phase, validates the document to a boolean result and calculates a CRC of the document. It then generates a key and stores it against the value [CRC, schema, schemaVersion, phase, date, result], for example using a Redis key/value database. The validation server returns the key and the validation result. Subsequently the key can be queried by providing the validation server with the key and the CRC of the document, and the server will return the status. The date can be used to invalidate results after a time.
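The server side could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names ValidationCache, Record, submit and query are hypothetical, a plain dictionary stands in for the Redis store, the key is simply a random UUID, and the Schematron validator is passed in as an assumed callable rather than implemented.

```python
import time
import uuid
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """The value stored against a key: [CRC, schema, schemaVersion, phase, date, result]."""
    crc: int
    schema: str
    schema_version: str
    phase: str
    date: float  # seconds since the epoch when the result was stored
    result: bool


class ValidationCache:
    """In-memory stand-in for the key/value store (hypothetical; a real one might use Redis)."""

    def __init__(self,
                 validate: Callable[[bytes, str, str, str], bool],
                 max_age_seconds: float = 86400.0) -> None:
        self._validate = validate          # assumed Schematron validator, supplied by the caller
        self._max_age = max_age_seconds    # results older than this are treated as stale
        self._store: Dict[str, Record] = {}

    def submit(self, document: bytes, schema: str,
               schema_version: str, phase: str) -> Tuple[str, bool]:
        """Validate once, store the record, and return (key, result)."""
        result = self._validate(document, schema, schema_version, phase)
        key = uuid.uuid4().hex
        self._store[key] = Record(zlib.crc32(document), schema,
                                  schema_version, phase, time.time(), result)
        return key, result

    def query(self, key: str, crc: int) -> Optional[bool]:
        """Return the cached result, or None if the key is unknown, stale,
        or was generated for a different document."""
        record = self._store.get(key)
        if record is None:
            return None
        if time.time() - record.date > self._max_age:
            del self._store[key]  # the date invalidates old results
            return None
        if record.crc != crc:
            return None
        return record.result
```

A caller that gets None back from query would fall back to a full submit, so a stale or mismatched key costs one validation and nothing more.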

The same process occurs at each stage in the pipeline or message: the document is sent with a key, the receiver calculates the fast CRC and checks the key. Each stage is responsible for determining which schema/version/phase is appropriate.
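The check made by a receiving stage might look something like the sketch below. The function accept_at_stage and its parameters are hypothetical names for illustration; lookup stands for whatever call fetches the stored value for a key, and revalidate stands for the full validation the stage would otherwise perform. It assumes the stage can see the schema, version and phase recorded against the key, so that it can decide whether the cached result is the one it needs.

```python
import zlib
from typing import Callable, Optional, Tuple

# Hypothetical shape of a cached value: (crc, schema, schema_version, phase, result)
Record = Tuple[int, str, str, str, bool]


def accept_at_stage(document: bytes,
                    key: Optional[str],
                    wanted: Tuple[str, str, str],
                    lookup: Callable[[str], Optional[Record]],
                    revalidate: Callable[[bytes], bool]) -> bool:
    """Reuse a cached result if the key matches this document and this
    stage's (schema, version, phase); otherwise validate in full."""
    if key is not None:
        record = lookup(key)
        if record is not None:
            crc, schema, version, phase, result = record
            same_document = crc == zlib.crc32(document)   # the fast CRC check
            same_rules = (schema, version, phase) == wanted
            if same_document and same_rules:
                return result
    # No key, unknown key, altered document, or different schema/version/phase.
    return revalidate(document)
```

With a dictionary as the store, lookup can simply be the dictionary's get method; a stage that requires a newer schema version than the one recorded falls through to revalidate.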

The CRC is required to make sure that the document being validated is the one that the key was generated for.
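As a small illustration of that point, the sketch below computes a CRC-32 with Python's zlib module and shows that the same bytes give the same CRC however they are chunked, while a one-character change gives a different one. The helper name document_crc and the sample documents are made up for the example; the original does not say which CRC variant is used.

```python
import zlib
from typing import Iterable


def document_crc(chunks: Iterable[bytes]) -> int:
    """CRC-32 of a document supplied as a sequence of byte chunks."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # second argument continues a running checksum
    return crc


original = b"<order><qty>1</qty></order>"
altered = b"<order><qty>9</qty></order>"

crc_at_validation = document_crc([original])

# Same bytes read in two chunks: the CRC is unchanged, so the key still applies.
same_document = document_crc([original[:10], original[10:]]) == crc_at_validation

# One changed character: the CRC differs, so the cached result must not be reused.
changed_document = document_crc([altered]) == crc_at_validation
```

Computing the CRC chunk by chunk means a stage can check a large document as it streams in, without holding it all in memory.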