RAN-DOM (and RAN Infoset, XDM)

RAN-DOM is a DOM for RAN documents. The RAN-XML infoset describes how a RAN document may appear to XML systems that process it.

(This is an outline only.)

RAN-DOM Document Object Model

Loosely, a RAN-DOM is an XML DOM with the following new nodes:

  • fragment - extends element
  • scoped element - extends element
  • link - extends element
  • partition - (??) extends general parsed external entity, must start with a fragment
    • Invisible to XPath (implicit rules)
  • anonymous element - this is an array
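As a rough illustration of the node types listed above, the following Python sketch models them as dataclasses. The base classes `Node`, `Text`, `Element` and `GeneralParsedEntity` are hypothetical stand-ins for XML DOM interfaces, not part of any real DOM library. The anonymous element is modelled simply as an ordered list of items, since the outline only says that it is an array.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List

# Hypothetical stand-ins for XML DOM node interfaces.
@dataclass
class Node:
    pass

@dataclass
class Text(Node):
    data: str

@dataclass
class Element(Node):
    name: str  # RAN allows the empty name ""
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Node] = field(default_factory=list)

@dataclass
class GeneralParsedEntity(Node):
    children: List[Node] = field(default_factory=list)

# New RAN-DOM node types (sketch only).
@dataclass
class Fragment(Element):
    """A top-level fragment, treated as an element."""

@dataclass
class ScopedElement(Element):
    """An element whose content forms a separate scope."""

@dataclass
class Link(Element):
    """A preamble link, looked up by namespace prefix."""

@dataclass
class Partition(GeneralParsedEntity):
    """Tentative node type; hidden from XPath."""
    visible_to_xpath: ClassVar[bool] = False

@dataclass
class AnonymousElement(Node):
    """Modelled here as an array of items."""
    items: List[Node] = field(default_factory=list)
```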

Required extensions:

  • the document element allows partition nodes at its end
  • PIs use start-tag syntax, with attributes
  • All names and attribute values may be lexically typed: string, name, number, date-time-range, boolean, path.
  • An element may have an empty name "".
  • Element and fragment end-tags allow arbitrary data after the generic identifier, in the manner of a comment. 

RAN makes no provision for data values to have datatypes, except when RAN-CSV is used; in that case the effective elements may have appropriately typed data values.

The information in the preamble is common to all documents in the stream. Namespaces are implemented so that a prefix on an element or attribute name can be used to look up the corresponding link.
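The prefix lookup might be sketched in Python as follows. This assumes, hypothetically, that links are held in a dictionary keyed by prefix and that each link is represented by its attribute dictionary; the attribute name `ns` in the usage is invented for illustration, and only token names of the form `prefix:local` are handled.

```python
from typing import Dict, Optional, Tuple

def split_prefix(name: str) -> Tuple[Optional[str], str]:
    """Split a token name 'prefix:local' into (prefix, local)."""
    prefix, sep, local = name.partition(":")
    return (prefix, local) if sep else (None, name)

def lookup_link(name: str, links: Dict[str, Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the link for the name's prefix, or None for an unprefixed name."""
    prefix, _ = split_prefix(name)
    if prefix is None:
        return None
    if prefix not in links:
        raise KeyError(f"no link declared for prefix {prefix!r}")
    return links[prefix]
```

For example, with `links = {"dc": {"ns": "http://purl.org/dc/elements/1.1/"}}`, `lookup_link("dc:title", links)` returns the `dc` link, while an undeclared prefix raises `KeyError`.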

RAN has no equivalent of declarations, DOCTYPE, external entities, CDATA sections, namespace redeclaration, namespace defaults and so on.

An XML document read into a RAN-DOM by an XML parser produces the same tree as in a normal DOM, except that namespaces are transplanted elsewhere.

The Apatak validator tables can be used for partial validation at the time elements are added, or before shipping.

XPath Data Model

An XML document loaded into a RAN-DOM has the same nodes as in a conventional DOM. Similarly, it can have the same typed XDM behaviour as a document from an XML DOM.

The XDM (and XQuery) can already cope with some aspects of RAN, such as multiple top-level elements (fragments can be treated as elements). Because names in RAN may be strings as well as tokens, XPath would have to use *[local-name()="some name"] in paths to match string names.
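The following Python sketch shows one way a tool might build such paths: token names are used directly as name tests, and any other name falls back to a `local-name()` predicate. The helper names are hypothetical, the NCName check is a deliberately simplified ASCII approximation, prefixed names are not handled, and the quoting follows the XPath 2.0 and later rule that a quote inside a string literal is escaped by doubling it.

```python
import re
from typing import Iterable

# Simplified ASCII approximation of an XML NCName (real NCNames allow more Unicode).
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def xpath_string_literal(s: str) -> str:
    """Quote s as an XPath 2.0+ string literal; an embedded quote is doubled."""
    return '"' + s.replace('"', '""') + '"'

def name_step(name: str) -> str:
    """Return a path step selecting child elements called `name`."""
    if _NCNAME.fullmatch(name):
        return name
    return f"*[local-name()={xpath_string_literal(name)}]"

def path_for(names: Iterable[str]) -> str:
    """Build an absolute path from a sequence of element names."""
    return "/" + "/".join(name_step(n) for n in names)
```

For example, `path_for(["doc", "some name"])` yields `/doc/*[local-name()="some name"]`.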

The definition of the value of an element changes: it is not the concatenation of all descendant text nodes, but the concatenation of the descendant text nodes in the same scope. It excludes the contents of descendant scoped elements.
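A minimal Python sketch of this scoped value, assuming a hypothetical tree in which text is stored as plain strings and scoped elements carry a `scoped` flag:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)
    scoped: bool = False  # True for a RAN scoped element

def ran_string_value(elem: Element) -> str:
    """Concatenate descendant text in elem's scope, skipping descendant scoped elements."""
    parts: List[str] = []

    def walk(node: Element) -> None:
        for child in node.children:
            if isinstance(child, str):
                parts.append(child)
            elif not child.scoped:
                walk(child)
            # A descendant scoped element opens a new scope, so its content is excluded.

    walk(elem)
    return "".join(parts)

doc = Element("p", ["Hello ", Element("b", ["big "]),
                    Element("note", ["hidden"], scoped=True), "world"])
```

Here `ran_string_value(doc)` gives `"Hello big world"`; the ordinary XPath string value of the same `p` element would also include the text `hidden`.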

As with RAN-DOM, the preamble is available to all documents. A link tag's attributes are attached to the top-level elements and fragments that have the same prefix.
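A possible Python sketch of attaching link attributes to top-level items by prefix. It treats the link attributes as defaults that do not override values already present, an assumption based on the later note that RAN links provide defaulted attribute values. The `TopLevel` class and `apply_links` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TopLevel:
    name: str  # e.g. "dc:record"
    attributes: Dict[str, str] = field(default_factory=dict)

def apply_links(top_levels: List[TopLevel], links: Dict[str, Dict[str, str]]) -> None:
    """Copy each link's attributes onto top-level items with the same prefix."""
    for item in top_levels:
        prefix, sep, _ = item.name.partition(":")
        if not sep or prefix not in links:
            continue  # unprefixed, or no link for this prefix
        for key, value in links[prefix].items():
            # Link attributes act as defaults; values on the item itself win.
            item.attributes.setdefault(key, value)
```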

The datatypes that have no equivalent in XDM are date-time-range and path. Consequently, values of these types should be treated as strings.
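One way to express this is a table from RAN lexical types to XDM atomic types, with date-time-range and path falling back to `xs:string`. Only that fallback comes from the text above; the other entries (`xs:token` for names, `xs:decimal` for numbers, `xs:boolean` for booleans) and the `xs:untypedAtomic` default for unknown types are assumptions in this hypothetical Python sketch.

```python
from decimal import Decimal
from typing import Any, Tuple

# Hypothetical table; the text only fixes date-time-range and path as strings.
RAN_TO_XDM = {
    "string": "xs:string",
    "name": "xs:token",              # assumption
    "number": "xs:decimal",          # assumption
    "boolean": "xs:boolean",         # assumption
    "date-time-range": "xs:string",  # no XDM equivalent
    "path": "xs:string",             # no XDM equivalent
}

def to_xdm(ran_type: str, lexical: str) -> Tuple[str, Any]:
    """Return (XDM type name, Python value) for a lexically typed RAN value."""
    xdm_type = RAN_TO_XDM.get(ran_type, "xs:untypedAtomic")
    if xdm_type == "xs:decimal":
        return xdm_type, Decimal(lexical)
    if xdm_type == "xs:boolean":
        if lexical not in ("true", "false", "1", "0"):
            raise ValueError(f"not an xs:boolean lexical form: {lexical!r}")
        return xdm_type, lexical in ("true", "1")
    return xdm_type, lexical  # strings, tokens and untyped values stay as text
```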

RAN Infoset

We start with the basic grammar productions:

stream    ::= preamble? partition*
preamble  ::= (link | fluff)*
partition ::= ((element | scoped-element) fluff)? (fragment fluff)*

We see that the basic structure is a list of lists:

  • Each segment starts with a fragment or element.
  • Any other top-level tags (comments, PIs, etc.) that follow a top-level fragment, element or link are bundled with it.
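The grouping into a list of lists can be sketched in Python as below. The token kinds (`link`, `element`, `scoped-element`, `fragment`, `comment`, `pi`) are hypothetical labels for an already-tokenized stream. Because the productions leave the boundary between partitions open, this sketch assumes that a new partition starts only at an element or scoped element, while fragments are added to the current partition.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); the kinds are hypothetical labels
Segment = List[Token]    # a head token followed by its bundled fluff
FLUFF = {"comment", "pi"}
HEADS = {"element", "scoped-element"}

def group_stream(tokens: List[Token]) -> Tuple[List[Segment], List[List[Segment]]]:
    """Group a flat token list into (preamble segments, partitions)."""
    preamble: List[Segment] = []
    partitions: List[List[Segment]] = []
    for tok in tokens:
        kind = tok[0]
        if kind in FLUFF:
            # Bundle fluff with the most recent segment; leading fluff stands alone.
            target = partitions[-1] if partitions else preamble
            if target:
                target[-1].append(tok)
            else:
                target.append([tok])
        elif kind == "link":
            if partitions:
                raise ValueError("link after the preamble")
            preamble.append([tok])
        elif kind in HEADS:
            partitions.append([[tok]])  # an element starts a new partition
        elif kind == "fragment":
            if not partitions:
                partitions.append([])   # a partition may start with a fragment
            partitions[-1].append([tok])
        else:
            raise ValueError(f"unknown token kind {kind!r}")
    return preamble, partitions
```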

Each top-level fragment is treated as a virtual XML document, according to the following rules:

  • The link declaration is applied to each of them.
    • RAN links provide XML namespace declarations and defaulted attribute values
  • Fragments are treated as XML elements
    • Element and attribute names that are literals must be replaced by an XML-safe encoding, e.g. a Base64 version of the literal that uses - and _ rather than + and /.
    • Attribute values that are not literals are converted to string literals and, where a type is available, typed with the nearest PSVI equivalent.
  • Top-level fluff, i.e. comments and PIs, is not visible as part of the infoset of any fragment.
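A hedged Python sketch of the name-encoding rule, using the standard library's URL-safe Base64 alphabet (`-` and `_` in place of `+` and `/`). The helper names are hypothetical, and two further choices are assumptions rather than part of the outline: the `=` padding is stripped because it cannot appear in an XML name, and a `_` prefix is added because an encoded name may otherwise begin with a digit or `-`.

```python
import base64

def xml_name(name: str, is_literal: bool) -> str:
    """Map a RAN name to an XML name, encoding literal names as URL-safe Base64."""
    if not is_literal:
        return name  # token names are assumed to be usable as XML names already
    encoded = base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")
    # Assumptions: strip '=' padding (not a name character) and add a
    # hypothetical '_' prefix so the result never starts with a digit or '-'.
    return "_" + encoded.rstrip("=")

def literal_from_xml_name(xml: str) -> str:
    """Invert xml_name for a name known to have been encoded."""
    body = xml[1:]                  # drop the '_' prefix
    body += "=" * (-len(body) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(body).decode("utf-8")
```

For example, the literal name `some name` becomes `_c29tZSBuYW1l`, and `literal_from_xml_name` recovers the original string.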