The most recent specifications for ODF and OOXML are available free at the ISO Publicly Available Standards website. They have both been substantially revised and augmented over the past decade.
For Open Document Format, the standard is split into three parts:
- ISO/IEC 26300-1:2015 Information technology — Open Document Format for Office Applications (OpenDocument) v1.2 — Part 1: OpenDocument Schema
- ISO/IEC 26300-2:2015 Information technology — Open Document Format for Office Applications (OpenDocument) v1.2 — Part 2: Recalculated Formula (OpenFormula) Format
- ISO/IEC 26300-3:2015 Information technology — Open Document Format for Office Applications (OpenDocument) v1.2 — Part 3: Packages
The ISO site does not distribute the schema for ODF. However, the standard gives its location in the Related Work section: ISO ODF was OASIS ODF subjected to international review and revision, which was fed back into the OASIS website. So you can get the RELAX NG schema for ODF at OASIS.
[N.B. According to oXygen, the schemas have a couple of faults. You might need to tidy these up yourself, or check whether corrected versions are available.]
For the Office Open XML specification, which came out of the ECMA standards consortium, the standard is split into four parts:
- ISO/IEC 29500-1:2016 Information technology — Document description and processing languages — Office Open XML File Formats — Part 1: Fundamentals and Markup Language Reference
- ISO/IEC 29500-2:2012 Information technology — Document description and processing languages — Office Open XML File Formats — Part 2: Open Packaging Conventions
- ISO/IEC 29500-3:2015 Information technology — Document description and processing languages — Office Open XML File Formats — Part 3: Markup Compatibility and Extensibility
- ISO/IEC 29500-4:2016 Information technology — Document description and processing languages — Office Open XML File Formats — Part 4: Transitional Migration Features
The schemas for Parts 1, 3 and 4 are available at the ISO Publicly Available Standards site, via the "Electronic inserts" links. (Part 2 does not define any XML language, hence no schema.) The schemas are available in W3C XSD and ISO RELAX NG versions: the XSD schema for Word is called wml.xsd and is 160 KB; in RELAX NG, it is 75 KB.
The trick with OOXML schemas is that ISO OOXML defines two different conformance classes:
- Strict, which is the schemas in Part 1, and represents what the ISO process wanted OOXML to be.
- Transitional, which adds the schemas in Part 4, and approximates what OOXML was in its first incarnation in Office 2007. For example, it allows the obsolete drawing language VML.
The big SNAFU here is that Part 1 and Part 4 use different namespace URIs for the same elements. In essence:
- Transitional namespaces follow a pattern
- Strict namespaces follow a pattern
- (By the way, if you have a document with namespaces starting http://schemas.microsoft.com/office/ then you are not dealing with OOXML at all: you are using Microsoft’s obsolete Office 2003 Word XML format, which is quite different though it pioneered some of the idioms that OOXML uses.)
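As a rough illustration of how an importer might tell these families apart, here is a minimal Python sketch that classifies a document part by the namespace of its root element. The Strict and Transitional URI prefixes are my own reading of the published schemas rather than something spelled out above, and the function names are hypothetical, so check them against Parts 1 and 4 before relying on this.

```python
import xml.etree.ElementTree as ET

# Assumed prefixes: verify against the schemas in Parts 1 and 4.
TRANSITIONAL_PREFIX = "http://schemas.openxmlformats.org/"
STRICT_PREFIX = "http://purl.oclc.org/ooxml/"
# Prefix quoted in the text for the older Office 2003 XML format.
OFFICE_2003_PREFIX = "http://schemas.microsoft.com/office/"


def classify_namespace(uri):
    """Return a rough label for a namespace URI (hypothetical helper)."""
    if uri.startswith(STRICT_PREFIX):
        return "strict"
    if uri.startswith(TRANSITIONAL_PREFIX):
        return "transitional"
    if uri.startswith(OFFICE_2003_PREFIX):
        return "office-2003"
    return "unknown"


def classify_document(xml_text):
    """Classify a part by the namespace of its root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{uri}local".
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
    else:
        uri = ""
    return classify_namespace(uri)
```

This only looks at the root element, so it is a quick triage step, not a conformance check.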
So implementers of OOXML need to decide which schema to support, and how. The trouble is that most XML libraries and tools don’t support the idea that a namespace could change without the language changing: XSLT 1.0 and XPath 1.0, for example, do not support wildcarding of namespaces (XPath 2.0 added the *:name test).
(An implementer may find it simplest to remap both Transitional and Strict namespace URIs on import, either to one of the two or to a single vendor-specific namespace, for example by intercepting the parser’s SAX stream, and then to regenerate the correct namespace on export.)
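The remapping idea can be sketched with Python's standard xml.sax filter machinery. This is only a sketch under stated assumptions: the two Strict-to-Transitional pairs in the table are my reading of the published schemas and are nowhere near a complete list, and NamespaceRemapFilter and remap_to_transitional are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesNSImpl

# Assumed, incomplete mapping: verify each URI against Parts 1 and 4.
STRICT_TO_TRANSITIONAL = {
    "http://purl.oclc.org/ooxml/wordprocessingml/main":
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "http://purl.oclc.org/ooxml/officeDocument/relationships":
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}


class NamespaceRemapFilter(XMLFilterBase):
    """SAX filter that rewrites namespace URIs as events pass through."""

    def __init__(self, parent, mapping):
        super().__init__(parent)
        self._mapping = mapping

    def _map(self, uri):
        # URIs missing from the table (including None) pass through unchanged.
        return self._mapping.get(uri, uri)

    def startPrefixMapping(self, prefix, uri):
        super().startPrefixMapping(prefix, self._map(uri))

    def startElementNS(self, name, qname, attrs):
        new_attrs, new_qnames = {}, {}
        for (a_uri, a_local), value in attrs.items():
            key = (self._map(a_uri), a_local)
            new_attrs[key] = value
            new_qnames[key] = attrs.getQNameByName((a_uri, a_local))
        uri, local = name
        super().startElementNS((self._map(uri), local), qname,
                               AttributesNSImpl(new_attrs, new_qnames))

    def endElementNS(self, name, qname):
        uri, local = name
        super().endElementNS((self._map(uri), local), qname)


def remap_to_transitional(xml_bytes):
    """Parse XML bytes and re-serialize them with the remapped namespaces."""
    out = io.StringIO()
    reader = xml.sax.make_parser()
    reader.setFeature(feature_namespaces, True)
    remapper = NamespaceRemapFilter(reader, STRICT_TO_TRANSITIONAL)
    remapper.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    remapper.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

In a real importer the downstream handler would be the application's own content handler, not a serializer, and export would run the inverse mapping.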
People who are interested in converting between ODF and OOXML, or in understanding why they vary so much at their edges, may be interested in the Technical Report
ISO/IEC TR 29166:2011 Information technology — Document description and processing languages — Guidelines for translation between ISO/IEC 26300 and ISO/IEC 29500 document formats
which I think came from material prepared by the German standards body and the Fraunhofer Society. But the study relates to ODF and OOXML as they were a decade ago, so some of the details could be wrong now.
I was happy to see that some W3C standards are heading towards ISO as well. The math community, like the internationalization community, is encouragingly non-partisan. ISO SC34 entrusted the maintenance of the SGML public entity sets to the W3C MathML WG, and now I see MathML is a full ISO standard. I read a report that the HTML 5 WG was uninterested in MathML, so if the W3C effort collapsed, it could be rehoused at ISO SC34, I am sure (though I don’t think there is any possibility of either of those things!)
ISO/IEC 40314:2016 Information technology — Mathematical Markup Language (MathML) Version 3.0 2nd Edition
Other publicly available ISO standards of interest to document processing people are:
- ISO/IEC 21778:2017 Information technology — The JSON data interchange syntax
- ISO/IEC 21320-1:2015 Information technology — Document Container File — Part 1: Core (People should call it what it is: ISO ZIP! It is the core ZIP format as used by OOXML, ODF, etc, in case the PKWARE note is lost.)
- ISO/IEC 19845:2015 Information technology — Universal Business Language Version 2.1 (UBL v2.1)
- The multiple parts of ISO/IEC 19757 Information technology — Document Schema Definition Languages (DSDL) which includes RELAX NG and Schematron
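Since both ODF and OOXML packages sit on that ZIP core, a quick sanity check of a package is easy to sketch with Python's zipfile module. The two checks below (only stored or deflated entries, no encryption) reflect my reading of the ISO/IEC 21320-1 profile and are not a complete conformance test; container_problems is a hypothetical name.

```python
import io
import zipfile

# Compression methods assumed to be allowed by the profile: stored (0), deflate (8).
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def container_problems(data):
    """Return a list of human-readable problems found in a ZIP package."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for info in package.infolist():
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(
                    f"{info.filename}: compression method {info.compress_type}")
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                problems.append(f"{info.filename}: encrypted")
    return problems
```

An empty list only means these two checks passed; the package can still be invalid as ODF or OOXML.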