What’s in Java 11 for XML Developers?

Posted on December 31, 2018 by Rick Jelliffe

Java is (by most counts) the most popular server-side platform.  Oracle have moved releases to a regular six-monthly update, and the most recent is Java 11.  Seven months ago I wrote What’s in Java 10 (and 9) for XML Developers?, so here is the update.

There are no real changes to the XML APIs. However, if you go beyond Plain Old XML and use SOAP or data binding, note that XML Web Services (javax.xml.ws, JAX-WS) and XML Binding (javax.xml.bind, JAXB) have left the JDK altogether: they were deprecated in Java 9 as part of the java.se.ee aggregator module, and Java 11 removes them (JEP 320). You now have to add them as ordinary library dependencies, so your build and deployment may change. This is part of the effort to allow slimmer distros of Java, whose bloat was such a killer for Java desktop apps (and for running apps from the browser, which has gone now too).

For XML-related processing, there is a new standard API for synchronous and asynchronous HTTP clients (java.net.http, JEP 321), which supports both HTTP/1.1 and HTTP/2.
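As a rough sketch of how fetching an XML document might look with that client: the class name XmlFetch and its helper methods are made up for illustration, and the URL is whatever you pass in. Only the java.net.http calls come from the JDK.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

public class XmlFetch {
    // Build a GET request that asks the server for XML.
    static HttpRequest xmlRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/xml")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Synchronous style: blocks until the whole body has arrived.
    static String fetch(HttpClient client, String url) throws Exception {
        return client.send(xmlRequest(url), HttpResponse.BodyHandlers.ofString()).body();
    }

    // Asynchronous style: returns at once with a future for the body.
    static CompletableFuture<String> fetchAsync(HttpClient client, String url) {
        return client.sendAsync(xmlRequest(url), HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java XmlFetch.java <url>");
            return;
        }
        System.out.println(fetch(HttpClient.newHttpClient(), args[0]));
    }
}
```

The body arrives as a String here only to keep the sketch short; for real documents you would more likely hand BodyHandlers.ofInputStream() to your XML parser.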

But more interesting are fixes for two bugbears:

  • Previously, Java was really uncompetitive for running commands the way scripting languages do. This allowed Perl, Python and Ruby to take over that space. Java 11 has several features relieving this. In particular, you can invoke java and provide it with a Java source file, which will be compiled in memory and run: JEP 330 Launch Single-File Source-Code Programs. This gets rid of that tedious explicit compilation stage. Schematron itself is a good example of something that could benefit from such a single-file program. It also integrates with the shell scripting system of your *NIX platform, by allowing shebang lines so that the file can be invoked without even mentioning java:
    #!/usr/bin/java --source 11

    Along with this come some flow-on technologies that make sense: an Epsilon garbage collector (experimental, JEP 318) that allocates memory but never reclaims it, suitable for scripts that just run and die; and you can see API changes intended to make Java more like some of the popular scripting languages, notably that predicates (chainable boolean functions that allow logical selections, not dissimilar to XPath predicates) have been beefed up with Predicate.not() and with Pattern.asMatchPredicate(), which should greatly simplify matching with regular expressions! Java 11 also adds null streams (InputStream.nullInputStream(), OutputStream.nullOutputStream() and the Reader and Writer equivalents), like in scripting languages.

  • Many Java developers just automatically add extension libraries such as FileUtils from Apache Commons IO, or Google Guava. But the existence of these reveals deficiencies in Java’s core libraries and a haphazard approach to them. The thing is that moving to derived and generic types requires that you also provide systematic conversions between types: the most ridiculous has been the need to write your own functions merely to copy files or byte arrays. Sure, it is only a few lines, but it shows a mentality: either an unconcern with usability, or a blinkered focus on orthogonality and separation of concerns that sloppily concentrates only on the provision of classes and generics but not the effective use of them as an ensemble. So it is really nice to see that Java 11 has numerous small additional functions that help round out the standard libraries, for example:
    • Convert a URI to a Path: Path.of(URI)
    • Better operations on byte arrays and buffers:
      • Inflater and Deflater get inflate() and deflate() overloads that work on a ByteBuffer
      • ByteArrayOutputStream gets writeBytes(byte[])
    • Better operations on Characters and Strings:
      • isBlank() uses Unicode white space, not just ASCII
      • strip() uses Unicode white space (unlike trim())
      • read from Paths to Strings using a specifiable Charset: Files.readString(Path, Charset)
      • convert a String into a Stream of lines: String.lines()
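Several of the additions from the two bullets above can be tried together in one single-file program. This is only a sketch: the class name ElementNames, its helper methods and the "simple name" regular expression are invented for illustration, while the library calls themselves (asMatchPredicate, Predicate.not, lines, strip, isBlank, writeBytes, Files.readString, Path.of) are the Java 11 ones.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ElementNames {
    // asMatchPredicate() tests the whole string, like Matcher.matches().
    // The pattern is a deliberately simplified stand-in for an XML name.
    static final Predicate<String> IS_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*").asMatchPredicate();

    // Split the text into lines, strip Unicode white space, drop blank
    // lines, and keep only the lines that are entirely a simple name.
    static List<String> names(String text) {
        return text.lines()
                .map(String::strip)
                .filter(Predicate.not(String::isBlank))
                .filter(IS_NAME)
                .collect(Collectors.toList());
    }

    // Join two byte arrays; writeBytes() has no checked IOException,
    // unlike the inherited write(byte[]).
    static byte[] concat(byte[] a, byte[] b) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(a);
        out.writeBytes(b);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ElementNames.java <file or file: URI>");
            return;
        }
        // Accept either a plain file name or a file: URI.
        Path path = args[0].startsWith("file:")
                ? Path.of(URI.create(args[0]))
                : Path.of(args[0]);
        String text = Files.readString(path, StandardCharsets.UTF_8);
        names(text).forEach(System.out::println);
    }
}
```

Saved as ElementNames.java, this runs with `java ElementNames.java names.txt` and no separate javac step. To use a shebang line instead, the file needs a name that does not end in .java, and the path to the java executable will depend on your system.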

What is Missing?

So what is the main thing missing in Java, for making life easier for text processing/XML people? I would say, paradoxically, that Java needs an unsigned byte type. Now I can understand why Java didn’t get one, because C’s habit of treating char as a byte is not Unicode-library friendly. And I understand that Java now has its lovely compact Strings (since Java 9), so Strings that contain only Latin-1 characters, ASCII included, take up half the size now. But really, when we are dealing with binary data, why do we have to mask, shift and widen to int? Too complex. So it would be great if Java grew an unsigned byte, but with warnings in the API doc that these are not Characters: like Rust has u8, for example.
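To make the complaint concrete, here is roughly what the masking and widening looks like today. The class UnsignedBytes and its two helpers are illustrative names only; Byte.toUnsignedInt is the JDK method that does the same masking.

```java
public class UnsignedBytes {
    // Widen a signed byte to its unsigned value 0..255 by masking.
    static int unsigned(byte b) {
        return b & 0xFF;   // same result as Byte.toUnsignedInt(b)
    }

    // Combine two bytes into a big-endian unsigned 16-bit value.
    static int u16(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;                 // the byte for é in ISO-8859-1
        System.out.println(b);                // prints -23
        System.out.println(unsigned(b));      // prints 233
        System.out.println(b == 0xE9);        // prints false: b widens to -23
        System.out.println(u16((byte) 0xFE, (byte) 0xFF)); // prints 65279 (0xFEFF)
    }
}
```

The third line of output is the trap: a byte read from a file never compares equal to a literal above 0x7F unless one side is masked or cast.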

The difficulty is that the JVM only supports signed numbers (with the exception of characters, which are u16), so it would have to be done as a layer on signed bytes in Java: maybe some String operation to allow comparisons and matches between a String or regex and a nominally signed byte array actually containing  unsigned bytes would go some of the way.

For example, I’d like

byte[] bytes = … read in external binary file

if (new ByteArray(bytes).containsAsUnsignedBytes("Some string with chars < U+0100")) {
    … the bytes contain the byte sequence …
}
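Nothing like ByteArray exists in the JDK, but something close can be approximated today with the ISO-8859-1 charset, which maps U+0000 to U+00FF one-to-one onto byte values 0 to 255. The following is only a sketch under that assumption: ByteSearch and all of its methods are hypothetical names, and the search is a naive scan, not a tuned algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteSearch {
    // Turn a String whose chars are all below U+0100 into raw bytes.
    // getBytes() would silently substitute '?' for anything higher,
    // so such characters are rejected up front.
    static byte[] latin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("char above U+00FF at index " + i);
            }
        }
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Index of the first occurrence of the byte sequence, or -1.
    static int indexOf(byte[] haystack, String needle) {
        byte[] pattern = latin1(needle);
        outer:
        for (int i = 0; i + pattern.length <= haystack.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (haystack[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    static boolean containsAsUnsignedBytes(byte[] bytes, String s) {
        return indexOf(bytes, s) >= 0;
    }

    // True if the array begins with the given byte sequence.
    static boolean startsWith(byte[] bytes, String prefix) {
        byte[] p = latin1(prefix);
        return p.length <= bytes.length
                && Arrays.equals(bytes, 0, p.length, p, 0, p.length);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(startsWith(data, "\u0089PNG"));        // prints true
        System.out.println(containsAsUnsignedBytes(data, "GIF")); // prints false
    }
}
```

Comparing byte against byte sidesteps the sign problem entirely, because both sides carry the same bit pattern; the masking is only needed once a byte is compared with an int or char.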

One use of this would be, for example, peeking at magic number signatures,