Early review of Java 9 XML catalogs

Since the first OASIS XML Catalogs spec was published, there have been numerous attempts to implement it. With the JDK 9 release, XML Catalogs are set to become a standard feature of the platform.

Although Java 9 is still about a year away from its release date, the XML Catalogs feature is already complete. It is available in the early access release at https://jdk9.java.net/download/.

I'm an Easy Rider

Now, let us take a look at the brand new XML Catalogs API delivered by JEP 268. Historically, Java had its own resolver under the hood, but it was accessible only from within the JDK. Folks from the Java community revisited the old code and decided to come up with a totally new, public API.

Looking at the new code, you will find that it's nicely written, perfectly documented and has a decent test suite. Oh, and for those of us who care about standards, JEP 268 is fully compliant with the OASIS XML Catalogs 1.1 specification – the latest one.

According to the JEP, the new main concepts are:

  • The CatalogManager will manage the creation of XML Catalogs and CatalogResolvers, as well as features and properties.
  • A Catalog will implement the semantics of OASIS Open Catalog files. It will define an entity catalog that maps external identifiers and URI references to (other) URI references, and delegates to other catalogs.
  • A CatalogResolver will implement JAXP's existing EntityResolver and URIResolver interfaces. The resolver will support the OASIS standard processing instruction as a SAX XMLFilter.

I've Been Everywhere

CatalogManager

CatalogManager is responsible for parsing catalogs and creating catalog resolvers. It is nothing more than a collection of static factory methods.

To parse a catalog, we first obtain a CatalogFeatures instance, which is just a configuration object. For now, let's take the defaults. The second argument denotes the catalog file's location. It is actually a varargs parameter, so we can pass multiple paths (for the main and alternative catalogs).

CatalogFeatures defaults = CatalogFeatures.defaults();
Catalog catalog = CatalogManager.catalog(defaults, "catalog.xml", "catalog-alt.xml");

The paths, semicolon-delimited, can also be set through the javax.xml.catalog.files property. Make sure all settings are in place before the CatalogFeatures object is created, because the features are evaluated eagerly.

System.setProperty("javax.xml.catalog.files", "catalog.xml");
CatalogFeatures defaults = CatalogFeatures.defaults();
Catalog catalog = CatalogManager.catalog(defaults);

If we want custom features, we can use CatalogFeatures.builder(). Actually, CatalogFeatures.defaults() is a shortcut for CatalogFeatures.builder().build().

CatalogFeatures features = CatalogFeatures.builder()
    .with(Feature.FILES, "catalog.xml")
    .with(Feature.PREFER, "public")
    .with(Feature.DEFER, "true")
    .with(Feature.RESOLVE, "ignore")
    .build();

When the same feature is set in more than one place, the framework picks its value in the following order of precedence, highest first:

  1. JAXP API properties
  2. JAXP system properties
  3. jaxp.properties file
  4. default value
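To make the precedence concrete, here is a minimal sketch that sets the PREFER feature both as a system property and through the builder, then reads it back with CatalogFeatures.get. The class name FeaturePrecedenceSketch and its helper methods are mine, not part of the API, and the sketch assumes no jaxp.properties file overrides the feature.

```java
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogFeatures.Feature;

// Hypothetical helper class, for illustration only.
final class FeaturePrecedenceSketch {

    // PREFER as seen by a features object built without any API-level setting.
    static String preferFromDefaults() {
        return CatalogFeatures.defaults().get(Feature.PREFER);
    }

    // PREFER as seen by a features object that sets it explicitly through the builder.
    static String preferFromBuilder(String value) {
        return CatalogFeatures.builder()
            .with(Feature.PREFER, value)
            .build()
            .get(Feature.PREFER);
    }

    public static void main(String[] args) {
        // Feature.getPropertyName() gives the matching system property name.
        String name = Feature.PREFER.getPropertyName();
        System.setProperty(name, "system");
        try {
            // Only the system property is set, so it wins over the default.
            System.out.println("system property only: " + preferFromDefaults());
            // The builder value is an API property and outranks the system property.
            System.out.println("builder and property: " + preferFromBuilder("public"));
        } finally {
            System.clearProperty(name);
        }
    }
}
```

Because a features object reads the system properties at the moment it is created, the property has to be set before defaults() or build() is called, which is the eager evaluation mentioned earlier.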

For more details, please consult the javax.xml.catalog.CatalogFeatures Javadoc.

Catalog

The intention behind Catalog is to perform simple lookups. It is a pure interface that does not inherit from anything. Catalog can serve as a building block for catalog-aware applications, and it is also heavily used by CatalogResolvers. It supports three kinds of lookup:

  1. Find a matching entry by a system ID (matchSystem).
  2. Find an entry by a public ID (matchPublic).
  3. Find an entry by a URI element (matchURI).

In this post I use the example DocBook catalog from the OASIS specification.

 Catalog catalog = ...;
 String match = catalog.matchPublic("-//OASIS//ELEMENTS DocBook XML Information Pool V4.1.2//EN");
 System.out.println(match);

http://www.oasis-open.org/docbook/xml/4.1.2/dbpoolx.mod
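If you want to try the three lookups without the DocBook catalog at hand, the sketch below writes a tiny catalog to a temporary file and queries it. The identifiers, the target URIs and the class name CatalogLookupSketch are made up for illustration. One caveat: in the released JDK API, CatalogManager.catalog takes java.net.URI arguments rather than the String paths of the early-access snippets, so the sketch passes Path.toUri().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical helper class, for illustration only.
final class CatalogLookupSketch {

    // Made-up identifiers, not taken from any real catalog.
    static final String PUBLIC_ID = "-//EXAMPLE//DTD Sample V1.0//EN";
    static final String SYSTEM_ID = "http://example.org/remote/sample.dtd";
    static final String URI_NAME = "http://example.org/ns/sample";

    // A catalog with one entry per lookup type.
    static final String CATALOG_XML =
          "<?xml version=\"1.0\"?>\n"
        + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <public publicId=\"" + PUBLIC_ID + "\" uri=\"http://example.org/local/public.dtd\"/>\n"
        + "  <system systemId=\"" + SYSTEM_ID + "\" uri=\"http://example.org/local/system.dtd\"/>\n"
        + "  <uri name=\"" + URI_NAME + "\" uri=\"http://example.org/local/sample.xsd\"/>\n"
        + "</catalog>\n";

    // Writes the catalog to a temporary file and parses it with default features.
    static Catalog load() {
        try {
            Path file = Files.createTempFile("catalog-sketch", ".xml");
            file.toFile().deleteOnExit();
            Files.write(file, CATALOG_XML.getBytes(StandardCharsets.UTF_8));
            return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Catalog catalog = load();
        System.out.println(catalog.matchSystem(SYSTEM_ID));
        System.out.println(catalog.matchPublic(PUBLIC_ID));
        System.out.println(catalog.matchURI(URI_NAME));
        // An identifier without an entry yields null rather than an exception.
        System.out.println(catalog.matchPublic("-//EXAMPLE//DTD Unknown//EN"));
    }
}
```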

Catalog can also return its alternative catalogs as a Java 8 Stream, which is pretty cool. Below I grab the first alternative catalog that has public OASIS DocBook entries.

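public static void main(String[] args)  {
    CatalogFeatures defaults = CatalogFeatures.defaults();
    Catalog catalog = CatalogManager.catalog(defaults, "catalog.xml", "catalog-alt-1.xml", "catalog-alt-2.xml");
    // First alternative catalog that defines the DocBook DTD, if any.
    Optional<Catalog> docbookCatalog = catalog.catalogs()
        .filter(CatalogTest::hasDocbookDefinition)
        .findFirst();
}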

private static boolean hasDocbookDefinition(Catalog catalog) {
    return catalog.matchPublic("-//OASIS//DTD DocBook XML V4.1.2//EN") != null;
}
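For a version that runs on its own, the sketch below creates a main and an alternative catalog in temporary files and searches the alternatives for a public ID. Everything named here (AlternativeCatalogSketch, the two public IDs, the target URIs) is hypothetical, and the URI-based CatalogManager.catalog signature is the one from the released JDK rather than the early-access build.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

// Hypothetical helper class, for illustration only.
final class AlternativeCatalogSketch {

    // Made-up public identifiers: one lives in the main catalog, one in the alternative.
    static final String MAIN_ID = "-//EXAMPLE//DTD Main V1.0//EN";
    static final String ALT_ID = "-//EXAMPLE//DTD Alternative V1.0//EN";

    // Writes a one-entry catalog to a temporary file and returns its URI.
    static URI writeCatalog(String publicId, String target) {
        String xml =
              "<?xml version=\"1.0\"?>\n"
            + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            + "  <public publicId=\"" + publicId + "\" uri=\"" + target + "\"/>\n"
            + "</catalog>\n";
        try {
            Path file = Files.createTempFile("catalog-sketch", ".xml");
            file.toFile().deleteOnExit();
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            return file.toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The first URI is the main catalog, the second becomes an alternative.
    static Catalog load() {
        URI main = writeCatalog(MAIN_ID, "http://example.org/local/main.dtd");
        URI alt = writeCatalog(ALT_ID, "http://example.org/local/alt.dtd");
        return CatalogManager.catalog(CatalogFeatures.defaults(), main, alt);
    }

    // Returns the first alternative catalog that has an entry for the public ID.
    static Optional<Catalog> firstAlternativeDefining(Catalog catalog, String publicId) {
        return catalog.catalogs()
            .filter(alternative -> alternative.matchPublic(publicId) != null)
            .findFirst();
    }

    public static void main(String[] args) {
        Catalog catalog = load();
        firstAlternativeDefining(catalog, ALT_ID)
            .map(alternative -> alternative.matchPublic(ALT_ID))
            .ifPresent(System.out::println);
    }
}
```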

CatalogResolver

CatalogResolver implements the SAX, StAX, DOM LS and Transform interfaces for URI and entity resolution.

  • org.xml.sax.EntityResolver
  • javax.xml.stream.XMLResolver
  • org.w3c.dom.ls.LSResourceResolver
  • javax.xml.transform.URIResolver

Create CatalogResolver objects by referencing a previously initialized Catalog:

Catalog catalog = ...; 
CatalogResolver catalogResolver = CatalogManager.catalogResolver(catalog);

or by parsing catalog files on the go:

CatalogFeatures defaults = CatalogFeatures.defaults();
CatalogResolver catalogResolver = CatalogManager.catalogResolver(defaults, "catalog.xml");

Now we can tweak our tools (e.g. SAX parsers or the XSLT processor) with the created CatalogResolver. In the example here I use it for entity resolution.

CatalogFeatures defaults = CatalogFeatures.defaults();
CatalogResolver catalogResolver = CatalogManager.catalogResolver(defaults, "catalog.xml");
InputSource inputSource = catalogResolver.resolveEntity("-//OASIS//DTD DocBook XML V4.1.2//EN", "");

String systemId = inputSource.getSystemId();
System.out.println(systemId);

http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
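Calling resolveEntity by hand is handy for experiments, but the resolver earns its keep when it is plugged into a parser. Below is a minimal sketch of that wiring with a SAX XMLReader. The public ID, the unreachable system ID, the DTD content and the class name SaxResolverSketch are all assumptions of mine. The catalog maps the public ID to a DTD in a temporary directory, so the parser never has to touch the network. As before, the sketch uses the URI-based factory method of the released JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
final class SaxResolverSketch {

    // Made-up public ID; the system ID deliberately points at an unreachable host.
    static final String PUBLIC_ID = "-//EXAMPLE//DTD Note V1.0//EN";
    static final String DOCUMENT =
          "<!DOCTYPE note PUBLIC \"" + PUBLIC_ID + "\" \"http://example.invalid/note.dtd\">\n"
        + "<note>&greeting;</note>";

    // Parses DOCUMENT with a catalog-backed entity resolver and returns its text content.
    static String parseWithCatalog() {
        try {
            // A local DTD that declares the entity used by the document.
            Path dir = Files.createTempDirectory("catalog-sketch");
            Path dtd = dir.resolve("note.dtd");
            Files.write(dtd, "<!ENTITY greeting \"hello from the local DTD\">"
                .getBytes(StandardCharsets.UTF_8));

            // A catalog that maps the public ID to the local DTD.
            Path catalogFile = dir.resolve("catalog.xml");
            String catalogXml =
                  "<?xml version=\"1.0\"?>\n"
                + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                + "  <public publicId=\"" + PUBLIC_ID + "\" uri=\"" + dtd.toUri() + "\"/>\n"
                + "</catalog>\n";
            Files.write(catalogFile, catalogXml.getBytes(StandardCharsets.UTF_8));

            CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());

            // CatalogResolver is an org.xml.sax.EntityResolver, so it plugs in directly.
            StringBuilder text = new StringBuilder();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(DOCUMENT)));
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWithCatalog());
    }
}
```

The same resolver object could be handed to an XSLT Transformer through setURIResolver, since CatalogResolver also implements javax.xml.transform.URIResolver.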

I'm Movin' On

My experience with some third-party XML resolvers was frustrating. I remember the night I spent trying to fix a nextCatalog bug in one of them. Although my fix worked, I arrived at it purely by intuition, because the code was difficult to understand.

The lack of support has pushed companies to maintain customized versions of open-source catalog resolvers, while enthusiasts from the XML community have developed their own tools. So now we live in a little zoo of implementations. :-)

Having said that, I am excited about the new standard API. It is well designed and easy to learn. Being introduced at the JDK level, the API will get a lot of attention, good support and, hopefully, no unresolved bugs in the long run.

Please leave your opinions in the comments.

Serhiy Hapiy

Started my career in software development in 2012, working on XSLT solutions. Later participated in core Java and Python projects. My current domain is Big Data and Machine Learning.
