Know-how: Debugging complex XSLT modules

Have you ever spent hours tracking down a transform problem, especially in a messy or ill-structured XSLT module?

Let's consider this puzzling template:

<xsl:template match="section">
    <xsl:next-match/>
</xsl:template>

<!-- Unrelated templates follow -->

Hmm… What does the xsl:next-match above do? Which template does it hand over to? I wrote the code myself, but I hardly remember what I meant.

So, how can I debug it? Modern XML editors provide excellent XSLT debuggers, and there are back-mapping solutions available on the market. Unfortunately, not all of these tools are equally convenient. They also cost money, and it often takes time to learn and tune them.

In this article we'll discuss how to build a personal back-mapping solution for everyday needs. We'll only need Saxon HE, Xerces and some knowledge of Java.

A naïve solution

Let me restate the problem. I have a source document and a fairly complex transform, and I want to visualise the relation between source nodes and the templates that match them.

A solution which immediately comes to mind is to use xsl:message to collect debug info. So I guess which templates might be in use and insert instructions like:

<xsl:message>
    Context: <xsl:copy-of select="."/>
    Match: <xsl:value-of select="'attribute() | node()'"/>
</xsl:message>

This method can work if the code base is relatively small and I'm familiar with it. But what if I'm dealing with a really complex module with modes and priorities?

XSLTTraceListener

net.sf.saxon.lib.TraceListener is a Saxon interface for collecting tracing information. It can be used, for example, in performance analysis. It is also put to good use by XQuery developers, who have no equivalent of xsl:message. But soon we'll find out how we can benefit from it, too.

Today we are interested in net.sf.saxon.trace.XSLTTraceListener, a standard implementation of TraceListener, which gathers tracing info from XSLT stylesheets.

Command line usage:

java -jar saxon9he.jar [options] -T:net.sf.saxon.trace.XSLTTraceListener

Saxon 9 API usage:

import net.sf.saxon.s9api.XsltTransformer;
import net.sf.saxon.trace.XSLTTraceListener;

// ...

XsltTransformer xsltTransformer = // Initialisation code
xsltTransformer.setTraceListener(new XSLTTraceListener());

By default, the trace is printed to the standard error stream as a well-formed XML document.

<trace saxon-version="9.7.0.6">
    <source node="/comment()[1]" line="-1" file="books.xml"></source>
    <!-- /comment()[1] -->
    <source node="/catalog" line="6" file="books.xml">
        <source node="/catalog/book[12]" line="113" file="books.xml">
            <source node="/catalog/book[12]/author[1]" line="114" file="books.xml">               
    ...   

I can also complement the trace info by adding user trace calls. Let's modify the XSLT a bit:

<xsl:template match="catalog">
    <xsl:sequence select="trace((), 'catalog')"/>
    ...
</xsl:template>

The trace output:

<source node="/catalog" line="6" file="books.xml">
    <user-trace label="catalog" value="()" line="7" module="books.xsl">
    </user-trace>
    ...
</source>

Inserting trace calls on-the-fly

We have managed to combine source and stylesheet information within a single output. But what are the advantages over using xsl:message? If anything, it looks worse, since the output is more verbose.

The good part about the trace output is that it's XML :-). It can be stored, analysed and transformed like any other XML document. What is more, XSLTTraceListener can be extended to produce pretty HTML reports. What if we could insert the trace calls dynamically, taking the guesswork out of debugging? The good news is that we can, by implementing a custom org.xml.sax.XMLReader.

XMLReader is the standard SAX parser interface, and Saxon accepts any implementation of it as a stylesheet parser. The Xerces SAXParser class implements it and exposes overridable parser callbacks such as startElement, startCDATA and endCDATA. We can extend the standard Xerces parser to insert trace calls in xsl:template bodies. Let's find out how.
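To see the idea in isolation first, here is a minimal sketch that uses only JDK classes. It is not the Xerces-based parser this article builds: it swaps in org.xml.sax.helpers.XMLFilterImpl as a stand-in, and the names TraceInjectingFilter and instrument are hypothetical. The filter forwards every SAX event unchanged and, right after each xsl:template start tag that carries a match attribute, emits an empty xsl:sequence element whose select attribute calls trace().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical illustration class: a SAX filter that injects trace calls.
public class TraceInjectingFilter extends XMLFilterImpl {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    public TraceInjectingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Forward the original start tag first.
        super.startElement(uri, localName, qName, atts);

        String match = atts.getValue("match");
        if (XSL_NS.equals(uri) && "template".equals(localName) && match != null) {
            // Reuse whatever prefix the stylesheet author chose for the XSLT namespace.
            int colon = qName.indexOf(':');
            String sequenceQName = (colon < 0 ? "" : qName.substring(0, colon + 1)) + "sequence";

            // Double apostrophes so the pattern stays a valid XPath 2.0 string literal.
            String select = "trace((), '" + match.replace("'", "''") + "')";
            AttributesImpl sequenceAtts = new AttributesImpl();
            sequenceAtts.addAttribute("", "select", "select", "CDATA", select);

            // Emit an empty xsl:sequence element as the first child of the template.
            super.startElement(XSL_NS, "sequence", sequenceQName, sequenceAtts);
            super.endElement(XSL_NS, "sequence", sequenceQName);
        }
    }

    // Parses a stylesheet through the filter and serialises the instrumented result.
    static String instrument(String stylesheet) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader filter = new TraceInjectingFilter(factory.newSAXParser().getXMLReader());

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(filter, new InputSource(new StringReader(stylesheet))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not instrument the stylesheet", e);
        }
    }
}
```

One caveat applies to any injection placed directly after the start tag: xsl:param elements must come first inside xsl:template, so a template that declares parameters would need the trace call inserted after them.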

Putting it all together

So, we'll start with extending the SAXParser class.

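package rocks.xml.saxontrace;

import org.apache.xerces.parsers.SAXParser;
import org.apache.xerces.util.AugmentationsImpl;
import org.apache.xerces.util.XMLAttributesImpl;
import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XNIException;

public class TraceXmlReader extends SAXParser {
    // The overridden callback and the helper methods shown below go here
}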

Now we can override the startElement method.

@Override
public void startElement(QName element, XMLAttributes attributes, Augmentations augs) throws XNIException {
    super.startElement(element, attributes, augs);
    if (isMatchTemplate(element, attributes)) {
        sequenceWithTraceCall(attributes);
    }
}

isMatchTemplate checks whether the start tag being reported is an xsl:template with a @match attribute.

private boolean isMatchTemplate(QName element, XMLAttributes attributes) {
    return TEMPLATE_NAME.equals(element) && attributes.getIndex("match") != -1;
}

sequenceWithTraceCall, in turn, emits an xsl:sequence instruction that calls the trace() function. The template's match pattern is passed as the trace label.

private void sequenceWithTraceCall(XMLAttributes attributes) {
    // Double any apostrophes so the pattern stays a valid XPath 2.0 string literal
    String label = attributes.getValue("match").replace("'", "''");
    String selectValue = String.format("trace((), '%s')", label);

    XMLAttributesImpl sequenceAttributes = new XMLAttributesImpl();
    sequenceAttributes.addAttribute(SELECT_NAME, "CDATA", selectValue);

    // Emit an empty xsl:sequence element right after the xsl:template start tag
    super.startElement(SEQUENCE_NAME, sequenceAttributes, new AugmentationsImpl());
    super.endElement(SEQUENCE_NAME, new AugmentationsImpl());
}

For a template that matches attribute() | node(), the injected instruction is equivalent to:

<xsl:sequence select="trace((), 'attribute() | node()')"/>

Here's a Gist for completeness.

Now we can set the newly implemented class as our stylesheet parser.

Command line usage:

java -jar saxon9he.jar [options] -T:net.sf.saxon.trace.XSLTTraceListener -y:rocks.xml.saxontrace.TraceXmlReader

Saxon 9 API usage:

import net.sf.saxon.Configuration;
import net.sf.saxon.s9api.Processor;

// ...

Processor processor = new Processor(false);
Configuration configuration = processor.getUnderlyingConfiguration();
configuration.setStyleParserClass("rocks.xml.saxontrace.TraceXmlReader");

The final output is:

<source node="/catalog/book[12]/publish_date[1]" line="118" file="books.xml">
    <user-trace label="book/element()" value="()" line="26" module="books.xsl"></user-trace>
</source>
<!-- /catalog/book[12]/publish_date[1] -->
<source node="/catalog/book[12]/description[1]" line="119" file="books.xml">
    <user-trace label="book/element()" value="()" line="26" module="books.xsl"></user-trace>
</source>
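
Because the trace is plain XML, a back-mapping report takes only a few lines of code. The following is a minimal sketch, assuming the trace has been captured into a string and follows the source / user-trace structure shown above. The class name TraceReport and the method backMap are hypothetical. For every user-trace element it looks up the nearest enclosing source element and prints which source node was handled by which match pattern.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class: turns a trace document into "node -> pattern" rows.
public class TraceReport {

    static List<String> backMap(String traceXml) {
        try {
            Document trace = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(traceXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Every injected trace() call shows up as a user-trace element.
            NodeList calls = (NodeList) xpath.evaluate("//user-trace", trace, XPathConstants.NODESET);

            List<String> rows = new ArrayList<>();
            for (int i = 0; i < calls.getLength(); i++) {
                Element call = (Element) calls.item(i);
                // The nearest enclosing source element identifies the context node.
                String sourceNode = xpath.evaluate("ancestor::source[1]/@node", call);
                rows.add(sourceNode + " -> " + call.getAttribute("label")
                        + " (" + call.getAttribute("module") + ":" + call.getAttribute("line") + ")");
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the trace", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<trace>"
                + "<source node=\"/catalog/book[12]/publish_date[1]\" line=\"118\" file=\"books.xml\">"
                + "<user-trace label=\"book/element()\" value=\"()\" line=\"26\" module=\"books.xsl\"/>"
                + "</source></trace>";
        backMap(sample).forEach(System.out::println);
    }
}
```

The same rows could just as well be written out as an HTML table, or the trace could be fed to another XSLT stylesheet.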

Conclusions

The idea for this article came from a dear colleague of mine, who was struggling to understand the behaviour of an XSLT stylesheet. Sadly, debugging XSLT effectively is a very common problem.

However, my goal was not to build a Swiss Army knife, a solution for every occasion. Rather, I wanted to show how powerful the Saxon API is, and to throw in another engineering technique.

Our solution is not a drop-in replacement for proprietary software. On the other hand, it is free and flexible: with the given APIs we can build a debugging tool of any complexity level, be it a debugger, a back-mapping application, or a report visualiser.

Serhiy Hapiy

Started my career in software development in 2012, working on XSLT solutions. Later participated in core Java and Python projects. My current domain is Big Data and Machine Learning.
