XSLT: Sequences are Fundamental

While our development team is making excellent progress towards a beta of the XSLT processor for Intel SOA Expressway (http://www3.intel.com/cd/software/products/asmo-na/eng/373233.htm), I will continue looking at new features in XSLT 2.0. In this post, I’ll look at the only other major new type in the language, the sequence.

In XSLT 1.0, XPath expressions could return a node-set: an unordered collection of nodes with no duplicates, which XSLT processed in document order. The W3C working group decided this was too limiting, and in XSLT 2.0 it introduced a generalization of the concept, the sequence. A sequence is an ordered list of items, where each item is either a node or an atomic value of one of the types discussed in my previous post. Sequences can hold a mixture of item types, but in typical use they hold one kind of value, such as only nodes or only strings. Unlike node-sets, sequences can hold duplicates and keep their items in the order in which they were added. For backwards compatibility with node-sets, however, path expressions that select nodes (such as a/b) still return their nodes sorted into document order with duplicates removed.
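A minimal XSLT 2.0 sketch of that difference follows. It assumes a processor that can start from a named template (called main here, a hypothetical name; Saxon's -it option is one way to do this), and the list/item data is made up for illustration. The comma operator keeps duplicate values and duplicate nodes in the order given, while a path expression (the trailing /self::item step) sorts its nodes into document order and drops duplicates, as an XPath 1.0 node-set would.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point: run with an initial template named "main". -->
  <xsl:template name="main">
    <!-- Made-up data: a temporary tree holding three items, a b c. -->
    <xsl:variable name="doc">
      <list><item>a</item><item>b</item><item>c</item></list>
    </xsl:variable>
    <xsl:variable name="items" select="$doc/list/item"/>

    <!-- Atomic values: duplicates and order are kept. Prints c,a,c -->
    <xsl:value-of select="('c', 'a', 'c')" separator=","/>
    <xsl:text>&#10;</xsl:text>

    <!-- The comma operator keeps duplicate nodes too. Prints c,a,c -->
    <xsl:value-of select="($items[3], $items[1], $items[3])" separator=","/>
    <xsl:text>&#10;</xsl:text>

    <!-- A path step restores node-set behavior: document order,
         no duplicates. Prints a,c -->
    <xsl:value-of select="($items[3], $items[1], $items[3])/self::item"
                  separator=","/>
  </xsl:template>
</xsl:stylesheet>
```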

All data processed in a stylesheet and in XPath expressions is held in a sequence, even if the sequence holds just one value. That's a useful concept, because every function argument and result is a sequence, so functions can operate on lists and return lists. A single item is treated exactly like a sequence containing only that item, so as a convenience a one-item sequence can be written as just the value: 42 and (42) mean the same thing.
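The sketch below, again assuming a hypothetical initial template named main, shows that a single value and a longer sequence satisfy the same declared type, xs:integer* (zero or more integers), and that a function taking a sequence, such as sum(), accepts either. It also shows that 42 and (42) are the same one-item sequence, while () is the empty sequence.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- The same declared type accepts one value or several. -->
    <xsl:variable name="one" as="xs:integer*" select="7"/>
    <xsl:variable name="many" as="xs:integer*" select="(7, 8, 9)"/>

    <!-- sum() takes a sequence, so it accepts both. Prints 7 24 -->
    <xsl:value-of select="sum($one), sum($many)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- 42 and (42) are the same one-item sequence; () is empty.
         Prints 1 1 0 true -->
    <xsl:value-of select="count(42), count((42)), count(()), 42 eq (42)"/>
  </xsl:template>
</xsl:stylesheet>
```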

In a stylesheet, sequences can be created with the xsl:sequence instruction, which evaluates the XPath expression in its select attribute and returns the result. Say the stylesheet needs to select the holdings in a portfolio returned in a SOAP message:

<xsl:sequence select="soap:Body/portfolio/holding"/>

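In effect, this instruction works much like xsl:copy-of, except that for nodes, xsl:sequence returns the original nodes selected rather than copies. That can matter for efficiency, because large subtrees are not duplicated. It also means the returned nodes keep their identity and their place in the source document, so you can still navigate from them to their parent, siblings, and ancestors.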
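To make the identity difference concrete, here is a minimal sketch under several assumptions: the SOAP 1.1 envelope namespace, unqualified portfolio and holding elements, made-up holding data, and hypothetical function names in a made-up my: namespace. One function returns the selected holdings with xsl:sequence and the other with xsl:copy-of. The "is" operator tests node identity, and the .. step tests whether a node still has a parent.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:my="urn:example:my-functions"
    exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- Hypothetical: returns the original holding nodes, no copy made. -->
  <xsl:function name="my:holdings" as="element()*">
    <xsl:param name="envelope" as="element()"/>
    <xsl:sequence select="$envelope/soap:Body/portfolio/holding"/>
  </xsl:function>

  <!-- Hypothetical: returns new, parentless copies of the holdings. -->
  <xsl:function name="my:holding-copies" as="element()*">
    <xsl:param name="envelope" as="element()"/>
    <xsl:copy-of select="$envelope/soap:Body/portfolio/holding"/>
  </xsl:function>

  <xsl:template name="main">
    <!-- Made-up SOAP message held in a temporary tree. -->
    <xsl:variable name="msg">
      <soap:Envelope>
        <soap:Body>
          <portfolio>
            <holding symbol="INTC"/>
            <holding symbol="IBM"/>
          </portfolio>
        </soap:Body>
      </soap:Envelope>
    </xsl:variable>
    <xsl:variable name="env" select="$msg/soap:Envelope"/>
    <xsl:variable name="first" select="$env/soap:Body/portfolio/holding[1]"/>

    <!-- Original node: same identity, still has a parent.
         Copy: different identity, no parent.
         Prints true, true, false, false on separate lines. -->
    <xsl:value-of separator="&#10;" select="
        my:holdings($env)[1] is $first,
        exists(my:holdings($env)[1]/..),
        my:holding-copies($env)[1] is $first,
        exists(my:holding-copies($env)[1]/..)"/>
  </xsl:template>
</xsl:stylesheet>
```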

Sequences can be manipulated with a few new operators. The comma (,) operator joins sequences; sequences never nest, so (1, (2, 3)) is simply the three-item sequence (1, 2, 3). Positional predicate notation indexes an item in a sequence. For example, ("Sponsors", "of", "Tomorrow")[3] results in a sequence holding the single item “Tomorrow”. The union (“|” or “union”), intersection (“intersect”), and difference (“except”) operators perform set operations on sequences of nodes, returning their results in document order without duplicates. Finally, the range operator, to, constructs sequences of consecutive integers. For example, (1 to 5) returns the sequence (1, 2, 3, 4, 5). This can be combined with the comma operator to build a sequence from several ranges, such as (1 to 5, 15 to 20).
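The operators above can be tried out with a small stylesheet like the following sketch, which assumes a hypothetical initial template named main and a made-up list of five items, a through e. Note that the union result comes out in document order even though $last3 is written first.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <!-- Made-up data: five items, a through e. -->
    <xsl:variable name="doc">
      <list><item>a</item><item>b</item><item>c</item><item>d</item><item>e</item></list>
    </xsl:variable>
    <xsl:variable name="first3" select="$doc/list/item[position() le 3]"/>
    <xsl:variable name="last3" select="$doc/list/item[position() ge 3]"/>

    <!-- Comma flattens nested sequences. Prints 3 -->
    <xsl:value-of select="count((1, (2, 3), ()))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Positional predicate. Prints Tomorrow -->
    <xsl:value-of select="('Sponsors', 'of', 'Tomorrow')[3]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Ranges joined with a comma. Prints 1,2,3,4,5,15,16,17,18,19,20 -->
    <xsl:value-of select="(1 to 5, 15 to 20)" separator=","/>
    <xsl:text>&#10;</xsl:text>

    <!-- Set operators on nodes. Prints a,b,c,d,e -->
    <xsl:value-of select="$last3 | $first3" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <!-- Prints c -->
    <xsl:value-of select="$first3 intersect $last3" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <!-- Prints a,b -->
    <xsl:value-of select="$first3 except $last3" separator=","/>
  </xsl:template>
</xsl:stylesheet>
```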

Rounding out the sequence support, several new functions manipulate sequences, offering the capabilities you would expect for lists: insert-before() inserts items, remove() removes an item, subsequence() extracts part of a sequence, index-of() finds the positions of matching items, and reverse() reverses a sequence. Interestingly, there's also a distinct-values() function that returns a sequence with duplicates removed. Although the union operator also removes duplicates, it works only on nodes and compares them by identity, whereas distinct-values() works on atomic values of any type. Any nodes passed to distinct-values() are first atomized, so it returns their distinct values rather than the nodes themselves, and the order of its result is implementation-dependent.
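A minimal sketch exercising these functions follows, assuming a hypothetical initial template named main and made-up values. Because the order of the distinct-values() result is implementation-dependent, the sketch only counts its items rather than printing them. The last check confirms that, given nodes, distinct-values() returns atomic values.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template name="main">
    <xsl:variable name="letters" select="('a', 'b', 'c')"/>

    <!-- Insert before position 2. Prints a,X,b,c -->
    <xsl:value-of select="insert-before($letters, 2, 'X')" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <!-- Remove the item at position 2. Prints a,c -->
    <xsl:value-of select="remove($letters, 2)" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <!-- Start at position 2, take 3 items. Prints 20,30,40 -->
    <xsl:value-of select="subsequence((10, 20, 30, 40, 50), 2, 3)" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <!-- Positions of every match. Prints 1,3 -->
    <xsl:value-of select="index-of((10, 20, 10), 10)" separator=","/>
    <xsl:text>&#10;</xsl:text>
    <!-- Prints c,b,a -->
    <xsl:value-of select="reverse($letters)" separator=","/>
    <xsl:text>&#10;</xsl:text>

    <!-- 1 and 1.0 are equal numbers; values of types that cannot be
         compared (numbers vs strings) count as distinct. Prints 3 -->
    <xsl:value-of select="count(distinct-values((1, 1.0, 'a', 'a', 'b')))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Made-up nodes with duplicate string values x, y, x. -->
    <xsl:variable name="doc">
      <list><item>x</item><item>y</item><item>x</item></list>
    </xsl:variable>
    <!-- Nodes are atomized: two atomic values come back. Prints 2 true -->
    <xsl:value-of select="count(distinct-values($doc/list/item)),
        every $v in distinct-values($doc/list/item)
        satisfies $v instance of xs:anyAtomicType"/>
  </xsl:template>
</xsl:stylesheet>
```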

Sequences are now at the heart of XPath and XSLT. By replacing the node-set with the sequence, the W3C has opened the door to processing lists of any type of item almost anywhere, whether in an XPath expression or an XSLT instruction. The W3C didn't sacrifice backwards compatibility with node-sets when selecting nodes, easing the transition of existing stylesheets to XSLT 2.0. New operators and functions offer the capabilities expected for manipulating lists. The sequence is a fundamental improvement to XSLT.
