XSLT document(): Is it slower when called multiple times?

Asked 10/5, 2011 at 7:33 Answered 10/5, 2011 at 15:35

UPDATE 17.Jul.2013:
XALAN 2.7 does not cache document() calls within a request. So it is crucial to store each needed document in a variable in the XSL.

I have searched for quite a while and didn't find concrete answers to my simple question:

Which approach is faster, or is the processor "smart" enough that both variants perform the same?

Note: I am using Xalan 2.7 (the default implementation in JDK 1.6).

1) I have to read a property in an external XML:

<xsl:value-of select="document($path)/person/address/city"/>

Whenever I need the city, I use the expression above (let's say 100 times).

2) Instead of calling the document() 100 times, I store the XML node in a variable:

<xsl:variable name="node" select="document($path)"/>

And then I use the following 100 times:

<xsl:value-of select="$node/person/address/city"/>

Which one is faster or better, and for what reasons? Thank you!

Matilda answered 10/5, 2011 at 7:33 Comment(5)

I'm also interested in an expert answer, but I think the case with multiple calls of document(path_to_doc) depends on how the XSLT processor implements caching. When the document node is stored in a variable, it is loaded only once in any case. – Tarp 10/5, 2011 at 8:35

Yes, I also guess that it depends on the implementation of the processor, but I'm curious how Xalan 2.7 (default processor in JDK 1.6) does it. – Matilda 10/5, 2011 at 9:10

I'm not 100% positive, but I think Xalan does not cache document() results, while xsltproc does. However, the document() argument is interpreted as a URI (see spec), so aggressive caching would make perfect sense. – Trierarch 10/5, 2011 at 9:33

Good question, +1. See my answer for explanation and a recommendation of a third, more efficient solution. – Waller 10/5, 2011 at 16:22

Tested with XALAN 2.7: each document() call is executed and includes physical file access. So at least for XALAN 2.7 it makes a lot of sense to store the document in a variable. I updated my question with the test results. – Matilda 17/7, 2013 at 7:27

Both methods should take the same time if the XSLT processor is not naive, because the document() function should return the same result when it is called with the same argument(s), no matter how many times.

Neither method is efficient, because of the use of the // abbreviation, which causes the whole document tree to be traversed.

I would recommend the following as more efficient than either of the methods being discussed:

<xsl:variable name="vCities" select="document($pUrl)//cities"/>

Then reference only $vCities.

In this way you have traversed the document only once.
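As a minimal sketch of that recommendation, the variable can be declared at the top level of the stylesheet so that every template shares the same node-set. The default file name `cities.xml`, the `city` child elements, and the text output are assumptions made for illustration; only `$pUrl` and `$vCities` come from the answer above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: the external file name; pass a real location as a parameter. -->
  <xsl:param name="pUrl" select="'cities.xml'"/>

  <!-- Top-level variable: document() and the // traversal appear only here. -->
  <xsl:variable name="vCities" select="document($pUrl)//cities"/>

  <xsl:template match="/">
    <!-- Assumption: <cities> contains <city> children. -->
    <xsl:for-each select="$vCities/city">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Every later use goes through `$vCities`, so no further document() call or // step appears anywhere else in the stylesheet.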

Waller answered 10/5, 2011 at 13:42 Comment(5)

+1. Dimitre, can you give me a reference for the idempotence rule you mentioned? I have heard that before but was surprised not to see it in the XSLT 1.0 or 2.0 specs. – Plains 10/5, 2011 at 15:31

By the way, the // was only an example and should not have been part of my question, sorry! The focus is on the document() function. So I'm still unsure whether it makes a difference in XALAN 2.7! – Matilda 11/5, 2011 at 13:1

Corrected question: It does not contain the bad example anymore. I removed it because the discussion here should be on the document function. – Matilda 11/5, 2011 at 13:3

@Matilda -- you can and must run your own benchmark. I believe Xalan is not a naive, non-optimizing processor and that you will not gain much, if anything, by adding your own caching. – Waller 11/5, 2011 at 13:14

It seems that you understand the principles involved, so you don't need any explanations there.

If you want to know how Xalan 2.7 does it, the definitive answer will be found by testing it with Xalan 2.7, with a large enough test.
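A test along those lines could be sketched as a single stylesheet that runs either variant a configurable number of times. The parameter names `mode` and `count`, the default file name `person.xml`, and the template names are hypothetical; the timing itself (or a check of file or server access logs) has to happen outside the stylesheet, in whatever code invokes the transformation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameters for the test run. -->
  <xsl:param name="path" select="'person.xml'"/>
  <xsl:param name="mode" select="'document'"/> <!-- 'document' or 'variable' -->
  <xsl:param name="count" select="100"/>

  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$mode = 'variable'">
        <!-- Variant 2: load once, then reuse the node. -->
        <xsl:variable name="node" select="document($path)"/>
        <xsl:call-template name="repeat-variable">
          <xsl:with-param name="node" select="$node"/>
          <xsl:with-param name="n" select="$count"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- Variant 1: call document() on every iteration. -->
        <xsl:call-template name="repeat-document">
          <xsl:with-param name="n" select="$count"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- XSLT 1.0 has no loop counter, so the repetition is done by recursion. -->
  <xsl:template name="repeat-document">
    <xsl:param name="n"/>
    <xsl:if test="$n &gt; 0">
      <xsl:value-of select="document($path)/person/address/city"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:call-template name="repeat-document">
        <xsl:with-param name="n" select="$n - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template name="repeat-variable">
    <xsl:param name="node"/>
    <xsl:param name="n"/>
    <xsl:if test="$n &gt; 0">
      <xsl:value-of select="$node/person/address/city"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:call-template name="repeat-variable">
        <xsl:with-param name="node" select="$node"/>
        <xsl:with-param name="n" select="$n - 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Running the same stylesheet once with `mode` set to `document` and once with `variable`, against a reasonably large external file, gives a like-for-like comparison. Very large `count` values may exhaust the stack in processors that do not optimize recursive template calls, so it is probably safer to repeat the whole transformation than to raise `count` far beyond the default.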

As @Dimitre noted, neither one of these is necessarily efficient, because of the //, though some processors are smart about optimizing those kinds of paths, mitigating the problem. You could help the processor be more efficient by keeping the city element in a variable:

<xsl:variable name="city" select="(document($path)//city)[1]"/>
...
<xsl:value-of select="$city"/>

I added [1] in there for further optimization because you said "the city" (i.e. you expect only one), and this allows a smart processor to stop after it finds the first city element.
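If the location of the element is known, as in the question's `/person/address/city`, the same idea works without // at all. The following is only a sketch under that assumption; the default file name `person.xml` is hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: file name of the external XML. -->
  <xsl:param name="path" select="'person.xml'"/>

  <!-- Explicit path instead of //; the parentheses make [1] pick the first
       city in document order rather than the first city of each address. -->
  <xsl:variable name="city"
                select="(document($path)/person/address/city)[1]"/>

  <xsl:template match="/">
    <xsl:value-of select="$city"/>
  </xsl:template>
</xsl:stylesheet>
```

With an explicit path the processor only has to follow three child steps, so it does not depend on any optimization of // to avoid a full traversal.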

Plains answered 10/5, 2011 at 15:35 Comment(3)

The discussion is not about the //, I removed it from the example. I will test the document() by trying to see requests in the log for every document() call. But before investing time in this, I thought somebody here would know it (from the source code). – Matilda 11/5, 2011 at 13:5

Anyone care to explain why the downvote? Don't know if it was from @bas – Plains 11/5, 2011 at 19:37
