XSLT function returns different results [Saxon-EE vs Saxon-HE/PE]
I am currently working on a pure XSLT transformation using various versions of the Saxon processor. Below is my stylesheet, simplified for the purposes of this question:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:foo="bar">

    <xsl:output encoding="UTF-8" method="text"/>

    <xsl:template match="/">
        <xsl:text>Call of func_1: </xsl:text>        
        <xsl:value-of select="foo:func_1()"/>

        <xsl:text>&#xA;Call of func_1: </xsl:text>
        <xsl:value-of select="foo:func_1()"/>

        <xsl:text>&#xA;Call of func_1: </xsl:text>
        <xsl:value-of select="foo:func_1()"/>

        <xsl:text>&#xA;Call of func_2: </xsl:text>
        <xsl:value-of select="foo:func_2()"/>
    </xsl:template>

    <xsl:function name="foo:func_1" as="xs:string">
        <!-- do some other stuff -->
        <xsl:value-of select="foo:func_2()"/>
    </xsl:function>

    <xsl:function name="foo:func_2" as="xs:string">
        <xsl:variable name="node">
            <xsl:comment/>
        </xsl:variable>
        <xsl:sequence select="generate-id($node)"/>
    </xsl:function>

</xsl:stylesheet>

Description

foo:func_1 is a wrapper function that returns the value of a second function and does some other work, which can be ignored here. This pattern of one function calling another is mandatory!

foo:func_2 generates a unique id for a node. This node is created in a locally scoped variable named "node".

Different results based on Saxon versions

Expected result:

Call of func_1: d2
Call of func_1: d3
Call of func_1: d4
Call of func_2: d5

Saxon-EE 9.6.0.7 / Saxon-EE 9.6.0.5 result

Call of func_1: d2
Call of func_1: d2
Call of func_1: d2
Call of func_2: d3

Saxon-HE 9.6.0.5 / Saxon-PE 9.6.0.5 / Saxon-EE 9.5.1.6 / Saxon-HE 9.5.1.6 result

As expected.

Question / further details

I debugged the problem as far as I could. If I change the xsl:value-of in function "func_1" to xsl:sequence, the results are the same for all versions [as expected]. But that's not my intention!
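For reference, the change described above amounts to the following variant of foo:func_1. This is only a sketch of that one function; the rest of the stylesheet stays as posted.

```xml
<xsl:function name="foo:func_1" as="xs:string">
    <!-- do some other stuff -->
    <!-- xsl:sequence passes the string returned by foo:func_2() through
         as-is, instead of wrapping it in a new text node as xsl:value-of does -->
    <xsl:sequence select="foo:func_2()"/>
</xsl:function>
```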

I want to understand the difference between xsl:value-of and xsl:sequence across Saxon versions. Is there any "hidden" caching? What is the correct way to work with xsl:sequence and xsl:value-of in my case? [By the way: I already know that xsl:value-of creates a text node with the result of the select expression, whereas xsl:sequence can return a reference to a node or an atomic value. As far as I can tell, that doesn't solve my problem.]

Shushan answered 8/9, 2016 at 9:20 Comment(3)
Interesting problem. But I don't understand why you write functions declared as returning a string with as="xs:string" yet then use xsl:value-of which returns a text node (which then has to be cast to a string to match the as declaration). - Rickert
With Saxon 9.7 EE, if I switch off any optimization using -opt:0 from the command line, then the result is a different id for each call. So it seems EE is doing some optimization that changes the result. - Rickert
I think XSLT 3.0 tries to address the problem in w3.org/TR/xslt-30/#function-determinism with the new-each-time attribute. - Rickert

This is a long-standing and rather deep problem. In a pure functional language, calling a pure function twice with the same arguments always produces the same result. This makes many optimizations possible, such as pulling a function call out of a loop if the arguments are invariant, or inlining a function call if it's not recursive. Unfortunately XSLT and XQuery functions aren't quite purely functional: in particular, they are defined so that if the function creates new nodes, then calling the function twice produces different nodes (f() is f() returns false).
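The node-identity rule can be illustrated with a small stylesheet. This is a minimal sketch: foo:make is a hypothetical function introduced only for this illustration, and by the rule described above the comparison should yield false, although an optimizer could in principle affect what a particular release prints.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="bar"
    exclude-result-prefixes="foo">

    <xsl:output encoding="UTF-8" method="text"/>

    <!-- Hypothetical function: constructs a new element on every call -->
    <xsl:function name="foo:make" as="element()">
        <e/>
    </xsl:function>

    <xsl:template match="/">
        <!-- "is" compares node identity, not content -->
        <xsl:value-of select="foo:make() is foo:make()"/>
    </xsl:template>

</xsl:stylesheet>
```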

The Saxon optimizer tries quite hard to optimize as far as it can within these constraints, in particular by recognizing functions that create new nodes and avoiding aggressive optimization of such functions.

But the spec itself isn't 100% prescriptive. For example, if as in your example there is a local variable with no dependencies on function arguments, I think the spec gives license to the implementation as to whether the value of the variable is the same node on each evaluation, or is a new node.

As Martin says, the new XSLT 3.0 attribute new-each-time is an attempt to get this under control: if you really want a new node each time the function is called, you should specify new-each-time="yes".
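Applied to the stylesheet from the question, that could look roughly like the sketch below. It assumes an XSLT 3.0 processor (hence version="3.0"), and it is an assumption that the declaration affects this particular case, since foo:func_2 returns a string derived from a temporary node rather than the node itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:foo="bar">

    <xsl:output encoding="UTF-8" method="text"/>

    <xsl:template match="/">
        <xsl:text>Call of func_2: </xsl:text>
        <xsl:value-of select="foo:func_2()"/>
        <xsl:text>&#xA;Call of func_2: </xsl:text>
        <xsl:value-of select="foo:func_2()"/>
    </xsl:template>

    <!-- new-each-time="yes" declares that each call constructs new nodes -->
    <xsl:function name="foo:func_2" as="xs:string" new-each-time="yes">
        <xsl:variable name="node">
            <xsl:comment/>
        </xsl:variable>
        <xsl:sequence select="generate-id($node)"/>
    </xsl:function>

</xsl:stylesheet>
```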

Note:

The specific optimization that is happening here (which you can see by running with the -explain option) is that func_2 is first inlined, and then its body is being extracted into a global variable. Some releases are doing this and others aren't - it can be very sensitive to minor changes. The best advice is not to depend on functions having this kind of side-effect. It would help if you explained your real problem, then perhaps we could find an approach that isn't so sensitive to edge cases in the language semantics.

Collect answered 8/9, 2016 at 12:26 Comment(2)
Thank you very much for the in-depth view. I had already suspected some processor optimization, such as caching. - Shushan
My real scenario: I am using a widely circulated uuid.xsl (source and credits unknown) to generate UUIDs in XSLT. In the past I couldn't use any Java classes, so I used that stylesheet. For now I use xmlns:uuid="java:java.util.UUID" with uuid:randomUUID(), but it was important to me to understand the problem in case I face it again in the future. Should I still open a new thread with the real scenario? Is it worth it? Otherwise I'll save your precious time. - Shushan
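The Java-based approach mentioned in the last comment could be sketched as follows. This assumes a Saxon edition that supports reflexive Java extension functions (Saxon-PE or Saxon-EE); the explicit uuid:toString() call is one common way to turn the returned Java object into a string and is not taken from the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:uuid="java:java.util.UUID"
    exclude-result-prefixes="uuid">

    <xsl:output encoding="UTF-8" method="text"/>

    <xsl:template match="/">
        <!-- uuid:randomUUID() calls the static method java.util.UUID.randomUUID();
             uuid:toString() calls the instance method on the returned object -->
        <xsl:value-of select="uuid:toString(uuid:randomUUID())"/>
    </xsl:template>

</xsl:stylesheet>
```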
