XPath 2.0 has some new functions and syntax, relative to 1.0, that work with sequences. Some of these don't really add to what the language could already do in 1.0 (with node sets), but they make it easier to express the desired logic in ways that are more readable. This increases the chances of the programmer getting the code correct -- and keeping it that way. For example,
- `empty(s)` is equivalent to `not(s)`, but its intent is much clearer when you want to test whether a sequence is empty.
  - Correction: the effective boolean value of a sequence is in general more complicated than that. E.g. `empty((0))` != `not((0))`. This applies to `exists(s)` vs. `s` in a boolean context as well. However, there are domains of `s` where `empty(s)` is equivalent to `not(s)`, so the two could be used interchangeably within those domains. But this goes to show that the use of `empty()` can make a non-trivial difference in making code easier to understand.
- Similarly, `exists(s)` is equivalent to `boolean(s)` that already existed in XPath 1.0 (or just `s` in a boolean context), but again is much clearer about the intent.
- Quantified expressions; e.g. "`some $x in expression satisfies test($x)`" would be equivalent to `boolean(expression[test(.)])` (although the new syntax is more flexible, in that you don't need to worry about losing the context item because you have the variable to refer to it by).
- Similarly, "`every $x in expression satisfies test($x)`" would be equivalent to `not(expression[not(test(.))])` but is more readable.
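To make the effective-boolean-value point concrete, here is a small sketch of where the old and new forms agree and where they diverge. The literal sequences are just illustrative values I picked, each line is a separate expression, and the results in the comments are what I'd expect from the XPath 2.0 rules for effective boolean value.

```xpath
(: A one-item sequence holding 0 is not empty, but its effective boolean value is false. :)
empty((0))        (: false :)
not((0))          (: true :)
exists((0))       (: true :)
boolean((0))      (: false :)

(: For the empty sequence the two forms agree. :)
empty(())         (: true :)
not(())           (: true :)

(: Quantified expressions next to their predicate-based equivalents. :)
some $x in (1, 2, 3) satisfies $x gt 2      (: true :)
boolean((1, 2, 3)[. gt 2])                  (: true; the filtered sequence is (3) :)
every $x in (1, 2, 3) satisfies $x gt 0     (: true :)
not((1, 2, 3)[not(. gt 0)])                 (: true; the filtered sequence is empty :)
```

For sequences of nodes the pairs should always agree, since a sequence whose first item is a node has an effective boolean value of true. The divergence only shows up with atomic values such as `0` or `""`.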
These functions and syntax were evidently added at no small cost, solely to serve the goal of writing XPath that is easier to map to how humans think. This implies, as experienced developers know, that understandable code is significantly superior to code that is difficult to understand.
Given all that ... what would be a clear and readable way to write an XPath test expression that asks
Does value X occur in sequence S?
Some ways to do it: (Note: I used `X` and `S` notation here to indicate the value and the sequence, but I don't mean to imply that these subexpressions are element name tests, nor that they are simple expressions. They could be complicated.)
1. `X = S`: This would be one of the most unreadable, since it requires the reader to
   - think about which of X and S are sequences vs. single values
   - understand general comparisons, which are not obvious from the syntax
   - However, one advantage of this form is that it allows us to put the topic (X) before the comment ("is a member of S"), which, I think, helps in readability.
   - See also CMS's good point about readability, when the syntax or names make the "cardinality" of X and S obvious.
2. `index-of(S, X)`: This one is clear about what's intended as a value and what as a sequence (if you remember the order of arguments to `index-of()`). But it expresses more than we need to: it asks for the index, when all we really want to know is whether X occurs in S. This is somewhat misleading to the reader. An experienced developer will figure out what's intended, with some effort and with understanding of the context. But the more we rely on context to understand the intent of each line, the more understanding the code becomes a circular (spiral) and potentially Sisyphean task! Also, since `index-of()` is designed to return a list of all the indexes of occurrences of X, it could be more expensive than necessary: a smart processor, in order to evaluate `X = S`, wouldn't necessarily have to find all the contents of S, nor enumerate them in order; but for `index-of(S, X)`, correct order would have to be determined, and all contents of S must be compared to X. One other drawback of using `index-of()` is that it's limited to using `eq` for comparison; you can't, for example, use it to ask whether a node is identical to any node in a given sequence.
   - Correction: This form, used as a conditional test, can result in a runtime error: "Effective boolean value is not defined for a sequence of two or more items starting with a numeric value". (But at least we won't get wrong boolean values, since `index-of()` can't return a zero.) If S can have multiple instances of X, this is another good reason to prefer form 3 or 6.
3. `exists(index-of(S, X))`: makes the intent clearer, and would help the processor eliminate the performance penalty if the processor is smart enough.
4. `some $m in S satisfies $m eq X`: This one is very clear, and matches our intent exactly. It seems long-winded compared to 1, and that in itself can reduce readability. But maybe that's an acceptable price for clarity. Keep in mind that X and S could potentially be complex expressions themselves -- they're not necessarily just variable references. An advantage is that since the `eq` operator is explicit, you can replace it with `is` or any other comparison operator.
5. `S[. eq X]`: clearer than 1, but shares the semantic drawbacks of 2: it computes all members of S that are equal to X. Actually, this could return a false negative (incorrect effective boolean value), if X is falsy. E.g. `(0, 1)[. eq 0]` returns 0 which is falsy, even though `0` occurs in `(0, 1)`.
6. `exists(S[. eq X])`: Clearer than 1, 2, 3, and 5. Not as clear as 4, but shorter. Avoids the drawbacks of 5 (or at least most of them, depending on the processor smarts).
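Here is a rough side-by-side of the six forms on concrete values, as a sketch rather than anything authoritative. I'm assuming S is `(0, 1, 0)` and X is `0`, chosen on purpose to hit the edge cases mentioned above (a falsy X, and more than one occurrence). Each line is a separate expression, and the error code is the one I believe the spec assigns to an undefined effective boolean value.

```xpath
(: S is (0, 1, 0) and X is 0 throughout, except where noted. :)
0 = (0, 1, 0)                               (: form 1: true :)
index-of((0, 1, 0), 0)                      (: form 2: (1, 3); used as a test, raises FORG0006 :)
exists(index-of((0, 1, 0), 0))              (: form 3: true :)
some $m in (0, 1, 0) satisfies $m eq 0      (: form 4: true :)
(0, 1, 0)[. eq 0]                           (: form 5: (0, 0); used as a test, raises FORG0006 :)
(0, 1)[. eq 0]                              (: form 5 with S = (0, 1): (0); as a test, false :)
exists((0, 1, 0)[. eq 0])                   (: form 6: true :)
```

Only the forms that produce a boolean directly (1, 3, 4 and 6) give the right answer for every S and X here; the bare `index-of()` and bare predicate forms depend on what happens to come back.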
I'm kind of leaning toward the last one, at this point: `exists(S[. eq X])`
What about you... As a developer coming to a complex, unfamiliar XSLT or XQuery or other program that uses XPath 2.0, and wanting to figure out what that program is doing, which would you find easiest to read?
Apologies for the long question. Thanks for reading this far.
Edit: I changed `=` to `eq` wherever possible in the above discussion, to make it easier to see where a "value comparison" (as opposed to a general comparison) was intended.
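For anyone unsure of the distinction, a minimal sketch of how the two kinds of comparison behave, using literal values I made up for illustration. Each line is a separate expression.

```xpath
(: General comparison: true if any pair of items from the two sides compares equal. :)
(1, 2) = (2, 3)       (: true :)
1 = (2, 3)            (: false :)

(: Value comparison: each operand must be at most one item. :)
2 eq 2                (: true :)
() eq 2               (: the empty sequence :)
(1, 2) eq 2           (: type error XPTY0004, the left operand has more than one item :)
```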
functx:is-node-in-sequence($X, $Y)
– Tit