XPath query with descendant and descendant text() predicates
L

3

20

I would like to construct an XPath query that returns a "div" or "table" element, as long as it has a descendant containing the text "abc". The one caveat is that the returned element cannot itself have any div or table descendants.

<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>

So the only correct result of this query would be:

/div/table/form/div 

My best attempt looks something like this:

//div[contains(//text(), "abc") and not(descendant::div or descendant::table)] | //table[contains(//text(), "abc") and not(descendant::div or descendant::table)]

but does not return the correct result.

Thanks for your help.

Lottie answered 13/10, 2010 at 5:15 Comment(0)
Q
56

Something different: :)

//text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1]

Seems a lot shorter than the other solutions, doesn't it? :)

Translated to simple English: For any text node in the document that contains the string "abc" select its first ancestor that is either a div or a table.

This is more efficient because it needs only one full scan of the document tree, and the ancestor::* traversal is very cheap compared to a descendant:: (subtree) scan.

To verify that this solution "really works":

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1] "/>
 </xsl:template>
</xsl:stylesheet>

when this transformation is performed on the provided XML document:

<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>

the wanted, correct result is produced:

<div>
   <span>
      <p>abcdefg</p>
   </span>
</div>

Note: XSLT is not required. Any compliant XPath 1.0 host, such as DOM, must produce the same result.
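For readers without an XSLT processor at hand, here is a minimal Python sketch of what the expression above does. It does not run the XPath itself: the standard library's ElementTree supports only a small XPath subset without the ancestor axis, so the walk is emulated by hand. The helper names (parent_map, nearest_div_or_table, path_of) are made up for this illustration.

```python
import xml.etree.ElementTree as ET

XML = """<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>"""


def parent_map(root):
    # ElementTree has no parent pointers, so build a child -> parent map.
    return {child: p for p in root.iter() for child in p}


def nearest_div_or_table(root, needle):
    """Emulate //text()[contains(., needle)]/ancestor::*[self::div or self::table][1]."""
    parent = parent_map(root)
    found = []
    for el in root.iter():  # a single pass over the tree, in document order
        # Text nodes whose parent is el: el.text plus the tail of each child.
        texts = [el.text] + [child.tail for child in el]
        if not any(t and needle in t for t in texts):
            continue
        # Walk upward from the text node's parent to the nearest div/table.
        node = el
        while node is not None and node.tag not in ("div", "table"):
            node = parent.get(node)
        if node is not None and node not in found:
            found.append(node)
    return found


def path_of(root, el):
    # Build a simple /a/b/c path for display (hypothetical helper).
    parent = parent_map(root)
    steps = []
    while el is not None:
        steps.append(el.tag)
        el = parent.get(el)
    return "/" + "/".join(reversed(steps))


root = ET.fromstring(XML)
matches = nearest_div_or_table(root, "abc")
paths = [path_of(root, m) for m in matches]
print(paths)  # ['/div/table/form/div']
```

With a full XPath 1.0 engine the original one-line expression can be evaluated directly, and this emulation is unnecessary.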

Q answered 13/10, 2010 at 12:57 Comment(4)
thank you for your response and thank you for the +1. I prefer the compactness of this answer, however I'm unable to get it to work in my tests. The other two replies to this question work for me. Is it possible that there is a typo in your response? I can't claim to understand all of it. What does the [1] do? Again, if you have any insight as to why this answer doesn't work for me and the others do, I'd appreciate it. I would +1 for your time but I am new to this site and don't have the ability yet. Thanks.Lottie
@juan234: I have added verification code to my answer that anyone can run to check the result. It shows that the expression is correct; there is no typo. Your problem may have several causes, from a non-compliant XPath 1.0 engine to issues in your own code, and pinpointing the reason requires seeing your code. [1] selects the first node of the node-set selected by the part of the expression immediately to the left of [1]. On a reverse axis (such as ancestor::) that is the nearest node, which is actually the last one in document order.Q
I know this is old...but I just stumbled upon this while looking for different ways to do text-matching within descendants...this is elegant and easy-to-understand after seeing it...but clever enough that I had to have seen it first, and now I know a bit more about xpath :)Faintheart
@DanNguyen, Yes, XPath is a fascinating language. If you are interested in this topic, I would shamelessly recommend my courses: "XSLT 2.0 and 1.0 Foundations" (pluralsight.com/courses/xslt-foundations-part1), which covers XPath 1.0 and XPath 2.0, and "The Evolution of XPath: What’s New in XPath 3.0" (pluralsight.com/courses/xpath-3-0-whats-new), which covers XPath 3.0.Q
R
2
//*[self::div|self::table] 
   [descendant::text()[contains(.,"abc")]]  
   [not(descendant::div|descendant::table)]

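The problem with contains(//text(), "abc") is twofold. //text() is an absolute path, so it selects every text node in the document rather than only the descendants of the current div or table. And when a function expects a string but receives a node-set, XPath 1.0 uses the string-value of the first node in document order only, so just one text node is ever tested.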
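A small Python sketch of that first-node behaviour, assuming the parser keeps whitespace-only text nodes (as ElementTree does). It only mimics the conversion rule; it does not evaluate XPath. The variable names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

XML = """<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>"""

root = ET.fromstring(XML)

# Roughly what //text() selects: every text node, in document order.
all_text_nodes = list(root.itertext())

# What contains(//text(), "abc") effectively tests: only the first node,
# which here is the indentation between <div> and <table>.
first = all_text_nodes[0]
first_only = "abc" in first

# What descendant::text()[contains(., "abc")] tests: each node individually.
any_node = any("abc" in t for t in all_text_nodes)

print(repr(first), first_only, any_node)
```

Under these assumptions the first text node is pure whitespace, so the original test fails even though a later text node does contain "abc".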

Ruching answered 13/10, 2010 at 12:30 Comment(0)
D
1

You could try:

//div[
  descendant::text()[contains(., "abc")] 
  and not(descendant::div or descendant::table)
] | 
//table[
  descendant::text()[contains(., "abc")] 
  and not(descendant::div or descendant::table)
]

Does that help?
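To see what the two predicates in the expression above filter on, here is a rough Python emulation using only the standard library. ElementTree cannot evaluate this XPath directly, so the function below (leaf_containers_with_text, a name invented for this sketch) applies the same two conditions by hand.

```python
import xml.etree.ElementTree as ET

XML = """<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>"""

TARGETS = ("div", "table")


def leaf_containers_with_text(root, needle):
    hits = []
    for el in root.iter():
        if el.tag not in TARGETS:
            continue
        # descendant::text()[contains(., needle)]: test each text node on its own.
        has_text = any(needle in t for t in el.itertext())
        # not(descendant::div or descendant::table); el.iter() yields el itself
        # first, so exclude it from the check.
        has_nested = any(d is not el and d.tag in TARGETS for d in el.iter())
        if has_text and not has_nested:
            hits.append(el)
    return hits


root = ET.fromstring(XML)
hits = leaf_containers_with_text(root, "abc")
summary = [(h.tag, "".join(h.itertext()).strip()) for h in hits]
print(summary)  # [('div', 'abcdefg')]
```

Only the inner div passes both conditions: the outer div and outer table contain nested div/table elements, and the inner table has no text node containing "abc".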

Devaluation answered 13/10, 2010 at 9:48 Comment(0)
