Only select text directly in node, not in child nodes

How does one retrieve the text in a node without selecting the text in the children?

<div id="comment">
     <div class="title">Editor's Description</div>
     <div class="changed">Last updated: </div>
     <br class="clear">
     Lorem ipsum dolor sit amet.
</div>

In other words, I want "Lorem ipsum dolor sit amet." rather than "Editor's DescriptionLast updated: Lorem ipsum dolor sit amet.".
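To make the problem concrete, here is a minimal sketch using Java's built-in JAXP XPath API. The question does not name a language or tool, so Java is an assumption, and the class name `DivStringValue` is made up for illustration. The sketch writes `<br class="clear">` as `<br class="clear"/>` so the fragment is well-formed XML, then evaluates the string value of the outer div. XPath defines an element's string value as the concatenation of all its descendant text, which is why the child divs' text shows up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DivStringValue {
    // The question's fragment; <br class="clear"> is self-closed here (an
    // assumption) so that it parses as well-formed XML.
    static final String SAMPLE =
            "<div id=\"comment\">\n"
            + "     <div class=\"title\">Editor's Description</div>\n"
            + "     <div class=\"changed\">Last updated: </div>\n"
            + "     <br class=\"clear\"/>\n"
            + "     Lorem ipsum dolor sit amet.\n"
            + "</div>";

    // Returns the string value of /div with whitespace runs collapsed by
    // normalize-space(). The string value includes every descendant text node.
    static String stringValue() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate("normalize-space(/div)", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringValue());
    }
}
```

This should print `Editor's Description Last updated: Lorem ipsum dolor sit amet.`; the question's version has no space after "Description", which suggests the tool used there dropped whitespace-only text nodes.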

Navarino asked 19/12, 2010 at 16:50

In the provided XML document:

<div id="comment">
      <div class="title">Editor's Description</div>
      <div class="changed">Last updated: </div>
      <br class="clear">
      Lorem ipsum dolor sit amet. 
</div> 

the top element /div has four child nodes that are text nodes. The first three of them are whitespace-only; the last one holds the wanted text.

Use:

/div/text()[last()]

This is different from:

/div/text()

The latter may (depending on whether whitespace-only nodes are preserved by the XML parser) select all 4 text nodes, but you only want the last of them.
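A minimal sketch of the difference, using JAXP only for illustration (the class name `LastTextNode` and helper `select` are hypothetical). It parses the sample with `<br class="clear"/>` self-closed and evaluates both expressions. A default `DocumentBuilderFactory` keeps whitespace-only text nodes, so `/div/text()` should select all four.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LastTextNode {
    // The sample, with <br class="clear"/> self-closed so it is well-formed XML.
    static final String SAMPLE =
            "<div id=\"comment\">\n"
            + "     <div class=\"title\">Editor's Description</div>\n"
            + "     <div class=\"changed\">Last updated: </div>\n"
            + "     <br class=\"clear\"/>\n"
            + "     Lorem ipsum dolor sit amet.\n"
            + "</div>";

    // Evaluates an XPath 1.0 expression against SAMPLE and returns the raw
    // (untrimmed) value of each selected text node, in document order.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getNodeValue());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Whitespace-only text nodes are kept by the default parser: prints 4.
        System.out.println(select("/div/text()").size());
        // Only the last text-node child, trimmed for display.
        System.out.println(select("/div/text()[last()]").get(0).trim());
    }
}
```

A parser or tool that strips whitespace-only text nodes would leave `/div/text()` with a single node as well.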

An alternative, when you don't know exactly which text node you want, is:

/div/text()[normalize-space()]

This selects all text-node children of /div that are not whitespace-only.
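To illustrate why the `normalize-space()` filter is safer when the position is unknown, the sketch below (JAXP again, for illustration only) runs both expressions against the sample and against `VARIANT`, a hypothetical document invented here in which the direct text comes before a child element and a whitespace-only text node comes last. On `VARIANT`, `text()[last()]` lands on that trailing whitespace, while `text()[normalize-space()]` still finds the direct text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NonBlankTextNodes {
    // The question's sample, with <br class="clear"/> self-closed.
    static final String SAMPLE =
            "<div id=\"comment\">\n"
            + "     <div class=\"title\">Editor's Description</div>\n"
            + "     <div class=\"changed\">Last updated: </div>\n"
            + "     <br class=\"clear\"/>\n"
            + "     Lorem ipsum dolor sit amet.\n"
            + "</div>";

    // Hypothetical variant (not from the question): the direct text is first,
    // followed by a child element and a whitespace-only text node.
    static final String VARIANT =
            "<div id=\"comment\">Direct text <span>child text</span>\n</div>";

    // Returns the raw value of each node selected by expr in the given XML.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getNodeValue());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {"/div/text()[last()]", "/div/text()[normalize-space()]"};
        for (String xml : new String[] {SAMPLE, VARIANT}) {
            for (String expr : exprs) {
                for (String text : select(xml, expr)) {
                    // Brackets make a whitespace-only result visible as [].
                    System.out.println(expr + " -> [" + text.trim() + "]");
                }
            }
        }
    }
}
```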

Ewers answered 19/12, 2010 at 17:03
@Dimitre, the question is to select the text without child nodes; your first suggestion doesn't do this. - Fourgon
@Lucero: Why? I haven't suggested the use of the descendant:: axis or the // abbreviation. The first expression selects just one text node: the last child text node of /div. The alternative selects any child text node of /div that is not whitespace-only. - Ewers
@Dimitre, simply because nothing says that the wanted text will be the last node. - Fourgon
@Lucero: I have edited my answer to make it clearer. I hope you understand it now. - Ewers
@Dimitre, the question was to get the text without the text of the child nodes. Getting only the last text node works for the given sample, but doesn't answer the question in general. - Fourgon
@Lucero: I think the edited answer meets your objections -- it explains the two alternatives one has: either know exactly which node you want to select, or select all text nodes that are not whitespace-only. Both expressions avoid selecting whitespace-only text nodes -- something that may happen using your suggested solution. Do note that the OP really wants only non-whitespace-only text nodes. - Ewers
@Dimitre, in fact the whitespace stripping was useful as well, thanks to both. - Navarino
I just don't get why neither solution works for me in Firefox with XPather, while //div/text()[normalize-space() and parent::div[@id='comment']] does. - Lynnet
@styu: Then you are evaluating the XPath expressions against a different XML document, not against the provided one. - Ewers
@Dimitre, I think it's an issue with XPather. Your XPath Visualizer and another tool work fine, thanks. - Lynnet
This does not solve the problem for me. I need the XPath result to be a webelement, not a String, so using /text() is not an option. - Toffeenosed
@djangofan, text() selects all text-node children of the current node -- not strings, as you believe. As for "webelements", no such thing exists in XPath. - Ewers
@SeanDuggan, yes, XPath is a very elegant and powerful language. - Ewers
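The point that `text()` selects nodes rather than strings can be checked with JAXP: when an expression is evaluated with `XPathConstants.NODESET`, each result is an `org.w3c.dom.Node`, and for `text()` its type is `Node.TEXT_NODE`. This sketch (an illustration, not taken from the thread; `TextNodeType` is a made-up name) evaluates styu's expression from the comments against the sample, with `<br class="clear"/>` self-closed, and reports the type, parent id, and trimmed value of each selected node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextNodeType {
    // The sample, with <br class="clear"/> self-closed so it is well-formed XML.
    static final String SAMPLE =
            "<div id=\"comment\">\n"
            + "     <div class=\"title\">Editor's Description</div>\n"
            + "     <div class=\"changed\">Last updated: </div>\n"
            + "     <br class=\"clear\"/>\n"
            + "     Lorem ipsum dolor sit amet.\n"
            + "</div>";

    // Evaluates expr as a node-set; JAXP returns DOM Node objects, not Strings.
    static NodeList select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // styu's expression: non-blank text children of any div whose id is "comment".
        NodeList nodes = select(
                "//div/text()[normalize-space() and parent::div[@id='comment']]");
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            Element parent = (Element) node.getParentNode();
            System.out.println("text node? " + (node.getNodeType() == Node.TEXT_NODE)
                    + ", parent id: " + parent.getAttribute("id")
                    + ", value: " + node.getNodeValue().trim());
        }
    }
}
```

Turning such a node into a string (here via `getNodeValue()`) is a separate step that the host API performs, not something `text()` does.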

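Just select text() instead of . (the string value of . includes the text of every descendant, while text() selects only the text-node children):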

div/text()

On the given XML fragment, the only non-whitespace text this selects is:

Lorem ipsum dolor sit amet.
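A minimal JAXP sketch of this answer, evaluating the relative path `div/text()` with the document node as context (the sample again uses `<br class="clear"/>`, and `RelativeDirectText`/`directText` are hypothetical names). Concatenating the selected text nodes and collapsing whitespace is one assumed way of turning the node-set into a single string; the whitespace-only text nodes contribute nothing visible.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeDirectText {
    // The sample, with <br class="clear"/> self-closed so it is well-formed XML.
    static final String SAMPLE =
            "<div id=\"comment\">\n"
            + "     <div class=\"title\">Editor's Description</div>\n"
            + "     <div class=\"changed\">Last updated: </div>\n"
            + "     <br class=\"clear\"/>\n"
            + "     Lorem ipsum dolor sit amet.\n"
            + "</div>";

    // Evaluates expr relative to the document node, concatenates the selected
    // text nodes, trims the result, and collapses internal whitespace runs.
    static String directText(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            StringBuilder text = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                text.append(nodes.item(i).getNodeValue());
            }
            return text.toString().trim().replaceAll("\\s+", " ");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Relative path, as in the answer: prints "Lorem ipsum dolor sit amet."
        System.out.println(directText("div/text()"));
    }
}
```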
Fourgon answered 19/12, 2010 at 16:56

How about this:
$doc/node()[3]/text()
This assumes $doc holds the XML.
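Positional steps such as `node()[3]` count every child node, elements included, and also whitespace-only text nodes when the parser keeps them, so which index holds the wanted text depends on what `$doc` is bound to and how the document was parsed. As a hedged check, this JAXP sketch assumes `$doc` corresponds to the sample's outer div (evaluated here as `/div/node()`, with `<br class="clear"/>` self-closed) and lists that div's children with their 1-based XPath positions. `ChildNodeIndexes` and `describe` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeIndexes {
    // The sample, with <br class="clear"/> self-closed so it is well-formed XML.
    static final String SAMPLE =
            "<div id=\"comment\">\n"
            + "     <div class=\"title\">Editor's Description</div>\n"
            + "     <div class=\"changed\">Last updated: </div>\n"
            + "     <br class=\"clear\"/>\n"
            + "     Lorem ipsum dolor sit amet.\n"
            + "</div>";

    static NodeList select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Short description of a child node: element name, or trimmed text.
    static String describe(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            return text.isEmpty() ? "text (whitespace only)" : "text \"" + text + "\"";
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            return "element <" + node.getNodeName() + ">";
        }
        return node.getNodeName();
    }

    public static void main(String[] args) {
        NodeList children = select("/div/node()");
        for (int i = 0; i < children.getLength(); i++) {
            // XPath positions are 1-based; DOM NodeList indexes are 0-based.
            System.out.println("node()[" + (i + 1) + "]: " + describe(children.item(i)));
        }
    }
}
```

With whitespace-only text nodes kept, this should list seven children with the sentence at position 7; a parser that strips them would move it to position 4.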

Chariot answered 25/4, 2017 at 15:03
