Using XPATH to search text containing &nbsp;

I use XPather Browser to check my XPATH expressions on an HTML page.

My end goal is to use these expressions in Selenium for the testing of my user interfaces.

I have an HTML file with content similar to this:

<tr>
  <td>abc</td>
  <td>&nbsp;</td>
</tr>

I want to select a node whose text contains the string "&nbsp;".

With a normal string like "abc" there is no problem. I use an XPATH similar to //td[text()="abc"].

When I try an XPATH like //td[text()="&nbsp;"], it returns nothing. Is there a special rule for text containing "&"?

Illdefined answered 29/10, 2008 at 15:0 Comment(1)
Does your actual XSL transformation return nothing, or only XPather? – Dali

It seems that OpenQA, the people behind Selenium, have already addressed this problem. They defined some variables to explicitly match whitespace. In my case, I need to use an XPath similar to //td[text()="${nbsp}"].
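To illustrate what such a variable amounts to, here is a minimal Python sketch that expands the two placeholders into real characters before the locator is used. The function name and the mapping are hypothetical; this is not Selenium's own implementation, only an approximation based on the OpenQA description of ${space} (a single space) and ${nbsp} (a non-breaking space).

```python
# Hypothetical mapping, based on the OpenQA description:
# ${space} is a plain space, ${nbsp} is a non-breaking space (U+00A0).
SELENESE_VARS = {"${space}": " ", "${nbsp}": "\u00a0"}


def expand_selenese_vars(locator: str) -> str:
    """Replace the ${space}/${nbsp} placeholders with the real characters."""
    for name, value in SELENESE_VARS.items():
        locator = locator.replace(name, value)
    return locator


xpath = expand_selenese_vars('//td[text()="${nbsp}"]')
print(repr(xpath))  # '//td[text()="\xa0"]'
```

The XPath engine therefore receives an actual U+00A0 character between the quotes, not the text "&nbsp;".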

Here is the text from OpenQA concerning this issue:

HTML automatically normalizes whitespace within elements, ignoring leading/trailing spaces and converting extra spaces, tabs and newlines into a single space. When Selenium reads text out of the page, it attempts to duplicate this behavior, so you can ignore all the tabs and newlines in your HTML and do assertions based on how the text looks in the browser when rendered. We do this by replacing all non-visible whitespace (including the non-breaking space "&nbsp;") with a single space. All visible newlines (<br>, <p>, and <pre> formatted new lines) should be preserved.

We use the same normalization logic on the text of HTML Selenese test case tables. This has a number of advantages. First, you don't need to look at the HTML source of the page to figure out what your assertions should be; "&nbsp;" symbols are invisible to the end user, and so you shouldn't have to worry about them when writing Selenese tests. (You don't need to put "&nbsp;" markers in your test case to assertText on a field that contains "&nbsp;".) You may also put extra newlines and spaces in your Selenese <td> tags; since we use the same normalization logic on the test case as we do on the text, we can ensure that assertions and the extracted text will match exactly.

This creates a bit of a problem on those rare occasions when you really want/need to insert extra whitespace in your test case. For example, you may need to type text in a field like this: "foo ". But if you simply write <td>foo </td> in your Selenese test case, we'll replace your extra spaces with just one space.

This problem has a simple workaround. We've defined a variable in Selenese, ${space}, whose value is a single space. You can use ${space} to insert a space that won't be automatically trimmed, like this: <td>foo${space}${space}${space}</td>. We've also included a variable ${nbsp}, that you can use to insert a non-breaking space.

Note that XPaths do not normalize whitespace the way we do. If you need to write an XPath like //div[text()="hello world"] but the HTML of the link is really "hello&nbsp;world", you'll need to insert a real "&nbsp;" into your Selenese test case to get it to match, like this: //div[text()="hello${nbsp}world"].
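The mismatch described above can be sketched in Python. normalize_like_selenium is a hypothetical helper that only approximates the whitespace normalization OpenQA describes (it is not Selenium's actual code): a normalized comparison ignores the non-breaking space, while a raw string comparison, which is what an XPath text() test performs, does not.

```python
import re

# Approximation of the described normalization (not Selenium's code):
# collapse runs of spaces, tabs, newlines and U+00A0 into one space, then trim.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n\u00a0]+")


def normalize_like_selenium(text: str) -> str:
    return _WHITESPACE_RUN.sub(" ", text).strip()


raw = "hello\u00a0world"  # what the DOM holds for "hello&nbsp;world"
print(normalize_like_selenium(raw) == "hello world")  # True: assertText-style check
print(raw == "hello world")  # False: what an XPath text() comparison sees
```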

Illdefined answered 29/10, 2008 at 18:34 Comment(2)
The OpenQA link no longer loads. – Wickerwork
I just want to note that ${nbsp} is not working for me in Selenium or Chrome dev tools, and neither is \u00a0. What worked for me was typing a non-breaking space, on Mac with Alt+Shift+Space. A web search says Alt+0160 on Windows. – Jemadar

I found I can make the match when I input a hard-coded non-breaking space (U+00A0) by typing Alt+0160 on Windows between the two quotes...

//table[@id='TableID']//td[text()=' ']

worked for me with the special char.

From what I understood, the XPath 1.0 standard doesn't provide a way to escape Unicode characters. There seem to be functions for that in XPath 2.0, but it looks like Firefox doesn't support them (or I misunderstood something). So you have to make do with the local code page. Ugly, I know.

Actually, it looks like the standard is relying on the programming language using XPath to provide the correct Unicode escape sequence... So, somehow, I did the right thing.
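The same idea can be sketched in Python with the standard library's xml.etree.ElementTree. ElementTree only supports a subset of XPath, so the predicate [.='...'] (which compares an element's full text) stands in for text()='...', and the markup uses &#160; because plain XML has no &nbsp; entity. The \u00a0 escape is resolved by Python before the expression reaches the XPath engine, which has the same effect as typing the character with Alt+0160.

```python
import xml.etree.ElementTree as ET

# The question's row, with &#160; instead of &nbsp; (plain XML has no &nbsp;).
root = ET.fromstring("<tr><td>abc</td><td>&#160;</td></tr>")

# A regular space between the quotes does not match U+00A0.
plain_space = root.findall(".//td[.=' ']")

# "\u00a0" is turned into the real character by Python, not by XPath.
nbsp = root.findall(".//td[.='\u00a0']")

print(len(plain_space), len(nbsp))  # 0 1
```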

Haircut answered 29/10, 2008 at 16:9 Comment(6)
Using XPather 1.4.1 in Firefox 2, //td[text()=' '] yields no results. – Dali
Sorry, it doesn't work for me. My end goal is to use it in Selenium for testing my web interfaces. Selenium itself keeps the test expressions in an XML structure, and the Alt-code typing on Windows seems to get lost along the way. Also, my &#160; returns as a  in XML. – Illdefined
Zack, as I wrote, you have to replace the space between the two quotes with the character produced by Alt+0160 (on the numeric keypad). – Haircut
Got this to work with PHP as well: $col = $xpath->query("//p[text()=\"\xC2\xA0\"]"); – Nitre
@Bergory This works using Protractor with the Selenium driver. – Muntin
Plus one. I was using xpath->query() to match Known&#160 in PHP and nothing worked. Finally I stumbled upon your post and it worked! – Outweigh

As per the HTML you have provided:

<tr>
  <td>abc</td>
  <td>&nbsp;</td>
</tr>

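To locate the node containing &nbsp; (the NO-BREAK SPACE character, U+00A0), you can use either of the following XPath-based solutions. XPath 1.0 string literals have no escape sequences, so \u00A0 only works where the host language (for example Java or Python) turns it into the real character before the expression is evaluated: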

  • Using text():

    "//td[text()='\u00A0']"
    
  • Using contains():

    "//td[contains(., '\u00A0')]"
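A Python sketch of the difference between the two expressions, using xml.etree.ElementTree from the standard library. ElementTree has no contains() function, so the second query is emulated with a list comprehension; the third cell (a&#160;b) is added here only for illustration and is not part of the question's HTML.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<tr><td>abc</td><td>&#160;</td><td>a&#160;b</td></tr>")

# Exact match, comparable to //td[text()='\u00A0'].
exact = root.findall(".//td[.='\u00a0']")

# ElementTree has no contains(), so filter in Python,
# comparable to //td[contains(., '\u00A0')].
containing = [td for td in root.iter("td") if "\u00a0" in "".join(td.itertext())]

print(len(exact), len(containing))  # 1 2
```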
    

However, ideally you may want to avoid the NO-BREAK SPACE character altogether and use one of the following locator strategies:

  • Using the parent <tr> node and following-sibling:

    "//tr/td[1]/following-sibling::td[1]"
    
  • Using last():

    "//tr//td[last()]"
    
  • Using the preceding <td> node and the following axis:

    "//td[text()='abc']//following::td[1]"
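The positional strategies can be tried with Python's xml.etree.ElementTree as well. ElementTree supports [n] and [last()] predicates but not the following or following-sibling axes, so only those two are sketched; the <table> wrapper is an assumption added to make the snippet a complete document.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<table><tr><td>abc</td><td>&#160;</td></tr></table>")

# Second cell of the row, comparable to //tr/td[2].
second = root.find(".//tr/td[2]")

# Last cell of the row, comparable to //tr//td[last()].
last = root.find(".//tr/td[last()]")

print(second is last, repr(last.text))  # True '\xa0'
```

Neither query mentions the non-breaking space, which is the point of these strategies.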
    

Reference

You can find a relevant detailed discussion in:


tl; dr

Unicode Character 'NO-BREAK SPACE' (U+00A0)

Parting answered 11/1, 2020 at 22:5 Comment(0)

Try using the decimal entity &#160; instead of the named entity. If that doesn't work, you should be able to simply use the Unicode character for a non-breaking space instead of the &nbsp; entity.
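Why the decimal form is the safer choice in XML tools can be sketched with Python's xml.etree.ElementTree: a numeric character reference is part of XML itself, while &nbsp; is an HTML entity that a plain XML parser rejects unless a DTD declares it.

```python
import xml.etree.ElementTree as ET

# The numeric character reference is built into XML, so it always parses.
td = ET.fromstring("<td>&#160;</td>")
print(td.text == "\u00a0")  # True

# &nbsp; is an HTML entity; a plain XML parser rejects it without a DTD.
try:
    ET.fromstring("<td>&nbsp;</td>")
except ET.ParseError as err:
    print("ParseError:", err)
```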

(Note: I did not try this in XPather, but I did try it in Oxygen.)

Unit answered 29/10, 2008 at 15:57 Comment(1)
Thank you so much, that link was very useful. You have no idea how many hours I've spent trying to figure out a solution. In my case I had to replace whitespace " " with "\u00A0", as that's how Java represents the "&nbsp;" character. – Flexure

Bear in mind that a standards-compliant XML processor will have replaced any entity references other than XML's five standard ones (&amp;, &gt;, &lt;, &apos;, &quot;) with the corresponding character in the target encoding by the time XPath expressions are evaluated. Given that behavior, PhiLho's and jsulak's suggestions are the way to go if you want to work with XML tools. When you enter &#160; in the XPath expression, it should be converted to the corresponding byte sequence before the XPath expression is applied.
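Outside an XML document, nothing performs that conversion for you. A minimal Python sketch, assuming the expression is built as an ordinary string and evaluated with xml.etree.ElementTree: the text &#160; inside the expression stays six literal characters until something like html.unescape resolves it.

```python
import html
import xml.etree.ElementTree as ET

root = ET.fromstring("<tr><td>abc</td><td>&#160;</td></tr>")

# In a Python string, "&#160;" is just six characters; nothing resolves it.
print(len(root.findall(".//td[.='&#160;']")))  # 0

# Resolving the reference first (here with html.unescape) yields U+00A0.
nbsp = html.unescape("&#160;")
print(len(root.findall(f".//td[.='{nbsp}']")))  # 1
```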

Parsaye answered 29/10, 2008 at 19:27 Comment(1)
Not if you use XPath in XPather (GUI) or in JavaScript (no automatic substitution of entities, since we are not in XML). Good advice in other XML environments (XSLT?). – Haircut

Search for &nbsp; or only nbsp - did you try this?

Soudan answered 29/10, 2008 at 15:5 Comment(2)
I recognize that this should work, but I'm not exactly sure what it would find. There must be a way in XPath to encode the character so that it matches what I'm looking for. – Illdefined
Maybe I should look toward a regular expression. – Illdefined

I cannot get a match using Xpather, but the following worked for me with plain XML and XSL files in Microsoft's XML Notepad:

<xsl:value-of select="count(//td[text()='&nbsp;'])" />

The value returned is 1, which is the correct value in my test case.

However, I did have to declare nbsp as an entity within my XML and XSL using the following:

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>

I'm not sure if that helps you, but I was able to actually find nbsp using an XPath expression.

Edit: My code sample actually contains the characters '&nbsp;', but the JavaScript syntax highlighting converts them to a space character. Don't be misled!
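The entity-declaration approach can be reproduced with Python's xml.etree.ElementTree, whose expat-based parser expands entities declared in the internal DTD subset. This is a sketch under that assumption, not a reproduction of XML Notepad; [.='\u00a0'] stands in for text()='&nbsp;' because ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Declaring nbsp in the internal DTD subset makes &nbsp; legal in plain XML.
doc = """<!DOCTYPE tr [ <!ENTITY nbsp "&#160;"> ]>
<tr><td>abc</td><td>&nbsp;</td></tr>"""

root = ET.fromstring(doc)

# After parsing, &nbsp; is the U+00A0 character, so this plays the role of
# count(//td[text()='&nbsp;']) in the XSL sample.
print(len(root.findall(".//td[.='\u00a0']")))  # 1
```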

Dali answered 29/10, 2008 at 17:12 Comment(1)
You can edit your code sample the way it was done for the sample in my question: replace your nbsp entity with &amp;nbsp;. – Illdefined

I wrote a function that builds every combination of regular and replacement spaces in a string and joins them into a single XPath union query.

import itertools


def replace_spaces(
    xpath_start: str, string: str, xpath_end: str, replacement_str: str
) -> str:
    """
    Replace the spaces in "string" with "replacement_str" in every combination,
    then concatenate the results like "(combination_1) | (combination_2) | ...".
    Note: a string with n spaces produces 2**n alternatives.
    """
    # The gaps between the parts are the positions of the original spaces.
    parts = string.split(" ")
    alternatives = []
    # Each tuple picks, for every space, either a plain space or the replacement,
    # in the same order as counting from 0 to 2**n - 1 in binary.
    for separators in itertools.product((" ", replacement_str), repeat=len(parts) - 1):
        current_str = parts[0]
        for separator, part in zip(separators, parts[1:]):
            current_str += separator + part
        alternatives.append("(" + xpath_start + current_str + xpath_end + ")")
    return " | ".join(alternatives)

usage:

replace_spaces("//*[contains(text(), '", title, "')]", "\u00A0")

Kemp answered 22/3 at 12:34 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Gamboa

© 2022 - 2024 — McMap. All rights reserved.