How do I get the value inside a <td> tag with XPath/HtmlUnit?

I am trying to create a Java application that retrieves information from a webpage. Below is part of the page's HTML; I want to access the value in the first td tag of the second tr tag:

<TABLE  CLASS="datadisplaytable" width = "100%">
<TR>
    <TD CLASS="dddead">&nbsp;</TD>
    <TH CLASS="ddheader" scope="col" ><SPAN class="fieldlabeltext">Capacity</SPAN></TH>
    <TH CLASS="ddheader" scope="col" ><SPAN class="fieldlabeltext">Actual</SPAN></TH>
    <TH CLASS="ddheader" scope="col" ><SPAN class="fieldlabeltext">Remaining</SPAN></TH>
</TR> 
<TR>
    <TH CLASS="ddlabel" scope="row" ><SPAN class="fieldlabeltext">Seats</SPAN></TH>
    <TD CLASS="dddefault">46</TD>  <!-- this is the value I want -->
    <TD CLASS="dddefault">46</TD>
    <TD CLASS="dddefault">0</TD>
</TR>

This is what I have right now, but it only seems to return the class of the td tag and not the value inside it:

List<?> table = page.getByXPath("//table[@class='datadisplaytable'][1]//tr[2]/td");

How would I go about getting the value of the td tag and not its properties?

edit: The code above returns this:

HtmlTableDataCell[<td class="dddefault">]
Metro answered 28/2, 2012 at 18:53 Comment(3)
I need to get the value inside the td tag; in this case it would be '46'. - Metro
It's been a while since I last used Java, but there should be a method called text(), or something similar. - Historiography
Hmm, I looked into what you said and found this: ((HtmlTableDataCell) table.get(0)).getTextContent(). It seems to be working, thanks for the help! - Metro
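The fix found in the comment above relies on getTextContent(), which is also part of the standard org.w3c.dom.Node interface. As a rough, self-contained illustration (not HtmlUnit itself), the sketch below runs the XPath from the question with the JDK's built-in javax.xml.xpath API. The class name CellTextContent and the trimmed lowercase table are assumptions made for the example, not taken from the real page.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CellTextContent {
    // Stand-in for the page (assumption): a trimmed copy of the table, with
    // lowercase names so the lowercase XPath matches in a case-sensitive XML parser.
    static final String XML =
        "<table class=\"datadisplaytable\">"
      + "<tr><th>Capacity</th><th>Actual</th><th>Remaining</th></tr>"
      + "<tr><th>Seats</th><td>46</td><td>46</td><td>0</td></tr>"
      + "</table>";

    // Runs the XPath from the question and returns the matching td element nodes.
    static NodeList seatCells() {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//table[@class='datadisplaytable'][1]//tr[2]/td",
                new InputSource(new StringReader(XML)),
                XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList cells = seatCells();
        // The list holds element nodes, not strings; getTextContent()
        // reads the text inside each one.
        for (int i = 0; i < cells.getLength(); i++) {
            System.out.println(cells.item(i).getTextContent());
        }
    }
}
```

The point is that the XPath selects the td elements themselves, so printing a list entry shows the element, while getTextContent() on that entry gives the text inside it.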

Assuming that the document is as shown in the question (TABLE is the top element),

Use:

/TABLE/TR[2]/TD[1]/text()

This selects any text-node child of the first TD child of the second TR child of the top element TABLE.
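To try this expression outside the browser library, here is a minimal sketch using the JDK's javax.xml.xpath API. It assumes the fragment from the question is the whole document, closed with </TABLE>, and that &nbsp; is written as &#160; because a plain XML parser does not know HTML entities. The class name TopLevelTableXPath is made up for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class TopLevelTableXPath {
    // The table from the question, closed so that it parses as XML (assumption).
    static final String XML =
        "<TABLE CLASS=\"datadisplaytable\" width=\"100%\">"
      + "<TR><TD CLASS=\"dddead\">&#160;</TD>"
      + "<TH CLASS=\"ddheader\">Capacity</TH>"
      + "<TH CLASS=\"ddheader\">Actual</TH>"
      + "<TH CLASS=\"ddheader\">Remaining</TH></TR>"
      + "<TR><TH CLASS=\"ddlabel\"><SPAN class=\"fieldlabeltext\">Seats</SPAN></TH>"
      + "<TD CLASS=\"dddefault\">46</TD>"
      + "<TD CLASS=\"dddefault\">46</TD>"
      + "<TD CLASS=\"dddefault\">0</TD></TR>"
      + "</TABLE>";

    // Evaluates an XPath expression against XML and returns the result as a string.
    // For a node-set, that is the string value of the first node in document order.
    static String evaluate(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Second TR child of TABLE, first TD child of that row, its text node.
        System.out.println(evaluate("/TABLE/TR[2]/TD[1]/text()"));
    }
}
```

Note that XML names are case-sensitive here, so the expression has to use the same uppercase names as the markup. The expression in the question uses lowercase table/tr/td, so the names may need adjusting depending on how the page is parsed.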

In case the table is buried in the XML document, but can be uniquely identified by its CLASS attribute, use:

//TABLE[@CLASS='datadisplaytable']/TR[2]/TD[1]/text()

This selects any text-node child of the first TD child of the second TR child of any TABLE element in the XML document (we know there is only one such) whose CLASS attribute has the string value 'datadisplaytable'.

Finally, in the worst case, there could be many TABLE elements whose CLASS attribute's value is 'datadisplaytable'. To select from the first such table, use:

(//TABLE[@CLASS='datadisplaytable'])[1]/TR[2]/TD[1]/text()
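The parentheses matter: without them, [1] applies per parent rather than to the whole result. The sketch below tries to show the difference with the JDK's javax.xml.xpath API on a hypothetical page with two such tables in separate DIV elements. The page layout, the "Waitlist" row and the class name FirstMatchingTable are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class FirstMatchingTable {
    // Hypothetical page: two tables with the same class, each in its own DIV.
    // The "Waitlist" row is made-up sample data, not from the question.
    static final String XML =
        "<HTML><BODY>"
      + "<DIV><TABLE CLASS=\"datadisplaytable\">"
      + "<TR><TH>Capacity</TH></TR>"
      + "<TR><TH>Seats</TH><TD>46</TD></TR>"
      + "</TABLE></DIV>"
      + "<DIV><TABLE CLASS=\"datadisplaytable\">"
      + "<TR><TH>Capacity</TH></TR>"
      + "<TR><TH>Waitlist</TH><TD>10</TD></TR>"
      + "</TABLE></DIV>"
      + "</BODY></HTML>";

    static Object evaluate(String expression, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Number of nodes selected by a node-set expression.
    static int count(String nodeSetExpression) {
        Double n = (Double) evaluate("count(" + nodeSetExpression + ")", XPathConstants.NUMBER);
        return n.intValue();
    }

    // String value of the first selected node.
    static String text(String expression) {
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        // [1] per parent: each DIV has a "first" matching TABLE child.
        System.out.println(count("//TABLE[@CLASS='datadisplaytable'][1]"));
        // [1] over the whole parenthesised node-set: only the first table in the document.
        System.out.println(count("(//TABLE[@CLASS='datadisplaytable'])[1]"));
        System.out.println(text("(//TABLE[@CLASS='datadisplaytable'])[1]/TR[2]/TD[1]/text()"));
    }
}
```

On this sample document the unparenthesised form selects both tables, while the parenthesised form selects only the first one in document order.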
Gove answered 28/2, 2012 at 19:17 Comment(2)
This helps a lot in understanding the details of XPath. I did not know it was possible to just do text(). This might be better than casting and using .getTextContent(). Thanks for the help! - Metro
@Saad: You can get the string value directly by using the standard XPath function string(). So, string(expressionSelectingAnElement) returns the concatenation of all the text-node descendants of the element. - Gove
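A small sketch of the difference between text() and string(), again using the JDK's javax.xml.xpath API rather than HtmlUnit. It uses the "Seats" header cell from the question, whose text sits inside a nested SPAN. The trimmed table and the class name StringVsText are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class StringVsText {
    // Second row of the question's table, closed so that it parses as XML (assumption).
    static final String XML =
        "<TABLE CLASS=\"datadisplaytable\">"
      + "<TR><TH>Capacity</TH></TR>"
      + "<TR><TH CLASS=\"ddlabel\"><SPAN class=\"fieldlabeltext\">Seats</SPAN></TH>"
      + "<TD CLASS=\"dddefault\">46</TD></TR>"
      + "</TABLE>";

    // Evaluates the expression and converts the result to a string.
    static String evaluate(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // text() only looks at direct text-node children; TH has none here,
        // because its text lives inside the nested SPAN.
        System.out.println("[" + evaluate("/TABLE/TR[2]/TH/text()") + "]");
        // string() concatenates all text-node descendants of the element.
        System.out.println("[" + evaluate("string(/TABLE/TR[2]/TH)") + "]");
        // For a cell that contains only text, both approaches agree.
        System.out.println("[" + evaluate("string(/TABLE/TR[2]/TD[1])") + "]");
    }
}
```

So string() is the safer choice when the cell's text may be wrapped in further markup.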
To get the text content of an element, you can use XPath's text() node test (often loosely called a function).

Element containing text 't' exactly         //*[.='t']
Element <E> containing text 't'             //E[contains(text(),'t')]
<a> containing text 't'                     //a[contains(text(),'t')]
<a> with target link 'url'                  //a[@href='url']
Link URL labeled with text 't' exactly      //a[.='t']/@href
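A quick way to check idioms like these is the JDK's javax.xml.xpath API. The sketch below runs a few of them against a tiny hypothetical document; the two links, their URLs and the class name LinkIdioms are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class LinkIdioms {
    // Hypothetical document with two links (sample data, not from the question).
    static final String XML =
        "<body>"
      + "<a href=\"/docs\">Docs</a>"
      + "<a href=\"/api\">API reference</a>"
      + "</body>";

    // Evaluates the expression and returns the string value of the first match,
    // or an empty string when nothing matches.
    static String evaluate(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Exact match on the element's whole string value, then its href attribute.
        System.out.println(evaluate("//a[.='Docs']/@href"));
        // Substring match on the link text.
        System.out.println(evaluate("//a[contains(text(),'API')]/@href"));
        // Match on the target URL; the result is the link's text.
        System.out.println(evaluate("//a[@href='/api']"));
        // 'API' alone is not the whole text of any link, so the exact form finds nothing.
        System.out.println("[" + evaluate("//a[.='API']") + "]");
    }
}
```

The last line shows why the "exactly" rows and the contains() rows are listed separately: the exact form compares the full string value, the contains() form only needs a substring.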

If you are also using JWebUnit, the class net.sourceforge.jwebunit.junit.WebTestCase has a method, getElementTextByXPath, which can also be used to get the text. Its Javadoc reads:

public String getElementTextByXPath(String xpath)
Deprecated. Get text of the given element.
Parameters: xpath - xpath of the element.

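    // Inside a WebTestCase: look up td cells by position, 1 through 5.
    // "//td[i][text()]" matches a td that is the i-th td child of its parent
    // and has at least one text-node child.
    for (int i = 1; i <= 5; i++) {
        String result = getElementTextByXPath("//td[" + i + "][text()]");
        System.out.println("The content of TD " + i + " is " + result);
    }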
Transfigure answered 2/4, 2012 at 9:56 Comment(0)
