Get element in particular index nokogiri

How can I get the element at index 2?

For example, in the following HTML I want to display the third element, i.e. a DIV:

<HTMl>
    <DIV></DIV>
    <OL></OL>
    <DIV> </DIV>
</HTML>

I have been trying the following:

p1 =  html_doc.css('body:nth-child(2)')
puts p1
Remonstrate answered 6/9, 2014 at 4:51 Comment(2)
When you ask a question, it's important to supply the actual code you use, reduced down until it duplicates the problem, with nothing more. Nokogiri can be told to parse a document multiple ways, and will result in different versions of the DOM, which would affect how your sample code works. Without that added code, we can't tell where you're doing the wrong thing, but body:nth-child(2) is a clue. - Devlin
I used doc.xpath('//html/div'); this worked for me. - Remonstrate

I don't think you're understanding how we use a parser like Nokogiri, because it's a lot easier than you make it out to be.

I'd use:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<HTMl>
    <DIV>1</DIV>
    <OL></OL>
    <DIV>2</DIV>
</HTML>
EOT

doc.at('//div[2]').to_html # => "<div>2</div>"

That's using at, which returns the first Node that matches the selector. //div[2] is an XPath selector that matches a <div> that is the second <div> among its siblings, which here is the second <div> in the document. search could be used instead of at, but it returns a NodeSet, which is like an array, so I'd need to extract that particular node from it.

Alternately, I could use CSS instead of XPath:

doc.search('div:nth-child(3)').to_html # => "<div>2</div>"

To me, that's not really an improvement over the XPath as far as readability goes.

Using search to find all occurrences of a particular tag means I have to select the particular element from the returned NodeSet:

doc.search('div')[1].to_html # => "<div>2</div>"

Or:

doc.search('div').last.to_html # => "<div>2</div>"

The downside to using search this way is that it will be slower and needlessly memory-intensive on big documents, because search finds every node in the document that matches the selector, and all but one are then thrown away. search, css and xpath all behave that way, so if you only need the first matching node, use at or its at_css and at_xpath equivalents, and provide a sufficiently specific selector to find just the tag you want.

'body:nth-child(2)' doesn't work because you're not using it right, according to ":nth-child()" and how I understand it works. nth-child looks at the tag supplied and matches it only when it is the "nth" child of its parent. So you're asking for a body tag that is the second child of its "html" parent, not for the second child of body. At best that selects body itself, as in a correctly formed HTML document:

<html>
  <head></head>
  <body></body>
</html>

(How you tell Nokogiri to parse the document determines how the resulting DOM is structured.)

Instead, use div:nth-child(3), which says "find a div that is the third child of its parent". The parent here is "body", so the result is the second div tag.

Back to how Nokogiri can be told to parse a document: meditate on the difference between these:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<p>foo</p>
EOT

puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p>foo</p>
# >> </body></html>

and:

require 'nokogiri'

doc = Nokogiri::HTML::DocumentFragment.parse(<<EOT)
<p>foo</p>
EOT

puts doc.to_html
# >> <p>foo</p>
Devlin answered 7/9, 2014 at 20:40 Comment(0)

If you can modify the HTML, add ids and classes so you can easily target what you are looking for (and add the body tag).

If you cannot modify the HTML, keep your selector simple and access the second element of the returned array:

html_doc.css('div')[1]
Commix answered 6/9, 2014 at 6:49 Comment(0)
