Your desired output is nonsense:
['Raw name 1', 2,094, 0,017, 0,098, 0,113, 0,452]
# ~> -:1: Invalid octal digit
# ~> ['Raw name 1', 2,094, 0,017, 0,098, 0,113, 0,452]
I'll assume you want quoted numbers.
After stripping the parts that keep the code from working and reducing the HTML to a more manageable example, this is what I ran:
require 'nokogiri'
html = <<EOT
<table class="open">
<tr>
<th>Table name</th>
<th>Column name 1</th>
<th>Column name 2</th>
</tr>
<tr>
<th>Raw name 1</th>
<td>2,094</td>
<td>0,017</td>
</tr>
<tr>
<th>Raw name 5</th>
<td>2,094</td>
<td>0,017</td>
</tr>
</table>
EOT
doc = Nokogiri::HTML(html)
tables = doc.css('table.open')
tables_data = []
tables.each do |table|
title = table.css('tr[1] > th').text # !> assigned but unused variable - title
cell_data = table.css('tr > td').text
raw_name = table.css('tr > th').text
tables_data << [cell_data, raw_name]
end
Which results in:
tables_data
# => [["2,0940,0172,0940,017",
# "Table nameColumn name 1Column name 2Raw name 1Raw name 5"]]
The first thing to notice is you're not using title though you assign to it. Possibly that happened when you were cleaning up your code as an example.
css, like search and xpath, returns a NodeSet, which is akin to an array of Nodes. When you use text or inner_text on a NodeSet it returns the text of each node concatenated into a single string:
Get the inner text of all contained Node objects.
This is its behavior:
require 'nokogiri'
doc = Nokogiri::HTML('<html><body><p>foo</p><p>bar</p></body></html>')
doc.css('p').text # => "foobar"
Instead, you should iterate over each node found, and extract its text individually. This is covered many times here on SO:
doc.css('p').map{ |node| node.text } # => ["foo", "bar"]
That can be reduced to:
doc.css('p').map(&:text) # => ["foo", "bar"]
See "How to avoid joining all text from Nodes when scraping" also.
The docs say this about content, text and inner_text when used with a Node:
Returns the content for this Node.
Instead, you need to go after the individual node's text:
require 'nokogiri'
html = <<EOT
<table class="open">
<tr>
<th>Table name</th>
<th>Column name 1</th>
<th>Column name 2</th>
<th>Column name 3</th>
<th>Column name 4</th>
<th>Column name 5</th>
</tr>
<tr>
<th>Raw name 1</th>
<td>2,094</td>
<td>0,017</td>
<td>0,098</td>
<td>0,113</td>
<td>0,452</td>
</tr>
<tr>
<th>Raw name 5</th>
<td>2,094</td>
<td>0,017</td>
<td>0,098</td>
<td>0,113</td>
<td>0,452</td>
</tr>
</table>
EOT
tables_data = []
doc = Nokogiri::HTML(html)
doc.css('table.open').each do |table|
# find all rows in the current table, skip the header row, then iterate over the rest...
table.css('tr')[1..-1].each do |tr|
# collect the cell data and raw names from the remaining rows' cells...
raw_name = tr.at('th').text
cell_data = tr.css('td').map(&:text)
# aggregate it...
tables_data += [raw_name, cell_data]
end
end
Which now results in:
tables_data
# => ["Raw name 1",
# ["2,094", "0,017", "0,098", "0,113", "0,452"],
# "Raw name 5",
# ["2,094", "0,017", "0,098", "0,113", "0,452"]]
You can figure out how to coerce the quoted numbers into decimals acceptable to Ruby, or manipulate the inner arrays however you want.
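As a starting point for that conversion, here is a minimal sketch. It assumes the comma in values like "2,094" is a decimal separator (so the value means 2.094); if it is actually a thousands separator, the replacement would need to change. The to_decimal helper and the rows variable are names made up for this illustration, and the input is the flat array shown above, pasted in literally so the snippet runs without Nokogiri.

```ruby
# Flat result from the example above: names alternate with arrays of strings.
tables_data = [
  "Raw name 1", ["2,094", "0,017", "0,098", "0,113", "0,452"],
  "Raw name 5", ["2,094", "0,017", "0,098", "0,113", "0,452"]
]

# Hypothetical helper. Assumption: the comma is a decimal separator.
# Float() raises ArgumentError on malformed input instead of silently
# returning 0.0 the way String#to_f would.
def to_decimal(str)
  Float(str.tr(',', '.'))
end

# Pair each raw name with its converted cell values.
rows = tables_data.each_slice(2).map { |name, cells|
  [name, cells.map { |cell| to_decimal(cell) }]
}.to_h

rows["Raw name 1"] # => [2.094, 0.017, 0.098, 0.113, 0.452]
```

Using each_slice(2) relies on the name/values alternation produced by the += in the loop above. If you would rather have nested pairs from the start, appending with tables_data << [raw_name, cell_data] inside the loop would make the slicing unnecessary.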
Table?. Why do you have render template and the two terminating ends? We have to remove that stuff to test. Show a minimal sample of the data you're returning minus the use of the custom class. – Brig