HTML tidy/cleaning in Ruby 1.9

I'm currently using the RubyTidy Ruby bindings for HTML Tidy to make sure the HTML I receive is well-formed. Currently this library is the only thing holding me back from getting a Rails application on Ruby 1.9. Are there any alternative libraries out there that will tidy up chunks of HTML on Ruby 1.9?

Eschew answered 20/8, 2009 at 20:46 Comment(0)

tidy_ffi (http://github.com/libc/tidy_ffi/blob/master/README.rdoc) works with Ruby 1.9 (latest version).

If you are working on Windows, you need to set library_path, e.g.:

    require 'tidy_ffi'
    TidyFFI.library_path = 'lib\\tidy\\bin\\tidy.dll'
    tidy = TidyFFI::Tidy.new('test')
    puts tidy.clean

(It uses the same DLL as tidy.) The link above gives more usage examples.
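
As a rough sketch of passing options on a non-Windows system: the README linked above describes a constructor that takes an options hash, so something like the following should work, assuming the tidy_ffi gem and a system libtidy are installed. The helper name `tidy_fragment` is made up for illustration, and `show_body_only` is assumed to map to tidy's `show-body-only` option.

```ruby
# Hypothetical helper (not part of tidy_ffi): clean an HTML fragment.
def tidy_fragment(html)
  # Required lazily so a missing gem or libtidy surfaces as LoadError here.
  require 'tidy_ffi'
  # On Windows, set TidyFFI.library_path before this call.
  # :show_body_only asks tidy to return only the body content,
  # without wrapping it in a full document.
  TidyFFI::Tidy.new(html, :show_body_only => true).clean
end

begin
  puts tidy_fragment('<p>one<p>two')
rescue LoadError => e
  warn "tidy_ffi or libtidy is not available: #{e.message}"
end
```

Dropping the `:show_body_only` option should give a complete document instead of a fragment.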

Wrongheaded answered 21/4, 2010 at 18:36 Comment(0)

I am using Nokogiri to fix invalid HTML:

    require 'nokogiri'
    Nokogiri::HTML::DocumentFragment.parse(html).to_html

Jabalpur answered 29/11, 2010 at 8:42 Comment(3)
I don't think this tidies the HTML. - Amaze
Is it reliable? I mean, does it fix syntax errors such as nested lists inside paragraphs? - Lascar
Nokogiri only ensures the HTML is well-formed; it won't fix syntax errors. For example, <table>x<table>y is "fixed" into <table>x<table>y</table></table>. - Molest
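
To see what the fragment parser does with your own input, a small check like the one below may help. It is only a sketch, assuming the nokogiri gem is installed; `repair_fragment` is a made-up helper name, and the exact output depends on the libxml2 version Nokogiri was built against.

```ruby
# Hypothetical helper: round-trip a fragment through Nokogiri's HTML parser.
def repair_fragment(html)
  # Required lazily so a missing gem surfaces as LoadError at call time.
  require 'nokogiri'
  Nokogiri::HTML::DocumentFragment.parse(html).to_html
end

begin
  # Unclosed <b> and <p>: the parser closes them when serializing.
  puts repair_fragment('<p>Hello <b>world')
  # The nested-table case from the comment above, for comparison.
  puts repair_fragment('<table>x<table>y')
rescue LoadError => e
  warn "nokogiri is not available: #{e.message}"
end
```

This closes open tags, but it does not validate the nesting rules, which is the distinction the comments above are drawing.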

Here is a nice example of how to make your HTML look better using tidy:

    require 'tidy'
    Tidy.path = '/opt/local/lib/libtidy.dylib' # or wherever your tidylib resides

    nice_html = ""
    Tidy.open(:show_warnings => true) do |tidy|
      tidy.options.output_xhtml = true
      tidy.options.wrap = 0
      tidy.options.indent = 'auto'
      tidy.options.indent_attributes = false
      tidy.options.indent_spaces = 4
      tidy.options.vertical_space = false
      tidy.options.char_encoding = 'utf8'
      nice_html = tidy.clean(my_nasty_html_string)
    end

    # remove excess newlines
    nice_html = nice_html.strip.gsub(/\n+/, "\n")
    puts nice_html

For more tidy options, check out the man page.
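
The newline cleanup at the end of the snippet above can be tried on its own, without tidy installed. This is a minimal sketch using only core Ruby; `collapse_newlines` and the sample string are made up for illustration.

```ruby
# Hypothetical helper wrapping the same strip + gsub used above.
def collapse_newlines(html)
  # strip removes leading/trailing whitespace; gsub squeezes each run of
  # consecutive newlines down to a single newline.
  html.strip.gsub(/\n+/, "\n")
end

sample = "\n<ul>\n\n\n  <li>one</li>\n\n</ul>\n\n"
puts collapse_newlines(sample)
```

Note that this also collapses blank lines inside `<pre>` blocks, and it leaves alone "blank" lines that contain spaces or tabs.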

Rundlet answered 16/4, 2010 at 8:8 Comment(1)
As of now it appears the tidy gem is incompatible with Ruby 1.9. There appears to be a fork at github.com/ShogunPanda/tidy but I haven't investigated it. - Amaze

> Currently this library is the only thing holding me back from getting a Rails application on Ruby 1.9.

Watch out, the Ruby Tidy bindings have some nasty memory leaks. They are currently unusable in long-running processes. (For the record, I'm using http://github.com/ak47/tidy.)

I just had to remove it from a production Rails 2.3 application because it was leaking about 1MB/min.
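
One possible workaround, not something I have run against the tidy bindings, is to do the leaky work in a short-lived child process so the memory is returned to the OS when the child exits. The sketch below uses only core Ruby on a platform with `fork` (so not Windows); `run_isolated` is a made-up helper, and the block body here is a stand-in for a real tidy call.

```ruby
# Hypothetical helper: run the block in a forked child and hand its
# (Marshal-able) result back to the parent through a pipe.
def run_isolated
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close
    status = 0
    begin
      writer.write(Marshal.dump(yield))
    rescue StandardError
      status = 1
    ensure
      writer.close
    end
    exit!(status) # skip at_exit handlers inherited from the parent
  end

  writer.close
  payload = reader.read # read until the child closes its end
  reader.close
  Process.wait(pid)
  raise 'child process failed' unless $?.success?
  Marshal.load(payload)
end

# Stand-in for a call into the tidy bindings.
puts run_isolated { '<p>stand-in for a tidy call</p>'.upcase }
```

Forking per call is expensive, so in practice it would make more sense to batch several documents into one child than to fork for every string.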

Dayton answered 11/3, 2010 at 9:49 Comment(0)
