A command-line HTML pretty-printer: Making messy HTML readable [closed]
H

5

155

I'm looking for recommendations for HTML pretty printers which fulfill the following requirements:

  • Takes HTML as input and outputs a nicely formatted, correctly indented, but "graphically equivalent" version of it.
  • Must support command-line operation.
  • Must be open-source and run under Linux.
Holcombe answered 3/2, 2010 at 13:0 Comment(6)
Other options are pup (without arguments), xmllint --format --html -, and xml fo --html.Bourbon
curl httpbin.org | tidy -imLeporine
Also: hxnormalize from html-xml-utils (Debian)Platinum
related: #16091369 you can also look into XML ToolsDissociation
I honestly have trouble understanding why this is considered off-topic...Endocrinotherapy
Hey @VictorSchröder I think this is one of the few closed questions that are closed because of genuine non-compliance with the topic rules. I don't agree with the topic rules. I think recommendations of tools should be a core component, as none of us want to "reinvent the wheel", at least not every day, for everything, so recommending good tooling is core to business. But given they have that clause, this post doesn't comply. There are plenty of questions that are closed for far more ridiculous interpretations, unfortunately. It does seem to be a disease amongst SO admins. :-(Asbury
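The alternatives named in the comments above can be tried in order of availability. This is only a minimal sketch, assuming a POSIX shell: format_html is a hypothetical helper name, the command lines are the ones quoted in the comments, and the -q -i flags for Tidy are an added assumption.

```shell
# Read HTML on standard input, write a formatted version to standard output,
# using the first formatter found. format_html is a hypothetical name.
format_html() {
    if command -v tidy >/dev/null 2>&1; then
        # -q: quiet, -i: indent. Tidy exits with 1 if it printed warnings.
        tidy -q -i
    elif command -v xmllint >/dev/null 2>&1; then
        xmllint --format --html -
    elif command -v pup >/dev/null 2>&1; then
        pup
    elif command -v xml >/dev/null 2>&1; then
        # "xml" is the command name used in the comment above.
        xml fo --html
    else
        echo "format_html: no HTML formatter found" >&2
        return 127
    fi
}
```

Usage would look like: curl -s httpbin.org | format_html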
F
126

Have a look at the HTML Tidy Project: http://www.html-tidy.org/

The granddaddy of HTML tools, with support for modern standards.

There used to be a fork called tidy-html5, which has since become the official version. Here is its GitHub repository.

Tidy is a console application for Mac OS X, Linux, Windows, UNIX, and more. It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.

For your needs, here is the command line to call Tidy:

tidy inputfile.html
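
For formatting only, a few flags are usually added to that call. The following is a rough sketch rather than a recommended configuration: pretty_html is a hypothetical helper name, and the flag set (-q, -i, -w 160, -utf8) is borrowed from the comments on this answer. Without a file argument Tidy reads standard input, and without -m it leaves the input file untouched and writes to standard output.

```shell
# Pretty-print HTML with Tidy; pass a file name or pipe HTML in on stdin.
# pretty_html is a hypothetical helper, not part of Tidy.
pretty_html() {
    status=0
    # -q: quiet, -i: indent, -w 160: wrap at 160 columns, -utf8: UTF-8 in/out
    tidy -q -i -w 160 -utf8 "$@" || status=$?
    # Tidy exits with 1 for warnings and 2 for errors; for formatting
    # purposes only an exit status of 2 or more is treated as failure.
    [ "$status" -le 1 ]
}
```

Usage would look like: pretty_html inputfile.html > formatted.html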
Flowerage answered 3/2, 2010 at 13:8 Comment(8)
Thanks! "tidy -i -m -w 160 -ashtml -utf8 index.html" did the trick! Turns out tidy is installed by default in MacOS X - excellent!Holcombe
Tidy was struggling to get the indentation right until I ran it with this option (rather than letting it default to "auto" with -i): tidy --indent yesArhat
Tidy is great as a validator/lint tool, but it's not so great as a code beautifier. Two issues: (1) it can only operate on files, not standard input (so you cannot, for example, send selected text from Notepad++ to tidy.exe, and have it output the formatted code back to Notepad++); (2) It has trouble formatting a lot of code, e.g.: <form><input><input><input><input><input></form>.Amar
Also it modifies the file when it cannot understand text.Circumvent
One note about tidy-html5, if you're using inline javascript, you need to include type="text/javascript" otherwise tidy will add <![CDATA[Achelous
tidy index.html -qi -utf8 --output index.html is a single command that does everything.Topography
Tidy does more than just format the HTML. It will remove empty tags and reorder technically invalid HTML that is accepted by browsers (read: is used on the internet). <p class="a"><div class="b"></div></p> gets reordered as <p class="a"></p><div class="b"></div> and something like <p><div></div></p> just gets deleted. See this GitHub issue. If you use tidy, you should run it in quiet mode (tidy -q) and not ignore warnings such as "trimming empty <p>". Don't use it on HTML you didn't write.Allover
@Amar my version of tidy on Linux uses stdin, stdout and stderr if these are not specified in the options. I presume you are limited by your OS.Asbury
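
Regarding the comment about Tidy dropping empty elements: tidy-html5 has configuration options that switch some of that cleanup off. The sketch below assumes a tidy-html5 5.x build (older builds may reject --drop-empty-elements), and tidy_keep_empty is a hypothetical helper name. It probably does not stop the parser from restructuring invalid nesting such as a <div> inside a <p>, so comparing the output against the input is still worthwhile.

```shell
# Format with Tidy while asking it to keep empty elements.
# tidy_keep_empty is a hypothetical helper; assumes tidy-html5 5.x.
tidy_keep_empty() {
    status=0
    # --drop-empty-elements no : keep empty elements
    # --drop-empty-paras no    : keep empty <p> elements
    # --tidy-mark no           : do not add a "generator" meta tag
    tidy -q -i \
        --drop-empty-elements no \
        --drop-empty-paras no \
        --tidy-mark no \
        "$@" || status=$?
    # Exit status 1 means warnings only; 2 means errors.
    [ "$status" -le 1 ]
}
```

Usage would look like: tidy_keep_empty index.html | diff index.html -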
C
14

Update 2018: The homebrew/dupes tap is now deprecated; tidy-html5 can be installed directly.

brew install tidy-html5

Original reply:

The Tidy that ships with OS X doesn't support HTML5, but there is an experimental branch on GitHub which does.

To get it:

brew tap homebrew/dupes
brew install tidy --HEAD
brew untap homebrew/dupes

That's it! Have fun!

Classmate answered 1/7, 2013 at 12:32 Comment(3)
Error: No available formula with the name "tidy". brew install tidy-html5 works.Jacks
Indeed, brew install tidy-html5 works and you don't need the homebrew/dupes tap either.Seychelles
Tidy does more than just format the HTML. It will remove empty tags and reorder technically invalid HTML that is accepted by browsers (read: is used on the internet). <p class="a"><div class="b"></div></p> gets reordered as <p class="a"></p><div class="b"></div> and something like <p><div></div></p> just gets deleted. See this GitHub issue. If you use tidy, you should run it in quiet mode (tidy -q) and not ignore warnings such as "trimming empty <p>". Don't use it on HTML you didn't write.Allover
K
5

I think HTML tidy is one of the household names in that field.

Kinglet answered 3/2, 2010 at 13:5 Comment(0)
C
5

To have an updated, OS-agnostic answer to this question:

While the original HTMLTidy project has been dormant for over 6 years, a "W3C Community & Business group" that goes by the name "HTML Tidy Advocacy Community Group (HTACG)" has now begun to continue its development, with the goal of making it fully HTML5-compatible. The group was formed in January 2015 and although they describe the current state as "work in progress", binaries are already available for download.

Consist answered 6/8, 2015 at 13:21 Comment(0)
C
2

Just a late followup on an OT question.

Homebrew has a tidy-html5 formula, which installs as you'd expect.

It's linked up as tidy5.
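
Since the binary name may differ between installs (tidy5 as described here, plain tidy elsewhere), a script can probe for both. A small sketch, assuming a POSIX shell; find_tidy is a hypothetical helper name.

```shell
# Print the name of the first Tidy binary found, preferring tidy5.
# find_tidy is a hypothetical helper name.
find_tidy() {
    for candidate in tidy5 tidy; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "find_tidy: neither tidy5 nor tidy was found" >&2
    return 1
}
```

Usage would look like: "$(find_tidy)" -v, which prints the version of whichever binary was found.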

Cheryllches answered 9/6, 2015 at 14:10 Comment(2)
Tidy is still mostly an HTML formatter and validator, not an HTML parser. Which tool can be used for rule-based HTML parsing: search the code for target elements (tags) with a specified 'class' or 'id' and delete them along with their content (child tags), plus delete specified tags?Magnetohydrodynamics
@triwo If you have a new question, particularly when not related to the original question, post a new question :) The caveat is that requests for tools/libraries/etc. are generally considered off-topic. In general, any HTML parser w/ XPath or CSS selector queries should be able to manipulate a DOM in arbitrary ways.Cheryllches
