Wanted: Command line HTML5 beautifier [closed]

4

76

Wanted

A command line HTML5 beautifier running under Linux.

Input

Garbled, ugly HTML5 code. Possibly the result of multiple templates. You don't love it, it doesn't love you.

Output

Pure beauty. The code is nicely indented, has enough line breaks, and cares for its whitespace. Rather than viewing it in a web browser, you would like to display the code on your website directly.

Suspects

  • tidy does too much (heck, it alters my doctype!), and it doesn't work well with HTML5. Maybe there is a way to make it cooperate and not alter anything?
  • vim does too little. It only indents. I want the program to add and remove line breaks, and to play with the whitespace inside of tags.

DEAD OR ALIVE!

Gate answered 17/4, 2010 at 7:41 Comment(5)
Shouldn't this be a superuser question? - Frydman
I'd say you have the right site for this. Not sure how many people on SU actually use HTML, much less HTML5. - Approve
I had the same problem and ended up writing a new Ruby library that doesn't require compiling any third-party utilities (I had problems getting Tidy to work with Rails) and focuses just on HTML5, not XML, XHTML, or HTML 4. It's not perfect yet, but it has worked well in all the projects I have used it in. Please take a look at jarijokinen.com/html5-beautifier - Sociology
Use XHTML5 and you can do xmllint --format - Syllabus
You can also monkeypatch HTML5 polyglot documents: echo '<!doctype html>'; (echo "<?xml version='1.0' ?>"; tail -n +2 < index.html) | xmllint --format - | sed -re 's/(<script[^>]*)\/>/\1><\/script>/g' | tail -n+2. This should work with input documents that have the doctype on line 1 but no XML prolog; the output is in the same style. - Syllabus
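
The xmllint pipeline from the comment above could be sketched in Python roughly as follows. This is a loose translation, not the commenter's own code. It assumes that xmllint is on the PATH, that the doctype sits alone on line 1, and that the rest of the document is well-formed XML. The function names are made up for illustration.

```python
import re
import subprocess


def expand_script_tags(markup: str) -> str:
    """Turn self-closed <script .../> back into <script ...></script>.

    Same substitution as the sed expression in the shell pipeline.
    """
    return re.sub(r"(<script[^>]*)/>", r"\1></script>", markup)


def format_polyglot(html: str) -> str:
    """Pretty-print a polyglot HTML5 document with xmllint (assumed installed)."""
    # Drop line 1 (the doctype) and prepend an XML prolog, like `tail -n +2`.
    _doctype, _, body = html.partition("\n")
    xml_input = "<?xml version='1.0' ?>\n" + body

    result = subprocess.run(
        ["xmllint", "--format", "-"],
        input=xml_input,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if xmllint rejects the input
    )

    # Drop the prolog line xmllint prints, then put the HTML5 doctype back.
    _prolog, _, formatted = result.stdout.partition("\n")
    return "<!doctype html>\n" + expand_script_tags(formatted)
```

Calling `format_polyglot(open("index.html").read())` would then return the reformatted markup as a string. Like the shell version, it fails on any input that is not valid XML.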
27

HTML Tidy has been forked by the W3C and now has support for HTML5 validation.

https://github.com/w3c/tidy-html5

Spacious answered 30/11, 2011 at 1:25 Comment(4)
As of July 2014 this project appears to have been stalled for two years. - Inevitable
As of April 2015 it appears to be alive, although you still need to build binaries from source yourself by pulling the git repo. - Saltish
As of June 2016, you can install it using Homebrew on OS X. - Ranchero
As of July 2017, you can apt-get install tidy on Debian. - Hypogene
19

I suspect tidy can be made to work with the right command-line parameters.

http://tidy.sourceforge.net/docs/quickref.html

You can specify an arbitrary doctype and add new block, inline, and empty tags, and turn on and off lots of tidy's cleaning options.

Depending on what you want it to "beautify" you can probably get decent results. It probably won't be able to do some of the more advanced things like rewriting the html content to eliminate spurious elements or combining them, if it doesn't recognize them.
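
One way to keep such a set of options manageable is a tidy configuration file, passed with `tidy -config <file>`. The sketch below only renders a Python dict into tidy's `name: value` config format. The helper name and the particular option values are illustrative assumptions, not a tested HTML5 setup.

```python
def render_tidy_config(options: dict) -> str:
    """Render a dict of tidy options as config-file text (one 'name: value' per line)."""
    lines = []
    for name, value in options.items():
        if isinstance(value, bool):
            # tidy spells booleans as yes/no
            value = "yes" if value else "no"
        elif isinstance(value, (list, tuple)):
            # tag lists are written comma-separated
            value = ", ".join(value)
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


# Example values only; adjust the tag lists to the elements you actually use.
EXAMPLE_OPTIONS = {
    "tidy-mark": False,
    "indent": True,
    "indent-spaces": 4,
    "wrap": 0,
    "new-blocklevel-tags": ["article", "header", "footer"],
    "new-inline-tags": ["video", "audio", "canvas"],
    "new-empty-tags": ["source"],
}

if __name__ == "__main__":
    print(render_tidy_config(EXAMPLE_OPTIONS), end="")
```

Saving the printed text as, say, `tidy.conf` and running `tidy -config tidy.conf index.html` should apply the same settings on every run without a long command line.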

Delafuente answered 5/5, 2010 at 18:1 Comment(5)
At a rough guess, how about tidy -as-xhtml --input-xml --tidy-mark no -indent --indent-spaces 4 -wrap 0 --new-blocklevel-tags article,header,footer --new-inline-tags video,audio,canvas,ruby,rt,rp --doctype "<!DOCTYPE HTML>" --break-before-br yes --sort-attributes alpha --vertical-space yes (Disclaimer: I've not used HTML5, and I've only copied a few new tags from w3schools.com/html5/html5_reference.asp into the list by guessing which were block/inline, so please adjust as appropriate.) - Totemism
This seems to be the best option. Kudos to Stobor, too! - Gate
This is a good start, but it needs so much more, e.g. new input element attributes / values (type="date"). - Musclebound
I had trouble with two of the options here: --doctype "<!DOCTYPE HTML>" and --sort-attributes alpha would not work for some reason. - Steverson
I also struggled to get tidy working. My resulting options on Ubuntu 14.10 were: tidy --tidy-mark no -indent --indent-spaces 4 -wrap 0 --new-blocklevel-tags 'article,header,footer' --new-inline-tags 'video,audio,canvas,ruby,rt,rp' --break-before-br yes --sort-attributes alpha --vertical-space yes - Abortifacient
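
For scripted use, the option set from the last comment could be wrapped in a small Python helper along these lines. This is a sketch under assumptions: tidy must be installed and on the PATH, the function names are invented here, and treating exit status 1 as "warnings only" and 2 as "errors" follows tidy's documented exit codes.

```python
import subprocess

# Options copied from the comment above (without --doctype, which was reported not to work).
TIDY_OPTIONS = [
    "--tidy-mark", "no",
    "-indent",
    "--indent-spaces", "4",
    "-wrap", "0",
    "--new-blocklevel-tags", "article,header,footer",
    "--new-inline-tags", "video,audio,canvas,ruby,rt,rp",
    "--break-before-br", "yes",
    "--sort-attributes", "alpha",
    "--vertical-space", "yes",
]


def build_tidy_command(path: str) -> list:
    """Return the argv list for running tidy on one file."""
    return ["tidy", *TIDY_OPTIONS, path]


def beautify(path: str) -> str:
    """Run tidy on `path` and return the beautified markup from stdout."""
    result = subprocess.run(
        build_tidy_command(path), capture_output=True, text=True
    )
    # Exit status 1 means warnings only, so only 2 and above is treated as failure.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Passing the arguments as a list avoids the shell quoting that the comment needed around the comma-separated tag lists.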
9

Copied from a live website I built in HTML5; every page validates as proper HTML5 thanks to this snippet (PHP in this case, but the options and logic are the same in any language):

    $options = array(
        'hide-comments' => true,
        'tidy-mark' => false,
        'indent' => true,
        'indent-spaces' => 4,
        'new-blocklevel-tags' => 'article,header,footer,section,nav',
        'new-inline-tags' => 'video,audio,canvas,ruby,rt,rp',
        'new-empty-tags' => 'source',
        'doctype' => '<!DOCTYPE HTML>',
        'sort-attributes' => 'alpha',
        'vertical-space' => false,
        'output-xhtml' => true,
        'wrap' => 180,
        'wrap-attributes' => false,
        'break-before-br' => false,
    );

    // Parse the page, then let tidy clean and repair it in place.
    $tidy = tidy_parse_string($buffer, $options, 'utf8');
    tidy_clean_repair($tidy);
    // Fix a tidy doctype bug
    $buffer = str_replace('<html lang="en" xmlns="http://www.w3.org/1999/xhtml">', '<!DOCTYPE HTML>', tidy_get_output($tidy));
Pepita answered 26/6, 2011 at 14:26 Comment(0)
2

If you use Haml as your nanoc filter, your HTML will automatically be pretty-printed. You can set HTML5 output as an option.

Limn answered 26/4, 2010 at 13:24 Comment(0)