How to Minify HTML code?

My idea is to somehow minify HTML code on the server side, so the client receives fewer bytes.

What do I mean with "minify"?

Not zipping. More like what, for example, the jQuery creators do with the .min.js versions. In other words, I need to remove unnecessary whitespace and newlines, but not so much that the presentation of the HTML changes (for example, by removing the whitespace between actual words in a paragraph).

Are there any tools that can do it? I know there is HtmlPurifier. Can it do this? Any other options?

P.S. Please don't suggest regexes. I know that only Chuck Norris can parse HTML with them. =]

Anemograph asked 28/4, 2011 at 8:25. Comments (3):
I don't think you need to do this. Most web servers support serving web pages gzipped, and then the whitespace is no longer an issue. You should always serve your web pages gzipped. – Escobar
You can write a simple program that uses an HTML parsing library to parse the HTML file and then write it back out. If you use C#, you can look at the LINQ-to-HTML library. – Escobar
Agreeing with Stephen Chung: if you gzip the HTML, all whitespace will be compacted. It'll be a faster process than fixing up the HTML itself. – Popper

You could parse the HTML code into a DOM tree (which should keep content whitespace in the nodes), then serialise it back into HTML, without any prettifying spaces.
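A minimal PHP sketch of this idea, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath, backed by libxml2); the function name minifyViaDom() is hypothetical. Instead of deleting whitespace-only nodes, it collapses every run of HTML whitespace in ordinary text nodes to a single space, and it leaves <pre>, <textarea>, <script> and <style> untouched. libxml2's HTML parser predates HTML5, so it may warn about or restructure unfamiliar markup, and non-ASCII input may need an explicit charset declaration.

```php
<?php
// Sketch: minify HTML via a DOM round trip with DOMDocument (libxml2).
// Assumption: the input is HTML that libxml2's parser accepts.
function minifyViaDom(string $html): string
{
    $doc = new DOMDocument();
    // Keep libxml warnings (e.g. about HTML5 tags) out of the output.
    $previous = libxml_use_internal_errors(true);
    // Parse as a fragment: no implied <html>/<body>, no default doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Drop comments (this also drops IE conditional comments).
    foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
        $comment->parentNode->removeChild($comment);
    }
    // Merge the text nodes that were separated by a removed comment.
    $doc->normalize();

    // Collapse HTML whitespace (space, tab, LF, FF, CR) to one space,
    // skipping text inside whitespace-sensitive elements.
    $textQuery = '//text()[not(ancestor::pre or ancestor::textarea'
        . ' or ancestor::script or ancestor::style)]';
    foreach (iterator_to_array($xpath->query($textQuery)) as $text) {
        $text->data = preg_replace('/[ \t\n\f\r]+/', ' ', $text->data);
    }

    return $doc->saveHTML();
}

$html = "<div>\n  <!-- note -->\n  <p>Hello   <b>big</b>   world</p>\n"
    . "  <pre>  keep\n   this </pre>\n</div>";
echo minifyViaDom($html);
```

Collapsing to one space rather than deleting is the conservative choice: whether the space between two tags matters depends on CSS (inline versus block layout), which the parser cannot know.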

Rawinsonde answered 28/4, 2011 at 8:28

A bit late, but still: with output buffering, it is as simple as this:

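<?php
function compress($string)
{
    // Remove HTML comments. The lazy ".*?" stops at the first "-->", and
    // the "s" modifier lets a comment span several lines.
    $string = preg_replace('/<!--.*?-->/s', '', $string);

    // Merge runs of whitespace (including newlines) into one space.
    $string = preg_replace('/\s+/', ' ', $string);

    // Remove whitespace between tags. Skip this step if you want, as it
    // also removes the space between <span>Hello</span> <span>World</span>.
    return preg_replace('/>\s+</', '><', $string);
}

// Everything printed between ob_start() and ob_end_flush() is passed
// through compress() before it is sent to the client.
ob_start('compress');
?>
<p>Your HTML goes here.</p>
<?php
ob_end_flush();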
Croat answered 24/8, 2013 at 12:09. Comments (3):
You probably won't want to remove whitespace inside tags like pre, code, etc. – Zero
@BijayRungta you're right, though it is possible to avoid that with some modifications. I just gave an idea above :) +1 to your comment. – Croat
Parsing HTML with a regexp does not work. Your regexp would break on e.g. <!-- foo --><p>bar</p><!-- baz -->. – Polyandry
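One way to make the "some modifications" mentioned in these comments, sketched under the assumption that a regex approach is acceptable at all: compressKeepingPre() (a hypothetical name) swaps <pre>, <textarea>, <script> and <style> blocks for placeholder tokens, collapses whitespace elsewhere, then restores the blocks byte for byte. It uses a lazy comment pattern, so the <!-- foo --><p>bar</p><!-- baz --> case above keeps the paragraph. It is still regex-based and inherits the usual caveats, for example a "</pre>" inside a script string or a comment.

```php
<?php
// Hypothetical variant of compress(): shields whitespace-sensitive blocks
// behind NUL-delimited placeholders while collapsing whitespace.
function compressKeepingPre(string $html): string
{
    $blocks = [];

    // 1. Stash <pre>, <textarea>, <script> and <style> blocks.
    $html = preg_replace_callback(
        '#<(pre|textarea|script|style)\b[^>]*>.*?</\1\s*>#is',
        function (array $m) use (&$blocks): string {
            $blocks[] = $m[0];
            return "\x00" . (count($blocks) - 1) . "\x00";
        },
        $html
    );

    // 2. Same cleanup as compress(), minus the ">\s+<" step, which can
    //    change rendering between inline elements.
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    $html = preg_replace('/\s+/', ' ', $html);

    // 3. Put the original blocks back.
    return preg_replace_callback(
        '/\x00(\d+)\x00/',
        function (array $m) use ($blocks): string {
            return $blocks[(int) $m[1]];
        },
        $html
    );
}

echo compressKeepingPre("<div>\n  <p>Hello   world</p>\n  <pre>a\n  b</pre>\n</div>");
// prints "<div> <p>Hello world</p> <pre>a\n  b</pre> </div>" (the \n is a real newline)
```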

Are there any tools that can do it?

Yes, here's a tool you could include in a build process or work into a web cache layer: https://code.google.com/archive/p/htmlcompressor/

Or, if you're looking for a tool to minify HTML that you paste in, try: http://www.willpeavy.com/minifier/

Haematite answered 3/5, 2011 at 1:57

You can use the Pretty Diff tool: http://prettydiff.com/?m=minify&html It also minifies any CSS and JavaScript in the HTML code, and the minification is done in a way that does not prevent beautifying the HTML back into a readable form later.

Burgin answered 25/11, 2011 at 11:9

Are there any tools that can do it?

You can use the CodVerter Online Web Development Editor to compress mixed HTML code.
The compressor was tested multiple times for reliability and accuracy.
(Full disclosure: I am one of the developers.)

Killer answered 4/2, 2019 at 20:12

© 2022 - 2024 — McMap. All rights reserved.