Tools to reduce generated HTML size
O

4

0

I'm using Google Docs, and some of the templates we use were created in MS Office.
The resulting HTML is bloated and ugly, and Google's 500KB-per-document limit makes some cleanup mandatory. I was able to find redundant "style" attributes and move them into CSS classes, and to rename the most frequently repeated class names to shorter ones, which saved about 50% of the original size.
Are you aware of any existing tools, scripts, or libraries that could do this painful job for me, or at least help me write this magic tool?
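
For illustration, the manual cleanup described above (repeated "style" attributes moved into short CSS classes) could be sketched roughly like this in Python. This is only a regex-based sketch, not an existing tool: the function names are made up for the example, tags that already carry a class attribute are left untouched, and it assumes the generated names (a, b, c, ...) don't clash with classes already in the document. Note also that a class rule has lower precedence than an inline style, so the rendering can change if other stylesheet rules target the same elements.

```python
import re
from collections import Counter
from itertools import count, product
from string import ascii_lowercase

# Hypothetical helper names; a regex pass is a shortcut, not a real HTML parser.
TAG_RE = re.compile(r"<[a-zA-Z][^<>]*>")
STYLE_RE = re.compile(r"""\sstyle\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)
CLASS_RE = re.compile(r"\sclass\s*=", re.IGNORECASE)


def normalize(style):
    """Trim whitespace around declarations so equivalent styles compare equal."""
    return ";".join(part.strip() for part in style.split(";") if part.strip())


def short_names():
    """Yield a, b, ..., z, aa, ab, ... as candidate class names."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def inline_style(tag):
    """Return the normalized style of a tag, or None if it has none or has a class."""
    if CLASS_RE.search(tag):
        return None  # assumption: leave tags that already have a class alone
    match = STYLE_RE.search(tag)
    return normalize(match.group(2)) if match else None


def dedupe_styles(html, min_count=2):
    """Replace inline styles used at least min_count times with short classes.

    Returns (css, html): the generated rules and the rewritten markup.
    """
    counts = Counter(s for s in map(inline_style, TAG_RE.findall(html)) if s)
    names = short_names()
    # Most frequent styles get the shortest names.
    classes = {s: next(names) for s, n in counts.most_common() if n >= min_count}

    def rewrite(match):
        tag = match.group(0)
        style = inline_style(tag)
        if style not in classes:
            return tag
        return STYLE_RE.sub(' class="%s"' % classes[style], tag, count=1)

    css = "".join(".%s{%s}" % (name, style) for style, name in classes.items())
    return css, TAG_RE.sub(rewrite, html)


if __name__ == "__main__":
    sample = '<p style="color: red; margin:0">a</p><p style="color: red;margin:0 ">b</p>'
    css, body = dedupe_styles(sample)
    print(css)
    print(body)
```

The generated rules would still need to go into a style block in the document head, and styles used only once are left inline because a class would not save anything there.
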

Thanks in advance!

EDIT: I tried tidy, demoronizer, and a manual rewrite:
- Input: 140KB
- Tidy'ed: 110KB
- Demoronized: 135KB

So my favorite answer will be "rewrite it!"

Thanks!

Oakland answered 19/1, 2009 at 16:40
R
4

MS-Office makes crappy HTML, period. You're better off spending time rebuilding the HTML from the original text than trying to walk through that minefield.

I made a few macros that run search/replace operations in Word to do basic things, like wrapping <p> tags around paragraphs, and then I re-mark up the whole thing from scratch.
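
To give an idea of the "rebuild from the text" approach outside of Word, here is a minimal Python sketch. It is not the macro described above, just the same idea applied to a plain-text export, and it assumes paragraphs are separated by blank lines; the function name is made up for the example.

```python
import html
import re


def text_to_html(text):
    """Wrap each blank-line-separated paragraph of plain text in <p> tags."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        # Collapse internal whitespace and escape &, < and > for HTML.
        "<p>%s</p>" % html.escape(" ".join(p.split()), quote=False)
        for p in paragraphs
        if p.strip()
    )


if __name__ == "__main__":
    print(text_to_html("First  line\nstill first.\n\nSecond <b> & more\n"))
```

Headings, lists, and tables would still have to be marked up by hand or with extra rules, which is the "re-markup from scratch" part.
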

Rothmuller answered 19/1, 2009 at 16:49
S
3

You could try tidy; it will clean up many things.

Suffragette answered 19/1, 2009 at 16:45
F
0

Without commenting on its name, I could mention demoronizer, which the author describes as:

...a Perl program available for downloading from this site which corrects numerous errors and incompatibilities in HTML generated by, or edited with, Microsoft applications.

YMMV.

Frig answered 20/1, 2009 at 0:53
R
0

One of my favourite utilities now is actually Windows Live Writer: it does a neat job of stripping rubbish out of Word doc files. Some might disagree, but I use it quite often!

Rollo answered 8/9, 2009 at 18:48
