Simple HTML Pretty Print

3

10

http://jsfiddle.net/JamesKyle/L4b8b/

This may be a futile effort, but I personally think it's possible.

I'm not the best at JavaScript or jQuery, but I think I have found a simple way of making a basic pretty-printer for HTML.

There are four types of code in this prettyprint:

  1. Plain Text
  2. Elements
  3. Attributes
  4. Values

In order to stylize this I want to wrap elements, attributes and values in spans with their own classes.

The first way I have of doing this is to store every single kind of element and attribute (shown below) and then wrap them with the corresponding spans:

$(document).ready(function() {

    $('pre.prettyprint.html').each(function() {

        $(this).css('white-space','pre-line');

        var code = $(this).html();

        var htmlElement = $(code).find('a, abbr, acronym, address, area, article, aside, audio, b, base, bdo, bdi, big, blockquote, body, br, button, canvas, caption, cite, code, col, colgroup, command, datalist, dd, del, details, dfn, div, dl, dt, em, embed, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, head, header, hgroup, hr, html, i, img, input, ins, kbd, keygen, label, legend, li, link, map, mark, meta, meter, nav, noscript, object, ol, optgroup, option, output, p, param, pre, progress, q, rp, rt, ruby, samp, script, section, select, small, source, span, strong, summary, style, sub, sup, table, tbody, td, textarea, tfoot, th, thead, title, time, tr, track, tt, ul, var, video, wbr');

        var htmlAttribute = $(code).find('abbr, accept-charset, accept, accesskey, action, align, alink, alt, archive, axis, background, bgcolor, border, cellpadding, cellspacing, char, charoff, charset, checked, cite, class, classid, clear, code, codebase, codetype, color, cols, colspan, compact, content, coords, data, datetime, declare, defer, dir, disabled, enctype, face, for, frame, frameborder, headers, height, href, hreflang, hspace, http-equiv, id, ismap, label, lang, language, link, longdesc, marginheight, marginwidth, maxlength, media, method, multiple, name, nohref, noresize, noshade, nowrap, object, onblur, onchange, onclick, ondblclick, onfocus, onkeydown, onkeypress, onkeyup, onload, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onreset, onselect, onsubmit, onunload, profile, prompt, readonly, rel, rev, rows, rowspan, rules, scheme, scope, scrolling, selected, shape, size, span, src, standby, start, style, summary, tabindex, target, text, title, type, usemap, valign, value, valuetype, version, vlink, vspace, width');

        var htmlValue = $(code).find(/* Any instance of text in between two quotation marks */);

        $(htmlElement).wrap('<span class="element" />');
        $(htmlAttribute).wrap('<span class="attribute" />');
        $(htmlValue).wrap('<span class="value" />');

        $(code).find('<').replaceWith('&lt;');
        $(code).find('>').replaceWith('&gt;');
    });
});

The second way I thought of was to detect elements as any amount of text between a < and a >, then detect attributes as text inside an element that is either surrounded by spaces or has an = immediately after it.

$(document).ready(function() {

    $('pre.prettyprint.html').each(function() {

        $(this).css('white-space','pre-line');

        var code = $(this).html();

        var htmlElement = $(code).find(/* Any instance of text in between a < and a > */);

        var htmlAttribute = $(code).find(/* Any instance of text inside an element that has an = immediately afterwards or has spaces on either side */);

        var htmlValue = $(code).find(/* Any instance of text in between two quotation marks */);

        $(htmlElement).wrap('<span class="element" />');
        $(htmlAttribute).wrap('<span class="attribute" />');
        $(htmlValue).wrap('<span class="value" />');

        $(code).find('<').replaceWith('&lt;');
        $(code).find('>').replaceWith('&gt;');
    });
});
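
For illustration only, the second idea could be sketched with regular expressions working on the raw markup as a string, rather than with jQuery's find(), which matches DOM elements and not text. The function names (escapeHtml, highlightAttributes, highlightHtml) are made up for this sketch, and it assumes reasonably well-formed tags; comments, doctypes and script contents are simply escaped as plain text.

```javascript
// Escape the characters that are significant in HTML text.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Wrap attribute names and their values inside the attribute part of a tag.
// A value is only recognised when it follows an equals sign.
function highlightAttributes(attrs) {
    var attrPattern = /([^\s=\/]+)(?:(\s*=\s*)("[^"]*"|'[^']*'|[^\s]+))?/g;
    return attrs.replace(attrPattern, function (match, name, equals, value) {
        var result = '<span class="attribute">' + escapeHtml(name) + '</span>';
        if (value !== undefined) {
            result += equals + '<span class="value">' + escapeHtml(value) + '</span>';
        }
        return result;
    });
}

// Turn raw HTML source into escaped markup with spans around
// tag names (element), attribute names (attribute) and values (value).
function highlightHtml(source) {
    // "<", optional "/", a tag name, then anything up to the closing ">"
    // (quoted strings may themselves contain ">").
    var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
    var output = '';
    var lastIndex = 0;
    var match;
    while ((match = tagPattern.exec(source)) !== null) {
        // Plain text between the previous tag and this one.
        output += escapeHtml(source.slice(lastIndex, match.index));
        output += '&lt;' + match[1] +
            '<span class="element">' + match[2] + '</span>' +
            highlightAttributes(match[3]) + '&gt;';
        lastIndex = tagPattern.lastIndex;
    }
    // Trailing text after the last tag.
    output += escapeHtml(source.slice(lastIndex));
    return output;
}

console.log(highlightHtml('<a href="x.html">Hi</a>'));
```

With jQuery this might be applied as $(this).html(highlightHtml($(this).text())) inside the each() loop, assuming the pre element contains the sample markup in escaped form so that .text() returns the raw source.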

How would either of these be coded, if at all possible?

Again you can see this as a jsfiddle here: http://jsfiddle.net/JamesKyle/L4b8b/

Upper answered 1/12, 2011 at 21:28 Comment(2)
Why not use one of the several server-side templating engines already out there? - Bonnibelle
Because I have absolutely no idea how those work. I'm a designer, not a developer. I just thought this would be a relatively easy thing to do. - Upper
25

Don't be so sure you have gotten all there is to pretty-printing HTML in so few lines. It took me a little more than a year and 2000 lines to really nail this topic. You can just use my code directly or refactor it to fit your needs:

https://github.com/prettydiff/prettydiff/blob/master/lib/markuppretty.js (and Github project)

You can demo it at http://prettydiff.com/?m=beautify&html

The reason why it takes so much code is that people really don't seem to understand or value the importance of text nodes. If you are adding new and empty text nodes during beautification then you are doing it wrong and are likely corrupting your content. Additionally, it is also really easy to screw it up the other way and remove white space from inside your content. You have to be careful about both or you will completely destroy the integrity of your document.
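
To make the text node problem concrete, here is a small sketch (not taken from the project above). Both helper functions are hypothetical: naiveBeautify is a deliberately careless formatter, and visibleText is only a rough stand-in for the text a browser would expose.

```javascript
// Rough stand-in for the text content of a fragment: drop the tags, keep the rest.
function visibleText(html) {
    return html.replace(/<[^>]*>/g, '');
}

// A deliberately naive "beautifier" that puts every tag on its own line.
function naiveBeautify(html) {
    return html.replace(/</g, '\n<').replace(/>/g, '>\n').trim();
}

var original = '<p>pretty<b>print</b>ing</p>';
var beautified = naiveBeautify(original);

// The original contains one unbroken word.
console.log(JSON.stringify(visibleText(original)));
// The beautified version has new white space inside that word, which a
// browser would collapse to spaces and render as three separate words.
console.log(JSON.stringify(visibleText(beautified)));
```

The inserted line breaks became new text nodes between the inline elements, which is exactly the kind of content corruption described above.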

Also, what if your document contains CSS or JavaScript? Those should be pretty printed as well, but have very different requirements from HTML. Even HTML and XML have different requirements. Please take my word for it that this is not a simple thing to figure out. HTML Tidy has been at this for more than a decade and still screws up a lot of edge cases.

As far as I know my markup_beauty.js application is the most complete pretty-printer ever written for HTML/XML. I know that is a very bold statement, and perhaps arrogant, but so far it's never been challenged. Look at my code and if there is something you need that it is not doing please let me know and I will get around to adding it in.

Eisen answered 5/12, 2011 at 22:13 Comment(4)
Do you know of any good resources on how to use this? I am designing a site for an easy-to-understand guide to HTML and CSS (sort of like w3schools, only following valid W3C recommendations). Eventually there will be guides to JavaScript/jQuery, PHP, and a few others. However, the company wants a mockup for this very quickly, so I just need to show them what I've got. Any help is greatly appreciated! - Upper
The markup_beauty application takes a single argument, referred to in the application as "arg". This argument is an object literal with the properties specified in the "Options" section of the comment at the top. This means you will need to write some code to accept input and package that input into the proper format. Once your input is packaged you just run: var pretty_code = markup_beauty(your_input_object); The application returns two things. Primarily the application only returns the beautified code, which can be assigned to a variable like the code example in this comment. The other is... - Eisen
The second output from the application is supplied to the "summary" variable. That variable is not scoped to the application, because it is meant to be used in closure from a higher scope. If you do not wish to use this you can speed up processing noticeably by simply deleting the last function starting around line 1714. If you do wish to use this reporting then you will need to declare a variable named "summary" outside the markup_beauty application. This is necessary to provide a means of externally accessing variables and data private to the application. - Eisen
I will update the comment in the code to show a coded example use case in the next update. - Eisen
-1

If you're doing this client-side, and you already have the DOM, then it would be more efficient to serialise it yourself inserting the appropriate tags as you go rather than serialising the whole subtree at once and then trying to reparse it.
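
A minimal sketch of that idea, assuming a recursive walk over the DOM. The function names are hypothetical; the code only reads the standard nodeType, nodeName, nodeValue, attributes and childNodes properties, so the example at the end can use a plain object in place of a real element. Comments and other node types are skipped.

```javascript
// Escape the characters that are significant in HTML text.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Node type values defined by the DOM (Node.ELEMENT_NODE and Node.TEXT_NODE).
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Elements that never have a closing tag in HTML.
var VOID_ELEMENTS = /^(?:area|base|br|col|embed|hr|img|input|link|meta|param|source|track|wbr)$/;

// Serialise a node and its descendants as escaped markup, wrapping tag names,
// attribute names and attribute values in spans along the way.
function serializeHighlighted(node) {
    if (node.nodeType === TEXT_NODE) {
        return escapeHtml(node.nodeValue);
    }
    if (node.nodeType !== ELEMENT_NODE) {
        return ''; // comments and other node types are ignored in this sketch
    }
    var name = node.nodeName.toLowerCase();
    var out = '&lt;<span class="element">' + name + '</span>';
    for (var i = 0; i < node.attributes.length; i += 1) {
        var attr = node.attributes[i];
        out += ' <span class="attribute">' + escapeHtml(attr.name) + '</span>=' +
            '<span class="value">&quot;' + escapeHtml(attr.value) + '&quot;</span>';
    }
    out += '&gt;';
    if (VOID_ELEMENTS.test(name)) {
        return out;
    }
    for (var j = 0; j < node.childNodes.length; j += 1) {
        out += serializeHighlighted(node.childNodes[j]);
    }
    return out + '&lt;/<span class="element">' + name + '</span>&gt;';
}

// Plain-object stand-in for <a href="x.html">a &lt; b</a>.
var sampleNode = {
    nodeType: ELEMENT_NODE,
    nodeName: 'A',
    attributes: [{ name: 'href', value: 'x.html' }],
    childNodes: [{ nodeType: TEXT_NODE, nodeValue: 'a < b' }]
};
console.log(serializeHighlighted(sampleNode));
```

Because the text nodes are written out exactly as they are found, this approach does not add or remove white space from the content.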

Stendhal answered 1/12, 2011 at 21:45 Comment(0)
-1

Personally I would wrap the HTML in a pre element and not try to do any pretty printing. There are tons of libraries for code formatting; just google "pretty print". Simply wrapping HTML in pre will automatically make it look like 'printed' code.
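
One caveat, shown here as a small sketch with hypothetical function names: for the markup to be displayed as text rather than rendered, the special characters need to be escaped before the string is placed inside the pre element.

```javascript
// Escape the characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Build a pre block that displays the given source verbatim.
function toPreBlock(source) {
    return '<pre>' + escapeHtml(source) + '</pre>';
}

console.log(toPreBlock('<b>A & B</b>'));
// prints: <pre>&lt;b&gt;A &amp; B&lt;/b&gt;</pre>
```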

For JavaScript, you can use JSON.stringify to recreate the code by passing in a number of spaces for nested structures.

JSON.stringify({ name: 'value' }, null, 2); // pass 4 instead of 2 for four-space indentation
Swore answered 3/12, 2011 at 2:43 Comment(1)
I don't understand how JSON.stringify helps you pretty print HTML. - Accent
