How to get rid of copy & paste text styling in ajax html editor

Asked 25/5, 2011 at 11:15 Answered 3/6, 2011 at 9:13

I am using the AJAX HTML editor for a news description page. When I copy and paste content from Word or the internet, it brings along the styling of that text, paragraph, etc., which overrides the default class style of the HTML editor textbox. What I want is to get rid of the inline styles, like the one below, but not the HTML tags themselves; I want to keep those in the paragraph.

<span id="ContentPlaceHolder1_newsDetaildesc" class="newsDetails"><span style="font-family: arial, helvetica, sans; font-size: 11px; line-height: 14px; color: #000000; "><strong>Lorem Ipsum</strong>&nbsp;is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.<BR /> It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</span></span></p>

#left_column .newsDetails span[style] { font-family: Arial !important; font-size: small !important; font-weight: normal !important; color: #808080 !important; }

Lighthouse answered 25/5, 2011 at 11:15 Comment(7)

I'm sorry, but are you copying the text from Word and pasting it into your web browser? – Hungry 25/5, 2011 at 13:9

Yes, copying the text from one of the blogs and pasting it into the HTML editor – Lighthouse 25/5, 2011 at 13:29

What you could try is some sort of paste special, like in Word, but I'm not sure I fully understand your question – Hungry 25/5, 2011 at 15:7

Are you asking for a way that you, personally, can paste HTML from Word into any editor and get this behaviour? Or are you developing with an editor, and you want your users to be able to paste from Word and get this behavior? – Christelchristen 27/5, 2011 at 21:11

@Muhammad Awais Also, please show us an example of the input AND output you are looking for. – Broomcorn 27/5, 2011 at 21:30

Could you please explain what an AJAX HTML editor is? Have you built a custom editor, or are you using a standard one, for example TinyMCE? – Danielldaniella 30/5, 2011 at 7:18

asp.net/ajax/ajaxcontroltoolkit/samples/htmleditor/… – Lighthouse 31/5, 2011 at 9:13

First, be aware that the HTML you receive by pasting from Word (or any other HTML source) is going to vary wildly depending on the source. Even different versions of Word will give you radically different input. If you design some code that works perfectly on content from the version of MS Word that you have, it may not work at all for a different version of MS Word.

Also, some sources will paste content that looks like HTML, but is actually garbage. When you paste HTML content into a rich text area in your browser, your browser has nothing to do with how that HTML is generated. Do not expect it to be valid by any stretch of your imagination. In addition, your browser will further munge the HTML as it is inserted into the DOM of your rich text area.

Because the potential inputs vary so much, and because the acceptable outputs are difficult to define, it is hard to design a proper filter for this sort of thing. Further, you cannot control how future versions of MS Word will handle their HTML content, so your code will be difficult to future-proof.

However, take heart! If all the world's problems were easy ones, it would be a pretty boring place. There are some potential solutions. It is possible to keep the good parts of the HTML and discard the bad parts.

It looks like your HTML-based RTE works like most HTML editors out there. Specifically, it has an iframe, and on the document inside the iframe, it has set designMode to "on".

You'll want to trap the paste event when it occurs in the <body> element of the document inside that iframe. I was very specific here because I have to be: don't trap it on the iframe; don't trap it on the iframe's window; don't trap it on the iframe's document. Trap it on the <body> element of the document inside the iframe. Very important.

var iframe = your.rich.text.editor.getIframe(), // or whatever
    win = iframe.contentWindow,
    doc = win.document,
    body = doc.body;

// Use your favorite library to attach events. Don't actually do this
// yourself. But if you did do it yourself, this is how it would be done.
if (body.addEventListener) {
    body.addEventListener('paste', handlePaste, false);
} else {
    body.attachEvent("onpaste", handlePaste);
}

Notice my sample code has attached a function called handlePaste. We'll get to that next. The paste event is funny: some browsers fire it before the paste, some browsers fire it afterwards. You'll want to normalize that, so that you are always dealing with the pasted content after the paste. To do this, use a timeout method.

function handlePaste() {
    window.setTimeout(filterHTML, 50);
}

So, 50 milliseconds after a paste event, the filterHTML function will be called. This is the meat of the job: you need to filter the HTML and remove any undesirable styles or elements. You have a lot to worry about here!

I have personally seen MSWord paste in these elements:

meta
link
style
o:p (A paragraph in a different namespace)
shapetype
shape
Comments, like <!-- comment -->
font
And of course, the MsoNormal class.

The filterHTML function should remove these when appropriate. You may also wish to remove other items as you deem necessary. Here is an example filterHTML that removes the items I have listed above.

// Your favorite JavaScript library probably has these utility functions.
// Feel free to use them. I'm including them here so this example will
// be library-agnostic.
function collectionToArray(col) {
    var x, output = [];
    for (x = 0; x < col.length; x += 1) {
        output[x] = col[x];
    }
    return output;
}

// Another utility function probably covered by your favorite library.
function trimString(s) {
    return s.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

function filterHTML() {
    var iframe = your.rich.text.editor.getIframe(),
        win = iframe.contentWindow,
        doc = win.document,
        invalidClass = /(?:^| )msonormal(?:$| )/gi,
        cursor, nodes = [];

    // This is a depth-first, pre-order search of the document's body.
    // While searching, we want to remove invalid elements and comments.
    // We also want to remove invalid classNames.
    // We also want to remove font elements, but preserve their contents.

    nodes = collectionToArray(doc.body.childNodes);
    while (nodes.length) {
        cursor = nodes.shift();
        switch (cursor.nodeName.toLowerCase()) {

        // Remove these invalid elements.
        case 'meta':
        case 'link':
        case 'style':
        case 'o:p':
        case 'shapetype':
        case 'shape':
        case '#comment':
            cursor.parentNode.removeChild(cursor);
            break;

        // Remove font elements but preserve their contents.
        case 'font':

            // Make sure we scan these child nodes too!
            nodes.unshift.apply(
                nodes,
                collectionToArray(cursor.childNodes)
            );

            while (cursor.lastChild) {
                if (cursor.nextSibling) {
                    cursor.parentNode.insertBefore(
                        cursor.lastChild,
                        cursor.nextSibling
                    );
                } else {
                    cursor.parentNode.appendChild(cursor.lastChild);
                }
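            }

            // The font element is now empty, so remove the element itself.
            cursor.parentNode.removeChild(cursor);

            break;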

        default:
            if (cursor.nodeType === 1) {

                // Remove all inline styles.
                cursor.removeAttribute('style');

                // OR, instead of the line above, remove only a
                // specific inline style:
                // cursor.style.fontFamily = '';

                // Remove invalid class names.
                invalidClass.lastIndex = 0;
                if (
                    cursor.className &&
                        invalidClass.test(cursor.className)
                ) {

                    // Replace the match with a space so that neighboring
                    // class names are not joined together.
                    cursor.className = trimString(
                        cursor.className.replace(invalidClass, ' ')
                    );

                    if (cursor.className === '') {
                        cursor.removeAttribute('class');
                    }
                }

                // Also scan child nodes of this node.
                nodes.unshift.apply(
                    nodes,
                    collectionToArray(cursor.childNodes)
                );
            }
        }
    }
}

You included some sample HTML that you wanted to filter, but you did not include a sample output that you would like to see. If you update your question to show what you want your sample to look like after filtering, I will try to adjust the filterHTML function to match. For the time being, please consider this function as a starting point for devising your own filters.

Note that this code makes no attempt to distinguish pasted content from content that existed prior to the paste. It does not need to do this; the things that it removes are considered invalid wherever they appear.
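
The class-name cleanup could also be done without a regex by splitting the className on whitespace, which avoids accidentally gluing neighboring class names together. This is only a sketch; removeClassToken is a hypothetical helper name, not something from the editor or from any library.

```javascript
// Hypothetical helper: returns `className` with every occurrence of
// `token` removed (case-insensitive), keeping the other class names
// separated by single spaces.
function removeClassToken(className, token) {
    var parts = className.split(/\s+/),
        wanted = token.toLowerCase(),
        kept = [],
        i;

    for (i = 0; i < parts.length; i += 1) {
        // Skip empty strings produced by leading/trailing whitespace.
        if (parts[i] && parts[i].toLowerCase() !== wanted) {
            kept.push(parts[i]);
        }
    }
    return kept.join(' ');
}
```

Inside filterHTML you could then assign the result back to cursor.className and remove the class attribute when the result is an empty string.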

An alternative solution would be to filter these styles and contents using regular expressions against the innerHTML of the document's body. I have gone this route, and I advise against it in favor of the solution I present here. The HTML that you will receive by pasting will vary so much that regex-based parsing will quickly run into serious issues.
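
To illustrate why the regex route is fragile, here is a small sketch using a deliberately naive pattern. The function name stripStyleWithRegex and the sample strings are made up for this illustration; real pasted markup will be far messier.

```javascript
// Naive approach: strip double-quoted style attributes from an HTML string.
function stripStyleWithRegex(html) {
    return html.replace(/\sstyle="[^"]*"/gi, '');
}

var samples = [
    // Handled: the attribute is written exactly as the pattern expects.
    '<span style="color: #000000;">ok</span>',
    // Missed: single-quoted attribute value.
    "<span style='color: #000000;'>single quotes</span>",
    // Missed: whitespace around the equals sign.
    '<span STYLE = "color: red">spaces around =</span>',
    // Wrongly edited: the "attribute" here is just visible text.
    '<p>Type style="x" to set a style.</p>'
];

var results = [], i;
for (i = 0; i < samples.length; i += 1) {
    results.push(stripStyleWithRegex(samples[i]));
}
```

Each case can be patched with a more elaborate pattern, but every patch tends to open another hole, whereas the DOM walk sees attributes and text as the browser parsed them.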

Edit:

I think I see now: you are trying to remove the inline style attributes themselves, right? If that is so, you can do this during the filterHTML function by including this line:

cursor.removeAttribute('style');

Or, you can target specific inline styles for removal like so:

cursor.style.fontFamily = '';

I've updated the filterHTML function to show where these lines would go.
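
If stripping inline styles is all you need, a much smaller walk than filterHTML may be enough. The following is a minimal sketch, assuming a browser where a NodeList can be copied with Array.prototype.slice (this is not the case in old versions of IE); stripInlineStyles is a hypothetical name.

```javascript
// Hypothetical helper: removes the style attribute from every element
// below `root`, leaving the elements themselves (strong, br, p, ...)
// and their other attributes untouched.
function stripInlineStyles(root) {
    var pending = Array.prototype.slice.call(root.childNodes),
        node;

    while (pending.length) {
        node = pending.pop();
        if (node.nodeType === 1) { // element nodes only
            node.removeAttribute('style');
            // Queue this element's children so they get visited too.
            pending.push.apply(
                pending,
                Array.prototype.slice.call(node.childNodes)
            );
        }
    }
}
```

You would call it from the timeout set up in handlePaste, passing the body of the document inside the iframe.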

Good luck and happy coding!

Christelchristen answered 31/5, 2011 at 14:31 Comment(3)

Hi, thanks for the detailed explanation. For the time being, instead of removing all the CSS styling that the pasted text inherited from the copied source, I used !important in my CSS to override the inline CSS of the pasted text. It doesn't look like a proper way of doing it. I have updated the question above. – Lighthouse 1/6, 2011 at 9:12

And I am using asp.net/ajax/ajaxcontroltoolkit/samples/htmleditor/… – Lighthouse 1/6, 2011 at 9:15

Great input. Building a WYSIWYG from scratch, and was so excited (then horrified) about pasted items being somewhat formatted. Boo inline styles, yay limited regex with mostly js removal of code soup! – Helgeson 29/12, 2013 at 19:1

Here is a potential solution that strips out the text from the HTML. It works by first copying the text as HTML into an element (which probably should be hidden but is shown for comparison in my example). Next, you get the innerText of that element. Then you can put that text into your editor wherever you like. You will have to capture the paste event on the editor, run this sequence to get the text, and then put that text wherever you like in your editor.

Here is a fiddle for an example of how to do this: Getting text from HTML
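
The steps described above could be sketched roughly like this. It is not the fiddle's actual code; htmlToPlainText is a hypothetical name, and doc is assumed to be the page's document object.

```javascript
// Hypothetical helper: returns the plain text of an HTML fragment.
// `doc` is the Document used to create the scratch element.
function htmlToPlainText(html, doc) {
    var scratch = doc.createElement('div'); // never attached to the page
    scratch.innerHTML = html;
    // textContent is the standard property; innerText covers old IE.
    return scratch.textContent || scratch.innerText || '';
}
```

In a browser you would call htmlToPlainText(pastedHtml, document) and insert the result into the editor. Note that this drops the tags as well as the styles, and that the browser still parses the markup, so treat untrusted input with care.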

Broomcorn answered 27/5, 2011 at 21:44 Comment(0)

If you are using Firefox, you can install this extension: https://addons.mozilla.org/en-US/firefox/addon/extended-copy-menu-fix-vers/. It allows you to copy the text from any website without the formatting.

Treadway answered 27/5, 2011 at 21:6 Comment(0)

Generally when supporting HTML editing by end users I have opted for leveraging one of a number of solid client-side HTML editing controls that already have the requisite functionality built in to handle stuff like this. There are a number of commercial versions, such as from Component Art, as well as some great free/open source versions, such as CKEditor.

All of the good ones have solid paste-from-Word support to strip out/fix this excessive CSS. I would either just leverage one (the easy way) or see how they do it (the hard way).

Doghouse answered 29/5, 2011 at 2:16 Comment(0)

I always run into this kind of problem; it is an interesting one. The way I handle it is very simple: open Notepad in Windows, paste your text into Notepad, then copy it from there into your AJAX text editor. This strips all the text styling.

Antedate answered 3/6, 2011 at 9:9 Comment(1)

The question specifically requested keeping the structure of the content--keeping the html tags while pasting from Word into a web browser but removing the CSS styling. Also, this answer is not suitable for developers, but is more of a suggestion that would need to be communicated to end users. – Doghouse 3/6, 2011 at 17:5

From what I understand of your question, you are using a WYSIWYG editor, and when copying and pasting text from other web pages or Word documents you get some ugly HTML with inline styles, etc.

I would suggest that you not bother fixing this yourself, because dealing with this issue cross-browser is a mess. If you really want to fix it, though, I would recommend using TinyMCE, which has exactly the behavior you want.

You can try it in action by visiting http://tinymce.moxiecode.com/tryit/full.php: paste some text into the editor, then submit it to see the generated HTML. It's clean.

TinyMCE is probably the best WYSIWYG editor that you'll find imo. So instead of building something on your own, just use it and customize it to your exact needs.

Pamalapamela answered 3/6, 2011 at 9:13 Comment(0)
