Is there a JS diff library for HTML strings, like google-diff-match-patch for plain text? [closed]

Asked 25/1, 2010 at 14:31 Answered 17/11, 2022 at 13:44

Currently I am using google-diff-match-patch to implement a real-time editing tool that synchronizes text between multiple users. Everything works great when the operations involve only plain text: each user's operation (adding or deleting text) can be diffed out by comparing against the old text snapshot with the help of google-diff. But when rich-format text (like bold/italic) is involved, google-diff does not work well when comparing the HTML strings. The < and > characters mess up the diff results, especially when bold and italic formatting are nested within each other.

Could anyone suggest a library similar to google-diff for diffing HTML strings? Or any suggestions for fixing my problem with google-diff? I understand google-diff is designed for plain text, but I haven't found a better library so far, so a workable enhancement to google-diff would also help.

Investigation answered 25/1, 2010 at 14:31 Comment(0)

The wiki at the google-diff-match-patch project shares some ideas. From http://code.google.com/p/google-diff-match-patch/wiki/Plaintext :

One method is to strip the tags from the HTML using a simple regex or node-walker. Then diff the HTML content against the text content. Don't perform any diff cleanups. This diff enables one to map character positions from one version to the other (see the diff_xIndex function). After this, one can apply all the patches one wants against the plain text, then safely map the changes back to the HTML. The catch with this technique is that although text may be freely edited, HTML tags are immutable.

Another method is to walk the HTML and replace every opening and closing tag with a Unicode character. Check the Unicode spec for a range that is not in use. During the process, create a hash table of Unicode characters to the original tags. The result is a block of text which can be patched without fear of inserting text inside a tag or breaking the syntax of a tag. One just has to be careful when reconverting the content back to HTML that no closing tags are lost.

I have a hunch that the 2nd idea, map-HTML-tags-to-Unicode-placeholders, might work better than one would otherwise guess... especially if your HTML tags are from some reduced set, and if you can perform a little open/close touchup when displaying interleaved (strikethrough/underlined) diff markup.
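The placeholder idea could be sketched roughly as follows. This is only a minimal illustration, not code from the wiki: `encodeTags` and `decodeTags` are hypothetical helper names, the Private Use Area range U+E000 to U+F8FF is my assumption for "a range that is not in use", and the regex assumes simple tags whose attribute values contain no `>`. It also assumes the input text does not already contain Private Use Area characters.

```javascript
// Hypothetical helpers (not part of diff_match_patch): swap each HTML tag for a
// single Private Use Area character so a plain-text diff sees a tag as one unit.
const PUA_START = 0xe000; // first code point of the BMP Private Use Area
const PUA_END = 0xf8ff; // last code point of that area

function encodeTags(html, table = new Map()) {
  // table maps tag string -> placeholder character; pass the same table for
  // every version of the document so identical tags get identical placeholders.
  const text = html.replace(/<\/?[a-zA-Z][^>]*>/g, (tag) => {
    if (!table.has(tag)) {
      const code = PUA_START + table.size;
      if (code > PUA_END) throw new Error("too many distinct tags");
      table.set(tag, String.fromCharCode(code));
    }
    return table.get(tag);
  });
  return { text, table };
}

function decodeTags(text, table) {
  // Invert the table, then turn every known placeholder back into its tag.
  const reverse = new Map([...table].map(([tag, ch]) => [ch, tag]));
  return text.replace(/[\uE000-\uF8FF]/g, (ch) =>
    reverse.has(ch) ? reverse.get(ch) : ch
  );
}

const original = "Hello <b>bold <i>both</i></b> world";
const encoded = encodeTags(original);
const roundTrip = decodeTags(encoded.text, encoded.table);

// A second version of the document reuses the same table.
const next = encodeTags("Hello <b>bold</b> world", encoded.table);
```

The encoded strings (`encoded.text`, `next.text`) are what you would hand to the diff/patch functions. After patching, `decodeTags` restores the markup, and that is the point where you would check that no closing tag went missing.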

Another method that might work with simple styling would be to remove the HTML tags, but remember the character indexes affected. For example, "positions 8-15 are bolded". Then, perform a plaintext diff. Finally, using the diff_xIndex position-mapping idea from the wiki's first method, intelligently re-insert HTML tags to reapply styling to the ranges that survived or were added. (That is, if old positions 8-13 survived, but moved to 20-25, insert the B tags around there.)
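Here is a rough sketch of that range-based approach, under several assumptions. All three function names are hypothetical. The tags are assumed to be simple, paired, and well nested (no `<br>`-style void elements). The diff is assumed to be a list of `[op, text]` pairs with -1 for delete, 0 for equal and 1 for insert, which is the shape `diff_main` returns. Rather than calling `diff_xIndex` on each endpoint, `mapRange` walks the diff itself, so that a range only covers text that actually survived.

```javascript
// Hypothetical helper: strip paired tags and record the plain-text range
// (start inclusive, end exclusive) that each one covered.
function extractRanges(html) {
  const ranges = [];
  const open = []; // stack of { tag, start } for tags not yet closed
  const tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  let text = "";
  let last = 0;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    text += html.slice(last, m.index);
    last = tagRe.lastIndex;
    const name = m[2].toLowerCase();
    if (m[1] === "") {
      open.push({ tag: name, start: text.length });
    } else {
      const top = open.pop();
      if (!top || top.tag !== name) throw new Error("unbalanced tag " + m[0]);
      ranges.push({ tag: name, start: top.start, end: text.length });
    }
  }
  if (open.length > 0) throw new Error("unclosed tag <" + open[0].tag + ">");
  return { text: text + html.slice(last), ranges };
}

// Map an old range onto the new text: from the first surviving character to
// the last surviving one. Returns null if every character was deleted.
function mapRange(diffs, start, end) {
  let oldPos = 0;
  let newPos = 0;
  let newStart = -1;
  let newEnd = -1;
  for (const d of diffs) {
    const op = d[0];
    const len = d[1].length;
    if (op === 0) {
      const from = Math.max(start, oldPos);
      const to = Math.min(end, oldPos + len);
      if (from < to) {
        if (newStart === -1) newStart = newPos + (from - oldPos);
        newEnd = newPos + (to - oldPos);
      }
    }
    if (op !== 1) oldPos += len; // equal and delete consume old text
    if (op !== -1) newPos += len; // equal and insert produce new text
  }
  return newStart === -1 ? null : { start: newStart, end: newEnd };
}

// Re-insert tags; assumes the ranges are still nested or disjoint.
function applyRanges(text, ranges) {
  const marks = [];
  for (const r of ranges) {
    const len = r.end - r.start;
    marks.push({ pos: r.start, kind: 1, len, str: "<" + r.tag + ">" });
    marks.push({ pos: r.end, kind: 0, len, str: "</" + r.tag + ">" });
  }
  // Same position: closes before opens, inner closes first, outer opens first.
  marks.sort((a, b) =>
    a.pos - b.pos || a.kind - b.kind ||
    (a.kind === 0 ? a.len - b.len : b.len - a.len));
  let out = "";
  let last = 0;
  for (const mk of marks) {
    out += text.slice(last, mk.pos) + mk.str;
    last = mk.pos;
  }
  return out + text.slice(last);
}

const extracted = extractRanges("The <b>quick</b> fox");
// Hand-written diff from "The quick fox" to "Look: The quick red fox".
const diffs = [[1, "Look: "], [0, "The quick"], [1, " red"], [0, " fox"]];
const newText = diffs.filter((d) => d[0] !== -1).map((d) => d[1]).join("");
const moved = extracted.ranges
  .map((r) => {
    const mapped = mapRange(diffs, r.start, r.end);
    return mapped && { tag: r.tag, start: mapped.start, end: mapped.end };
  })
  .filter(Boolean);
const rebuilt = applyRanges(newText, moved);
```

In the example the bold range moves from 4-9 to 10-15 and the newly inserted " red" stays unstyled. Whether inserted text adjacent to a styled range should inherit the styling is a policy decision this sketch does not try to make.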

Meaty answered 30/3, 2011 at 6:25 Comment(7)

And what about this: escape the HTML characters (<, >, &), do all the diff/patch/merge work and unescape the result. Seems to be the most stable solution to me. – Painterly 25/1, 2012 at 12:31

I think you'd find that approach would result in the exact same output as not-escaping them. The diffing algorithm doesn't have any problem treating them like any other character; the problem is keeping them balanced, and escaping them doesn't address that. – Meaty 25/1, 2012 at 18:20

I went through this and ended up creating a wrapper library to help with the "presentation work" needed to use diff_match_patch: github.com/arnab/jQuery.PrettyTextDiff – Tenotomy 24/1, 2013 at 9:42

@arnab - FYI your jsfiddle demo isn't working in FF19/Mac (but does in Chrome23/Mac). – Meaty 31/1, 2013 at 5:7

@gojomo: Thanks for the note. Just checked out FF19 on Mac (OSX 10.8.2) - worked fine. What error do you get (maybe in console)? – Tenotomy 1/2, 2013 at 6:6

@arnab: Still having problem; OSX/10.7.3, FF/19.0 "up to date" "on the beta channel". After pressing 'Diff' button, error in console is TypeError: $(...).prettyTextDiff is not a function http://fiddle.jshell.net/_display/ Line 38. (There is an earlier console warning, on page load, about getAttributeNode() being deprecated, but that seems harmless.) – Meaty 5/2, 2013 at 19:17

Try now. I updated it about a month back (forgot to mention here). – Tenotomy 1/8, 2013 at 7:16

jsdifflib - A Javascript Visual Diff Tool & Library https://github.com/cemerick/jsdifflib

There's a demo here: http://cemerick.github.io/jsdifflib/demo.html

Pvc answered 2/5, 2011 at 19:32 Comment(0)

Pretty Diff does everything you need, except you will need to update the DOM response so that the diff fires on the "onkeyup" event instead of on button click.

http://prettydiff.com/

Unflinching answered 11/3, 2011 at 15:34 Comment(0)

Take a look at SynchroEdit, might be useful.

Vexatious answered 26/1, 2010 at 19:21 Comment(2)

Gamers2000, thanks for the comment. I did try SynchroEdit, but neither the sandbox nor the dev version is working. Btw, I also put a question in your original "OT library question": are you also working with google-diff-match-patch? How do you use it with rich-format HTML strings? Thanks for any comments. – Investigation 27/1, 2010 at 2:17

Hi Steve, I am working with diff-match-patch, but I'm using it to synchronize plain text. Also, I'm actually using MobWrite (code.google.com/p/google-mobwrite), which is an implementation of diff-match-patch. Sorry I can't be of much help! – Vexatious 27/1, 2010 at 3:38

There is another popular library called JSDiff: https://github.com/kpdecker/jsdiff. It works with HTML content too. The only drawback is that it needs a newline at the end of each line in order to treat it as a separate line. Otherwise, all the HTML content is treated as a single line.
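One possible workaround for the single-line problem is to add line breaks between adjacent tags before diffing. The sketch below is a hypothetical pre-processing step, not part of JSDiff. It assumes whitespace between tags is not significant in your markup, which is not always true for inline elements.

```javascript
// Hypothetical pre-processing helper: break the markup wherever one tag
// directly follows another, and make sure the result ends with a newline.
function splitHtmlIntoLines(html) {
  const withBreaks = html.replace(/>\s*</g, ">\n<");
  return withBreaks.endsWith("\n") ? withBreaks : withBreaks + "\n";
}

const oneLine = "<p>first</p><p>second</p>";
const multiLine = splitHtmlIntoLines(oneLine);
// multiLine now has one element per line and can be passed to a line-based diff.
```

Both versions of the document would need to go through the same helper so that the line boundaries match.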

Pola answered 17/11, 2022 at 13:44 Comment(0)
