Generate pretty diff HTML in Python
S

7

37

I have two chunks of text that I would like to compare and see which words/lines have been added/removed/modified in Python (similar to a Wiki's Diff Output).

I have tried difflib.HtmlDiff, but its output is less than pretty.

Is there a way in Python (or an external library) to generate clean-looking HTML of the diff of two chunks of text? (Not just at line level, but also word/character modifications within a line.)

Selfexplanatory answered 16/10, 2009 at 6:39 Comment(0)
D
35

There's diff_prettyHtml() in the diff-match-patch library from Google.
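
For readers who cannot install that library, the same idea (inline markup of insertions and deletions rather than a side-by-side table) can be roughly approximated with the standard library alone. This is only a sketch and is not how diff-match-patch works internally: it swaps in difflib.SequenceMatcher at word granularity, and the function name inline_diff_html is made up for illustration.

```python
import difflib
import html


def inline_diff_html(old: str, new: str) -> str:
    """Return an HTML fragment marking word-level changes with <del>/<ins>.

    Hypothetical helper: a stdlib approximation of an inline "pretty" diff.
    """
    old_words = old.split()
    new_words = new.split()
    # autojunk=False keeps frequent words from being ignored on long inputs.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Escape the text so markup inside the inputs is shown, not rendered.
        old_chunk = html.escape(" ".join(old_words[i1:i2]))
        new_chunk = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(old_chunk)
            continue
        # 'replace' emits both tags, 'delete' only <del>, 'insert' only <ins>.
        if tag in ("delete", "replace"):
            parts.append(f"<del>{old_chunk}</del>")
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{new_chunk}</ins>")
    return " ".join(parts)
```

Calling inline_diff_html("the quick brown fox", "the slow brown fox") returns the <del>quick</del> <ins>slow</ins> brown fox, which can then be styled with a couple of CSS rules for del and ins. Note that splitting on whitespace collapses the original spacing and line breaks, so this suits prose better than code.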

Darrin answered 16/10, 2009 at 8:15 Comment(3)
The .zip download link now gives a 404 :(Appetency
It's hard to tell if there's a way to generate a good side-by-side diff of multiple-line files with diff-match-patch. It seems mostly focused on character-level comparison, and the documentation on line-level is not very good (and the example is only in JavaScript).Synergistic
Also I think its new home is here: github.com/google/diff-match-patchSynergistic
T
26

Generally, if you want some HTML to render in a prettier way, you do it by adding CSS.

For instance, if you generate the HTML like this:

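import difflib
import sys

fromfile = "xxx"
tofile = "zzz"

# Universal newlines are the default in Python 3; the 'U' mode flag was removed in 3.11.
with open(fromfile) as f:
    fromlines = f.readlines()
with open(tofile) as f:
    tolines = f.readlines()

# make_file() returns a complete HTML document as a single string.
diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile)

sys.stdout.write(diff)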

then you get green backgrounds on added lines, yellow on changed lines, and red on deleted lines. If I were doing this, I would take the generated HTML, extract the body, and prefix it with my own handwritten block of HTML with lots of CSS to make it look good. I'd also probably strip out the legend table and move it to the top, or put it in a div so that CSS can position it.

Actually, I would give serious consideration to just fixing up the difflib module (which is written in Python) to generate better HTML and contributing it back to the project. If you have a CSS expert to help you, or are one yourself, please consider doing this.
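
One possible way to do the "own HTML plus CSS" step without parsing the generated page is to ask difflib for just the table via make_table() and wrap it yourself. The sketch below assumes the class names difflib currently emits (diff, diff_header, diff_add, diff_chg, diff_sub); the colours and the function name styled_diff_page are arbitrary choices for illustration.

```python
import difflib

# Illustrative stylesheet; the selectors match the classes difflib puts on
# its table, header cells, and added/changed/deleted spans.
CUSTOM_CSS = """
table.diff { font-family: monospace; border-collapse: collapse; }
.diff_header { background: #f0f0f0; color: #888; text-align: right; padding: 0 4px; }
.diff_add { background: #d4fcbc; }
.diff_chg { background: #fff3b0; }
.diff_sub { background: #fbb6c2; }
"""


def styled_diff_page(fromlines, tolines, fromdesc="", todesc=""):
    """Wrap difflib's bare diff table in a hand-written HTML page."""
    # make_table() returns only the <table> markup: no <html>, no built-in
    # stylesheet, and no legend table.
    table = difflib.HtmlDiff().make_table(fromlines, tolines, fromdesc, todesc)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<style>{CUSTOM_CSS}</style></head>\n"
        f"<body>{table}</body></html>"
    )
```

Because make_table() omits the legend, there is nothing to strip out; if you want a legend, write your own and place it wherever the layout needs it.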

Thirzi answered 16/10, 2009 at 16:40 Comment(3)
Someone implemented your proposal (as I often find is the case with Python). HtmlDiff has a make_table() method that creates just the HTML table, so the user can add their own CSS to prettify it. Unlike the accepted answer, this is included in the standard library (since Python 2.4).Brachial
Unfortunately the HTML generated by difflib.HtmlDiff is a pretty archaic table format that isn't well suited to customization with CSS. But it still works pretty well, if you don't need a lot of customization. You can probably change colors and fonts, but that's about it. The big secret that I almost missed is the wrapcolumn argument to the constructor, which lets you prevent the table from being arbitrarily wide.Synergistic
This process shows the ENTIRE file side by side even if only ONE LINE HAS CHANGED. This is a problem if the file is large. Not sure if there's a way to fix this.Mesoderm
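
Regarding the last two comments: difflib does appear to cover both cases. make_table() and make_file() accept context=True with a numlines count, which limits the output to the changed lines plus a few surrounding ones, and wrapcolumn is passed to the HtmlDiff constructor. A minimal sketch, where compact_diff_table and its default values are hypothetical:

```python
import difflib


def compact_diff_table(fromlines, tolines, context_lines=2, wrap=72):
    """Return a diff table showing only changes plus nearby context lines."""
    # wrapcolumn wraps long lines so the table cannot grow arbitrarily wide.
    differ = difflib.HtmlDiff(wrapcolumn=wrap)
    # context=True hides unchanged regions; numlines is the number of
    # unchanged lines kept before and after each change.
    return differ.make_table(
        fromlines, tolines, context=True, numlines=context_lines
    )


if __name__ == "__main__":
    old = [f"row{i}\n" for i in range(100)]
    new = list(old)
    new[50] = "changed\n"
    # Only the rows around index 50 appear in the resulting table.
    print(compact_diff_table(old, new))
```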
B
6

I recently posted a python script that does just this: diff2HtmlCompare (follow the link for a screenshot). Under the hood it wraps difflib and uses pygments for syntax highlighting.

Bilski answered 25/4, 2015 at 16:42 Comment(0)
D
1

Since the .. library from Google seems to have no active development any more, I suggest using diff_py.

From the github page:

The simple diff tool which is written by Python. The diff result can be printed in console or to html file.

Dying answered 11/2, 2016 at 11:42 Comment(0)
A
1

not just line level, but also word/character modifications within a line

xmldiff seems to be a nice package for this purpose especially when you have XML/HTML to compare. Read more in their documentation.

Aggiornamento answered 24/12, 2018 at 23:15 Comment(0)
K
0

First, try cleaning up both HTML documents with lxml.html, and then check the differences with difflib.
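
The "normalize first, then diff" idea can be sketched as follows. To keep the example free of third-party dependencies it swaps lxml.html for the standard library's html.parser, which is far less forgiving of broken markup; treat it as an illustration of the approach, not a replacement for lxml. All names in it are hypothetical.

```python
import difflib
from html.parser import HTMLParser


class _Tokenizer(HTMLParser):
    """Flatten markup into a list with one tag or text run per entry."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name so attribute order does not show up as a change.
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{value}"'
            for name, value in sorted(attrs, key=lambda pair: pair[0])
        )
        self.tokens.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse runs of whitespace; skip whitespace-only text nodes.
        text = " ".join(data.split())
        if text:
            self.tokens.append(text)


def normalized_lines(markup):
    """Return the markup as a normalized list of tokens."""
    parser = _Tokenizer()
    parser.feed(markup)
    parser.close()
    return parser.tokens


def html_text_diff(old_html, new_html):
    """Unified diff of two HTML strings after normalization."""
    return list(
        difflib.unified_diff(
            normalized_lines(old_html),
            normalized_lines(new_html),
            "old",
            "new",
            lineterm="",
        )
    )
```

With this normalization, two documents that differ only in indentation or attribute order produce an empty diff, so the remaining output reflects actual content changes.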

Kurus answered 16/10, 2009 at 7:41 Comment(0)
I
-3

A copy of my own answer from here.


What about DaisyDiff (Java and PHP versions available)?

The following features are really nice:

  • Works with badly formed HTML that can be found "in the wild".
  • The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
  • In addition to the default visual diff, HTML source can be diffed coherently.
  • Provides easy to understand descriptions of the changes.
  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.
Instance answered 20/10, 2009 at 8:58 Comment(1)
This answer isn't really Python-related.Nordau
