In Python, produce HTML highlighting the differences between two simple strings

I need to highlight the differences between two simple strings in Python, enclosing the differing substrings in an HTML span element. I'm looking for a simple way to implement the function illustrated by the following example:

highlight_diff('Hello world','HeXXo world','red')

...it should return the string:

'He<span style="color:red">XX</span>o world'

I have googled and seen difflib mentioned, but it is supposedly obsolete, and I haven't found a good, simple demo.

Enneagon asked 22/2, 2012 at 13:59. Comments:
If a difference is found, should it always show the substring of the second string (in your example, 'XX')? You're just looking for positional differences, right? That means comparing s1[0] with s2[0], s1[1] with s2[1], and so on. - Pierette
This is similar to the question answered here. - Cuticula
@julio.alegria Well, I am interested in highlighting the differing part of the first string as well ('ll' in my example). Indeed, I'm looking for positional diffs. - Enneagon
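
The purely positional comparison discussed in these comments (s1[0] against s2[0], s1[1] against s2[1], and so on) can be sketched as follows, highlighting the differing characters of both strings. This is a minimal sketch, not a library API: the function name `highlight_positional` and the inline `color` style are illustrative assumptions. It only behaves sensibly when no characters are inserted or deleted, because a single insertion shifts every later position and marks the rest of the string as different.

```python
import html
from itertools import zip_longest


def highlight_positional(s1: str, s2: str, color: str = "red") -> tuple[str, str]:
    """Hypothetical helper: wrap characters that differ at the same index in a span."""
    open_tag = f'<span style="color:{color}">'

    # One flag per position; a missing character (None from zip_longest)
    # in the shorter string counts as a difference.
    flags = [c1 != c2 for c1, c2 in zip_longest(s1, s2)]

    def render(text: str) -> str:
        out, in_span = [], False
        # zip stops at len(text), so each string only uses its own positions.
        for ch, differs in zip(text, flags):
            if differs and not in_span:
                out.append(open_tag)
                in_span = True
            elif not differs and in_span:
                out.append("</span>")
                in_span = False
            out.append(html.escape(ch))  # keep '<', '&', etc. from breaking the HTML
        if in_span:
            out.append("</span>")
        return "".join(out)

    return render(s1), render(s2)


print(highlight_positional("Hello world", "HeXXo world"))
# ('He<span style="color:red">ll</span>o world', 'He<span style="color:red">XX</span>o world')
```

Consecutive differing characters share one span, so 'll' and 'XX' each become a single highlighted run rather than two.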

Everything you need comes out of difflib. For example:

>>> import difflib
>>> d = difflib.Differ()
>>> l = list(d.compare("hello", "heXXo"))
>>> l
['  h', '  e', '- l', '- l', '+ X', '+ X', '  o']

Each element in that list is a character from your input strings, prefixed with one of:

  • "  " (two spaces): character present in both strings
  • "- " (dash, space): character present only in the first string
  • "+ " (plus, space): character present only in the second string

Iterate through that list and you can build exactly the output you're looking to create.
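
Building on the Differ output above, a minimal sketch might group consecutive items by their two-character prefix with itertools.groupby and wrap the "- " and "+ " runs in spans. The name `highlight_diff` follows the question, but returning a pair (first string, second string) is an assumption, made so that the first string's differing part ('ll') is highlighted too, as requested in the comments; the second element is the question's expected output.

```python
import difflib
import html
from itertools import groupby


def highlight_diff(a: str, b: str, color: str = "red") -> tuple[str, str]:
    """Return (html_a, html_b) with characters unique to each string wrapped in a span.

    Hypothetical helper built on difflib.Differ; not part of difflib itself.
    """
    span = '<span style="color:{}">{}</span>'
    out_a: list[str] = []
    out_b: list[str] = []
    # Differ.compare treats each string as a sequence of one-character "lines",
    # yielding items such as "  h", "- l", "+ X".
    diff = difflib.Differ().compare(a, b)
    for prefix, items in groupby(diff, key=lambda line: line[:2]):
        text = html.escape("".join(line[2:] for line in items))
        if prefix == "  ":    # common to both strings
            out_a.append(text)
            out_b.append(text)
        elif prefix == "- ":  # only in the first string
            out_a.append(span.format(color, text))
        elif prefix == "+ ":  # only in the second string
            out_b.append(span.format(color, text))
        # "? " hint lines are not input characters, so they are skipped.
    return "".join(out_a), "".join(out_b)


html_a, html_b = highlight_diff("Hello world", "HeXXo world", "red")
print(html_b)  # He<span style="color:red">XX</span>o world
```

Grouping by prefix is what turns the per-character list into whole runs, so each contiguous change gets one span instead of one span per character.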

There's no mention of difflib being in any way obsolete or deprecated in the docs.

Nogood answered 22/2, 2012 at 14:30. Comments:
Thanks, this is exactly the kind of thing I needed! I got the idea that difflib was obsolete from the book "Python Essential Reference", 4th ed., by D. M. Beazley (2009), page 586: "String processing. The following modules are some older, now obsolete, modules used for string processing ... difflib, fpformat, stringprep, textwrap" - Enneagon
Far from being outdated, difflib is still included in the Python 3 standard library as of this writing. It's very mature (reliable and fast). - Hawger

You can use this code with small changes (it formats both the matching and the non-matching parts of each string):

    import html

    import Levenshtein  # third-party package: pip install Levenshtein

    style_same = 'style="background: green;"'
    style_unsame = 'style="background: red;"'
    f_str = '<span {style}>{text}</span>'

    def wrap(style, text):
        # Escape HTML special characters; skip empty segments to avoid empty <span>s.
        return f_str.format(style=style, text=html.escape(text)) if text else ''

    def getFormattedDiff(astring, bstring):
        """Return (html_a, html_b): matching runs in green, differing runs in red."""
        formatted_a = []
        formatted_b = []
        begina, beginb = 0, 0
        blocks = Levenshtein.matching_blocks(Levenshtein.opcodes(astring, bstring), astring, bstring)
        for a, b, size in blocks:
            # Text between the previous match and this block differs; the block itself matches.
            formatted_a.append(wrap(style_unsame, astring[begina:a]) + wrap(style_same, astring[a:a + size]))
            formatted_b.append(wrap(style_unsame, bstring[beginb:b]) + wrap(style_same, bstring[b:b + size]))
            begina, beginb = a + size, b + size
        # Anything after the last matching block also differs.
        formatted_a.append(wrap(style_unsame, astring[begina:]))
        formatted_b.append(wrap(style_unsame, bstring[beginb:]))
        return ''.join(formatted_a), ''.join(formatted_b)

Haig answered 11/9, 2024 at 8:34
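
If you would rather avoid the third-party Levenshtein dependency, the same loop can be driven by the standard library: difflib.SequenceMatcher.get_matching_blocks() returns (a, b, size) triples with the same meaning, and its last triple is always the dummy (len(a), len(b), 0). The sketch below assumes that swap; `get_formatted_diff_stdlib` is a hypothetical name, and for some inputs SequenceMatcher may pick different matching runs than the Levenshtein-based version.

```python
import difflib
import html

STYLE_SAME = 'style="background: green;"'
STYLE_DIFF = 'style="background: red;"'


def _span(style: str, text: str) -> str:
    # Emit nothing for empty segments instead of an empty <span>.
    return f"<span {style}>{html.escape(text)}</span>" if text else ""


def get_formatted_diff_stdlib(astring: str, bstring: str) -> tuple[str, str]:
    """Hypothetical stdlib port: matching runs in green, differing runs in red."""
    # autojunk=False disables the "popular element" heuristic that
    # SequenceMatcher otherwise applies to sequences of 200+ items.
    matcher = difflib.SequenceMatcher(None, astring, bstring, autojunk=False)
    out_a: list[str] = []
    out_b: list[str] = []
    begin_a = begin_b = 0
    for a, b, size in matcher.get_matching_blocks():
        # Gap before this block differs; the block itself matches.
        out_a.append(_span(STYLE_DIFF, astring[begin_a:a]) + _span(STYLE_SAME, astring[a:a + size]))
        out_b.append(_span(STYLE_DIFF, bstring[begin_b:b]) + _span(STYLE_SAME, bstring[b:b + size]))
        begin_a, begin_b = a + size, b + size
    # The final dummy block ends at (len(a), len(b)), so no trailing text remains.
    return "".join(out_a), "".join(out_b)


html_a, html_b = get_formatted_diff_stdlib("Hello world", "HeXXo world")
print(html_a)
print(html_b)
```

Because the dummy block already covers the tail, this version needs no separate "append the rest" step after the loop.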
