Using 'diff' (or anything else) to get character-level diff between text files
G

16

155

I'd like to use 'diff' to get both a line-level difference and a character-level difference. For example, consider:

File 1

abcde
abc
abcccd

File 2

abcde
ab
abccc

Using diff -u I get:

@@ -1,3 +1,3 @@
 abcde
-abc
-abcccd
\ No newline at end of file
+ab
+abccc
\ No newline at end of file

However, it only shows me that there were changes in these lines. What I'd like to see is something like:

@@ -1,3 +1,3 @@
 abcde
-ab<ins>c</ins>
-abccc<ins>d</ins>
\ No newline at end of file
+ab
+abccc
\ No newline at end of file

You get my drift.

Now, I know I can use other engines to mark/check the difference on a specific line. But I'd rather use one tool that does all of it.

Gilbertgilberta answered 12/11, 2009 at 12:5 Comment(1)
Per-character diff is especially useful when it comes to CJK texts, where no whitespace is applied for word splitting.Norvil
L
130

Git has a word diff, and defining all characters as words effectively gives you a character diff. However, newline changes are ignored.

Example

Create a repository like this:

mkdir chardifftest
cd chardifftest
git init
echo -e 'foobarbaz\ncatdog\nfox' > file
git add -A; git commit -m 1
echo -e 'fuobArbas\ncat\ndogfox' > file
git add -A; git commit -m 2

Now, do git diff --word-diff=color --word-diff-regex=. master^ master and you'll get:

git diff

Note how both additions and deletions are recognized at the character level, while both additions and deletions of newlines are ignored.

You may also want to try one of these:

git diff --word-diff=plain --word-diff-regex=. master^ master
git diff --word-diff=porcelain --word-diff-regex=. master^ master
Lykins answered 11/7, 2015 at 11:15 Comment(9)
You don't need to create a repo at all, you can simply give git diff any two files, anywhere on your filesystem and it works. Your command works great for me in that way, so thanks! git diff --word-diff=color --word-diff-regex=. file1 file2Susy
This is profoundly helpful! Would +1 once as a software developer and +1 twice more as an author/writer if I could. Unlike in code, where lines tend to be reasonably short, when writing papers/stories, each paragraph tends to take the form of a long word-wrapped line, and this feature makes the diffs actually visually useful.Terryl
I needed to add --no-index to @qwertzguys' response above in order to get it to work for me outside of a git repo. So: git diff --no-index --word-diff=color --word-diff-regex=. file1 file2Ileum
git diff doesn't work in general setting: git diff --no-index --word-diff=color --word-diff-regex=. <(echo string1) <(echo string2) .. Nothing, but this works: diff --color <(echo string1) <(echo string2).Markhor
@NathanBell I needed to add --no-index inside of a repo tooHissing
For a slightly more compact command --word-diff=color --word-diff-regex=. can be replaced with --color-words=.Elegist
This worked pretty well for me except for the colors (regular text was bright green while inserted text was slightly darker green). I discovered, however, you can temporarily change colors by passing an argument via -c eg: git -c color.diff.new="cyan bold" diff --color-words=. <file1> <file2> shows all additions in bold cyan.Elegist
--no-index is the wayWarbler
@Terryl "when writing papers/stories, each paragraph tends to take the form of a long word-wrapped line" related: merge that works at word granularityEphram
J
55

You can use:

diff -u f1 f2 |colordiff |diff-highlight

screenshot

colordiff is an Ubuntu package. You can install it using sudo apt-get install colordiff.

diff-highlight is from git (since version 2.9). It is located in /usr/share/doc/git/contrib/diff-highlight/diff-highlight. You can put it somewhere in your $PATH.

Joab answered 5/10, 2016 at 15:56 Comment(5)
colordiff is also available on homebrew for Mac: brew install colordiffMenides
On Mac you can find diff-highlight in $(brew --prefix git)/share/git-core/contrib/diff-highlight/diff-highlightIncommunicable
In case you didn't install git using brew - diff-highlight can also be installed with python's pip - pip install diff-highlight (I prefer it even if git is installed via brew)Broadcast
You can actually skip the first diff step, for what it's worth: colordiff -u file1 file2 | diff-highlight works for meElegist
Fantastic just what I needed.Fulgent
U
29

Python's difflib is ace if you want to do this programmatically. For interactive use, I use vim's diff mode (easy enough to use: just invoke vim with vimdiff a b). I also occasionally use Beyond Compare, which does pretty much everything you could hope for from a diff tool.

I haven't seen any command-line tool that does this usefully, but as Will notes, the difflib example code might help.

Uniseptate answered 12/11, 2009 at 13:18 Comment(2)
Oh.. I was hoping for something more standardized (like a hidden command line argument). The damnedest thing is that I have Beyond Compare 2 and it even supports text output to file/console of the diff, but it still only includes line-diffs and not char-diffs. I'll look into python if no one has anything else.Gilbertgilberta
+1 for introducing me to vimdiff. I found the default colors to be unreadable, but found a solution for that at #2019781.Octan
S
20

As one comment on the main answer said, you don't have to commit anything to use git diff:

git diff --no-index --word-diff=color --word-diff-regex=. file1 file2


Green marks characters that appear only in the second file (additions).

Red marks characters that appear only in the first file (deletions).
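
For scripting, the same command can be wrapped in Python. This is only a minimal sketch, not part of the original answer: the function names char_diff_command and char_diff are made up for illustration, and it assumes git is on the PATH. Because --no-index implies --exit-code, git exits with status 1 when the files differ, so a non-zero status is not treated as an error here.

```python
import subprocess


def char_diff_command(file1, file2, mode="plain"):
    """Build the git command line for a character-level diff (illustrative helper)."""
    return [
        "git", "diff", "--no-index",
        f"--word-diff={mode}",      # plain, color, or porcelain
        "--word-diff-regex=.",      # treat every character as a "word"
        "--", file1, file2,
    ]


def char_diff(file1, file2, mode="plain"):
    """Run the command and return git's output as text."""
    # Status 1 only means "the files differ", so check=True is deliberately
    # not passed to subprocess.run.
    result = subprocess.run(
        char_diff_command(file1, file2, mode),
        capture_output=True, text=True,
    )
    return result.stdout
```

Calling char_diff("file1", "file2") should return the same text that the command above prints with --word-diff=plain.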

Sherleysherline answered 12/7, 2022 at 5:28 Comment(5)
Yep, this is the best answer (as of today at least), given that it's the easiest to get on any system (git is fully cross-platform, and everyone has it on their computer nowadays) and the easiest to use, by far.Ricardaricardama
You may need to include --no-index to make it actually work, as commented on main answerApophasis
@PauloFreitas actually I had it in the screenshot but somehow I missed it in the command thanks for pointing out, editedSherleysherline
If you want to pipe this to less you can use less -R to keep the colorsSusy
I've had alias gitdiff="git diff --no-index --color-words=." in my bashrc for a while now :D It's a beautiful, easy-to-read diff, especially if you have a bunch of huge XML / JSON files with all of the newlines removed :')Tankard
G
19

You can use the cmp command in Solaris:

cmp

Compares two files and, if they differ, reports the first byte and line number where they differ.
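
To make that behaviour concrete, here is a rough Python sketch of the same idea. It is not how cmp is implemented; first_difference is a hypothetical name, and unlike cmp (which reports an EOF message when one file is a prefix of the other) it simply reports the position just past the shorter input.

```python
def first_difference(a: bytes, b: bytes):
    """Return (byte_number, line_number) of the first differing byte, both 1-based,
    or None if the inputs are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # Line number = 1 + number of newlines before the differing byte.
            return i + 1, a.count(b"\n", 0, i) + 1
    if len(a) != len(b):
        # One input is a prefix of the other; they differ where the shorter ends.
        n = min(len(a), len(b))
        return n + 1, a.count(b"\n", 0, n) + 1
    return None


print(first_difference(b"abc\nabd", b"abc\nabx"))  # (7, 2)
```

Note that this works on bytes, so a multi-byte character counts as several positions.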

Globular answered 23/12, 2009 at 8:39 Comment(6)
cmp is also available on (at least some) Linux distributions.Cackle
It's also available on Mac OS X.Oxidimetry
Characters can consist of multiple bytes, and OP asked for a visual comparison.Emulous
@CeesTimmerman: cmp allows visual comparison, with flag -l -b.Meraree
cmp exits after the first differenceBondmaid
cmp -l shows every byte that is different, but I was expecting it to work like git diff --word-diff-regex=., where if you delete a single character, it only shows 1 character as being different. Instead, cmp will show every character after that as different since the files are then misaligned. Just in case that saves anyone else the ~15-20 minutes it took me to realize that :PTankard
G
12

Python has a convenient library named difflib which might help answer your question.

Below are two one-liners using difflib for different Python versions.

python3 -c 'import difflib, sys; \
  print("".join( \
    difflib.ndiff( \
      open(sys.argv[1]).readlines(),open(sys.argv[2]).readlines())))'
python2 -c 'import difflib, sys; \
  print "".join( \
    difflib.ndiff( \
      open(sys.argv[1]).readlines(), open(sys.argv[2]).readlines()))'

These might come in handy as a shell alias which is easier to move around with your .${SHELL_NAME}rc.

$ alias char_diff="python2 -c 'import difflib, sys; print \"\".join(difflib.ndiff(open(sys.argv[1]).readlines(), open(sys.argv[2]).readlines()))'"
$ char_diff old_file new_file

And a more readable version to put in a standalone file:

#!/usr/bin/env python2
from __future__ import with_statement

import difflib
import sys

with open(sys.argv[1]) as old_f, open(sys.argv[2]) as new_f:
    old_lines, new_lines = old_f.readlines(), new_f.readlines()
diff = difflib.ndiff(old_lines, new_lines)
print ''.join(diff)
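
If the unchanged lines are just noise, they can be filtered out of the ndiff output. The following Python 3 sketch assumes only that ndiff marks unchanged lines with a two-space prefix (which is its documented format); condensed_ndiff is an illustrative name, not part of difflib.

```python
import difflib


def condensed_ndiff(old_lines, new_lines):
    """Yield ndiff output without the unchanged lines (those prefixed with two spaces)."""
    for line in difflib.ndiff(old_lines, new_lines):
        if not line.startswith("  "):
            yield line


old = ["abcde\n", "abc\n", "abcccd\n"]
new = ["abcde\n", "ab\n", "abccc\n"]
print("".join(condensed_ndiff(old, new)), end="")
```

The "- ", "+ " and "? " lines are kept, so the intraline hints from ndiff survive the filtering.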
Giliana answered 26/12, 2013 at 10:33 Comment(1)
Excellent one liners. Would be nice to have a condensed output that ignores unchanged lines.Rabin
M
7
cmp -l file1 file2 | wc

Worked well for me. The leftmost number of the result indicates the number of characters that differ.

Mishap answered 21/11, 2013 at 19:13 Comment(1)
Or to just get the leftmost number: cmp -l file1 file2 | wc -lAargau
F
7

Coloured, character-level diff output

Here's what you can do with the script below and diff-highlight (which is part of git):

Coloured diff screenshot

#!/bin/sh -eu

# Use diff-highlight to show word-level differences

diff -U3 --minimal "$@" |
  sed 's/^-/\x1b[1;31m-/;s/^+/\x1b[1;32m+/;s/^@/\x1b[1;34m@/;s/$/\x1b[0m/' |
  diff-highlight

(Credit to @retracile's answer for the sed highlighting)
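
For anyone whose sed does not understand \x1b, the colouring step can be approximated in Python. This is a sketch of what the sed expression does (colour by first character, reset at end of line), with colourise as a made-up name; it is not taken from the original script.

```python
RESET = "\x1b[0m"
# Same escape codes as the sed expression: red for "-", green for "+", blue for "@".
COLOURS = {"-": "\x1b[1;31m", "+": "\x1b[1;32m", "@": "\x1b[1;34m"}


def colourise(line):
    """Prefix a diff line with a colour chosen by its first character, then reset."""
    line = line.rstrip("\n")
    return COLOURS.get(line[:1], "") + line + RESET


# Typical use as a filter between diff and diff-highlight:
#   import sys
#   for line in sys.stdin:
#       print(colourise(line))
```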

Frag answered 5/10, 2016 at 4:28 Comment(4)
It shows good diff on shell screen, but how do I see that diff in GVim??Lungan
Well, that's really a gvim question :). command | gvim - will do what you want.Dulcy
For reference, diff-highlight appears to be included as part of git but not placed on your path. On my machine this lives at /usr/share/doc/git/contrib/diff-highlight.Dulcy
broken link. How do I install diff-highlight. Doesn't seem to be in a package manager.Tippets
L
5

I also wrote my own script to solve this problem using the Longest common subsequence algorithm.

It is run like this:

JLDiff.py a.txt b.txt out.html

The result is in HTML with red and green coloring. Larger files take disproportionately longer to process, but this does a true character-by-character comparison without checking line by line first.
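
As an illustration of the general technique (not JLDiff's actual code), a longest-common-subsequence diff over characters can be sketched in a few lines of Python. The function name lcs_diff and the (op, char) output format are assumptions made for this example. The table needs time and memory proportional to len(a) * len(b), which is why large inputs get slow.

```python
def lcs_diff(a, b):
    """Character-level diff of strings a and b via a longest-common-subsequence table.

    Returns a list of (op, char) pairs, where op is ' ' (common),
    '-' (only in a) or '+' (only in b).
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, emitting one operation per step.
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    # Whatever is left over exists in only one of the inputs.
    ops.extend(("-", ch) for ch in a[i:])
    ops.extend(("+", ch) for ch in b[j:])
    return ops


print(lcs_diff("abc", "ab"))  # [(' ', 'a'), (' ', 'b'), ('-', 'c')]
```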

Longing answered 1/12, 2016 at 20:30 Comment(1)
I have found that JLDiff runs a lot faster under pypy.Longing
I
5

ccdiff is a convenient dedicated tool for the task. Here is what your example looks like with it:

ccdiff example output

By default, it highlights the differences in color, but it can be used in a console without color support too.

The package is included in the main repository of Debian:

ccdiff is a colored diff that also colors inside changed lines.

All command-line tools that show the difference between two files fall short in showing minor changes in a visually useful way. ccdiff tries to give the look and feel of diff --color or colordiff, but extends the display of colored output from colored deleted and added lines to colors for deleted and added characters within the changed lines.

Illative answered 14/11, 2020 at 6:28 Comment(2)
looks nice, but lacking proper install instructions for non-Debian systems.Geologize
@Geologize (a) Download the tagged released version's .tar.gz file; (b) Untar the downloaded file with tar -xzf *.tar.gz; (c) Move & rename the extracted folder as /opt/ccdiff; (d) Add symbolic link to the executable in that folder inside some system folders with ln -s /opt/ccdiff/ccdiff /usr/local/bin/ccdiff; (e) can call it with ccdiff.Dice
G
4

Python's difflib can do this.

The documentation includes an example command-line program for you.

The exact format is not as you specified, but it would be straightforward to either parse the ndiff-style output or to modify the example program to generate your notation.
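
As a rough sketch of that second route, difflib.SequenceMatcher can be run on two lines character by character to produce the notation from the question, where the characters that disappear from the old line are wrapped in <ins>...</ins>. The helper name mark_removed is invented for this example.

```python
import difflib


def mark_removed(old, new):
    """Return `old` with the characters missing from `new` wrapped in <ins>...</ins>."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
        elif tag in ("delete", "replace"):
            out.append("<ins>" + old[i1:i2] + "</ins>")
        # "insert" opcodes cover text that exists only in `new`: nothing to emit.
    return "".join(out)


print(mark_removed("abc", "ab"))        # ab<ins>c</ins>
print(mark_removed("abcccd", "abccc"))  # abccc<ins>d</ins>
```

Pairing up which old line goes with which new line is still left to the caller; ndiff or a unified diff can supply that pairing.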

Gerhardine answered 12/11, 2009 at 12:35 Comment(1)
Thanks! I'll look into it. I was hoping for a something more standardized (like a hidden command line argument). But it might do fine still. I'll look into python if no one has anything more standard (though it seems like not).Gilbertgilberta
S
3

Here is an online text comparison tool: http://text-compare.com/

It can highlight every single character that is different and continues comparing the rest.

Swanner answered 23/7, 2014 at 6:45 Comment(2)
This appears to do line-level diffs with no option for single characters. How do you get it to compare characters?Verdellverderer
Ah; it highlights characters which are different. But it's still line-level in that catdog and cat\ndog will only match on catVerdellverderer
B
2

Not a complete answer, but if cmp -l's output is not clear enough, you can use:

sed 's/\(.\)/\1\n/g' file1 > file1.vertical
sed 's/\(.\)/\1\n/g' file2 > file2.vertical
diff file1.vertical file2.vertical
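
The same one-character-per-line trick can be done without temporary files in Python, since difflib accepts any sequence of strings. This is a sketch under that assumption, with vertical_diff as an illustrative name. One difference from the sed version: newline characters become elements of their own here instead of showing up as empty lines.

```python
import difflib


def vertical_diff(text1, text2):
    """Diff two strings one character per "line", like the sed + diff pipeline."""
    # list(text) splits a string into single characters, the equivalent of
    # writing each character on its own line.
    return list(difflib.unified_diff(list(text1), list(text2), lineterm=""))


for line in vertical_diff("abc", "ab"):
    print(line)
```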
Basic answered 20/11, 2017 at 17:45 Comment(1)
On OSX use: sed 's/\(.\)/\1\'$'\n/g' file1 > file1.vertical; sed 's/\(.\)/\1\'$'\n/g' file2 > file2.vertical Rustler
M
0

If you keep your files in Git, you can diff between versions with the diff-highlight script, which will show different lines, with differences highlighted.

Unfortunately it only works when the number of lines removed matches the number of lines added - there is stub code for when lines don't match, so presumably this could be fixed in the future.

Milton answered 26/3, 2013 at 11:27 Comment(0)
F
0

I think the simpler solution is always a good solution. In my case, the below code helps me a lot. I hope it helps anybody else.

#!/usr/bin/env python3
import sys


def readfile(file_name):
    # Read the whole file into one string.
    with open(file_name) as f:
        return f.read()


def diff(s1, s2):
    # Return the index of the first differing character, or -1 if identical.
    for counter, (ch1, ch2) in enumerate(zip(s1, s2)):
        if ch1 != ch2:
            return counter
    # One string is a prefix of the other: they differ where the shorter ends.
    if len(s1) != len(s2):
        return min(len(s1), len(s2))
    return -1


f1 = readfile(sys.argv[1])
f2 = readfile(sys.argv[2])
pos = diff(f1, f2)
end = pos + 200

if pos >= 0:
    print("Different at:", pos)
    print(">", f1[pos:end])
    print("<", f2[pos:end])

You can compare two files with the following syntax at your favorite terminal:

$ ./diff.py fileNumber1 fileNumber2
Fowl answered 23/10, 2013 at 19:33 Comment(1)
"I think the simpler solution is always a good solution" and yet your answer is the most complicated out of them all, doesn't use color highlighting for easy visualisation and only prints the position of the first diverging character. It does not answer the question which is about getting a diff, i.e. being able to find and see differences between the strings. What you made is simply an equality comparison function (and not even machine-usable since you don't send a non-null exit code in case of inequality).Ricardaricardama
B
0

Most of these answers mention using diff-highlight, a Perl module. But I didn't want to figure out how to install a Perl module. So I made a few minor changes to it to be a self-contained Perl script.

You can install it using:

▶ curl -o /usr/local/bin/DiffHighlight.pl \
   https://raw.githubusercontent.com/alexharv074/scripts/master/DiffHighlight.pl

And the usage (if you have the Ubuntu colordiff mentioned in zhanxw's answer):

▶ diff -u f1 f2 | colordiff | DiffHighlight.pl

And the usage (if you don't):

▶ diff -u f1 f2 | DiffHighlight.pl
Burks answered 17/2, 2019 at 8:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.