newline-ignoring diff / diff across multiple lines / reflow-ignoring diff
F

2

16

Does anybody know of a diff-like tool that can show me the changes between two text files, but ignore changes in whitespace including newlines?

Here's an example:

the quick brown fox jumped over the lazy bear.  the quick brown fox
jumped over the lazy bear.  the quick brown fox jumped over the lazy
bear.  the quick brown fox jumped over the lazy bear.
quick brown fox jumped over the lazy bear.  the quick brown fox jumped
over the lazy bear.  the quick brown fox jumped over the lazy bear.
the quick brown fox jumped over the lazy bear.

All I did was delete one word and reflow it, but "diff -b" detects a change on every line (as it should; I'm not saying this is a bug in diff). But for large LaTeX files this is a major problem; change one word in a long paragraph and the diff you get back is basically useless.

By the way, I'm aware that this requires way more computational power than the usual lines-are-atomic diff. I'm only doing this on small human-generated files and am happy to wait a long time if I have to.

For answered 9/4, 2010 at 3:18 Comment(0)
D
13

wdiff does word-by-word alignment.

For an easy-to-read display in a terminal, run

 wdiff -al <file1> <file2> | less

This will show (at least on my system) insertions in <file2> boldfaced and deletions from <file1> underlined.

Dierdredieresis answered 9/4, 2010 at 3:22 Comment(4)
WARNING: wdiff may not be available on every system. But it is a cool utility. - Notochord
Hooray! That is exactly what I wanted. Now I just have to wait for Stack Overflow to let me declare this the answer. - For
While wdiff is kind of cool, I'm actually downvoting this for two reasons. First and foremost, wdiff doesn't show the line numbers of the diffs found (a huge inconvenience). Second, because it's word oriented, it can't ignore changes in whitespace inside a word (i.e. foreach( vs foreach ( ). I've written a PHP script to compensate for the second issue, but without line numbers it's a huge waste of time trying to grep to find the "interesting" differences. I was hoping to say something nice about compare++, but I have yet to hear back from them regarding whether they have an .rpm. - Pyroclastic
@Pyroclastic "Use your downvotes whenever you encounter an egregiously sloppy, no-effort-expended post, or an answer that is clearly and perhaps dangerously incorrect." The answer is correct for the question asked. That it doesn't match your specific requirements does not make it incorrect, and if you have a better solution, that should be posted as an alternate answer. - Skycap
N
1

One option is to split each file into words, one per line, and diff the results. You lose some of the surrounding context in the output, but the diff is very fine-tuned to the type of change you care about.

Example:

# -p makes perl read each input line and print it after the substitution
perl -pe 's/\s+/\n/g' file1 > file1.split_words
perl -pe 's/\s+/\n/g' file2 > file2.split_words
diff file1.split_words file2.split_words
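
To see what this produces on a reflowed text like the one in the question, here is a minimal sketch. It uses tr instead of Perl, which is my assumption rather than part of the original answer, and split_words, old.txt and new.txt are hypothetical names chosen for illustration.

```shell
# split_words is a hypothetical helper: emit one word per line,
# squeezing every run of whitespace (spaces, tabs, newlines) into one newline.
split_words() {
    tr -s '[:space:]' '\n' < "$1"
}

tmp=$(mktemp -d)
# Same sentence twice: the second copy drops the first word and is reflowed.
printf 'the quick brown fox jumped over\nthe lazy bear.\n' > "$tmp/old.txt"
printf 'quick brown fox\njumped over the lazy bear.\n' > "$tmp/new.txt"

split_words "$tmp/old.txt" > "$tmp/old.words"
split_words "$tmp/new.txt" > "$tmp/new.words"

# diff exits with status 1 when the inputs differ; that is not an error here.
word_diff=$(diff "$tmp/old.words" "$tmp/new.words")
printf '%s\n' "$word_diff"
rm -rf "$tmp"
```

Only the deleted word is reported. The line numbers in the output refer to word positions, not to lines of the original files.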

You can do even better if the text has special properties. Specifically, if reflow only happens within a paragraph, and paragraphs are separated by two newlines in a row, simply replace every single newline with a space and run a regular diff -w on the results.
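
The paragraph-based variant could be sketched as follows. This is only one way to do it: it assumes awk is available and that paragraphs are separated by truly empty lines, and join_paragraphs plus the file names are hypothetical.

```shell
# join_paragraphs is a hypothetical helper: print each paragraph on one line.
# RS="" puts awk in paragraph mode (records are separated by empty lines);
# assigning $1 to itself rebuilds the record with single spaces between words.
join_paragraphs() {
    awk 'BEGIN { RS = "" } { $1 = $1; print }' "$1"
}

tmp=$(mktemp -d)
# Two paragraphs; the new file drops one word from the first and reflows both.
printf 'the quick brown fox\njumped over the lazy bear.\n\nsecond paragraph stays\nthe same.\n' > "$tmp/old.txt"
printf 'quick brown fox jumped\nover the lazy bear.\n\nsecond paragraph\nstays the same.\n' > "$tmp/new.txt"

join_paragraphs "$tmp/old.txt" > "$tmp/old.joined"
join_paragraphs "$tmp/new.txt" > "$tmp/new.joined"

# diff exits with status 1 when the inputs differ; that is not an error here.
para_diff=$(diff -w "$tmp/old.joined" "$tmp/new.joined")
printf '%s\n' "$para_diff"
rm -rf "$tmp"
```

Here the line numbers in the diff are paragraph numbers, so only the first paragraph is flagged and the reflowed but unchanged second paragraph is not.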

Notochord answered 9/4, 2010 at 3:22 Comment(0)
