Is there a way to diff files sentence-by-sentence instead of line-by-line?

I'm trying to get diff to work better for certain kinds of documents. With LaTeX, for example, I might have a long paragraph that is stored as a single line, but I don't want to see that entire paragraph if just one sentence has changed. This matters particularly if I'm using some kind of version control and a co-author edits the same paragraph as me, but not the same sentence. I wouldn't want that to show up as a conflict.

That's a secondary question. The main question is whether I can make diff compare sentence-by-sentence. Thanks.

Edit

wdiff is almost perfect. But is there a merge equivalent, as diff has with diff3?

Allottee answered 10/5, 2009 at 16:52
A
6

wdiff will give you a word-by-word diff instead of line-by-line. I'm not aware of any sentence-by-sentence diff programs.

Anson answered 10/5, 2009 at 17:2 Comment(2)
I've been working on a Python library to solve this problem... github.com/will-hart/PyFreeDiff. It's early days, but it can currently build and apply diffs between two files. (Scrag)
@Scrag I haven't done anything with it yet, but the link above should be github.com/will-hart-PyTextDiff (Notion)
U
2

Preprocess the files before diffing them. Write a script that puts each sentence on its own line; any line-by-line diff program will then work.

I have done this on a C token level for diffing C code in order to make absolutely sure my CVS merge was correct.
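
For what it's worth, the preprocessing step could look something like the Python sketch below. It is only an illustration under assumptions of my own: a sentence is taken to end at ".", "!" or "?" followed by whitespace (so abbreviations like "e.g." will be split too), and the names one_sentence_per_line and sentence_diff are made up for this example. The diff itself comes from the standard library's difflib rather than an external diff program.

```python
import difflib
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# This is deliberately naive; abbreviations will be split as well.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(text: str) -> list[str]:
    """Split text into sentences, keeping a blank entry between paragraphs."""
    lines = []
    for paragraph in text.split("\n\n"):
        flat = " ".join(paragraph.split())  # collapse newlines and extra spaces
        if flat:
            lines.extend(_SENTENCE_END.split(flat))
        lines.append("")  # paragraph separator
    return lines[:-1]  # drop the separator after the last paragraph


def sentence_diff(old: str, new: str) -> list[str]:
    """Unified diff of two texts, compared sentence by sentence."""
    return list(
        difflib.unified_diff(
            one_sentence_per_line(old),
            one_sentence_per_line(new),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no context lines: show only the sentences that changed
        )
    )


if __name__ == "__main__":
    old = "First sentence. Second sentence stays. Third one is old."
    new = "First sentence. Second sentence stays. Third one is new."
    print("\n".join(sentence_diff(old, new)))
```

The same one_sentence_per_line output can be written to temporary files and fed to diff, diff3 or a version control merge instead, since at that point each sentence is an ordinary line.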

Unsettle answered 10/5, 2009 at 17:39
H
0

Answering 14 years later in case anyone comes across this with git diff in mind specifically (which seems to have been the implied intent of the original question).

Git diff supports a --word-diff option, which does pretty much what the question is asking in this context.

--word-diff supports a number of modes (color, plain, porcelain and none). For LaTeX and long sentences, the best option for me is --word-diff=porcelain. This walks through the sentence until it finds a difference, then outputs the difference separately as a removed/added pair, before continuing on with the sentence.

In other words, if you changed your LaTeX from

 This is a common part of the sentence, and previously we had this and the rest is common again

to

This is a common part of the sentence, but then we changed this part and the rest is common again

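then git diff --word-diff=porcelain will give output along these lines (hunk headers and the ~ end-of-line markers are omitted here, and git may split the changed region further where single words such as "we" or "this" happen to match):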

 This is a common part of the sentence,
-and previously we had this
+but then we changed this part
 and the rest is common again

(where, if colour output is enabled, the - line will be coloured red and the + line will be coloured green)
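
If you want to see the idea without a repository at hand, here is a rough Python sketch that produces the same kind of " " / "-" / "+" layout for two single-line strings. To be clear about the assumptions: it uses difflib.SequenceMatcher from the standard library, not git's own diff algorithm; it splits words on whitespace only; and word_diff_porcelain_like is a name invented for this example.

```python
import difflib


def word_diff_porcelain_like(old: str, new: str) -> list[str]:
    """Word-level diff of two single-line strings (hypothetical helper).

    Each output line starts with ' ' (common run of words), '-' (removed)
    or '+' (added), loosely imitating git's porcelain word-diff layout.
    This is not git's algorithm, so chunk boundaries may differ from git's.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" " + " ".join(a[i1:i2]))
            continue
        if i1 < i2:  # words present only in the old text
            out.append("-" + " ".join(a[i1:i2]))
        if j1 < j2:  # words present only in the new text
            out.append("+" + " ".join(b[j1:j2]))
    return out


if __name__ == "__main__":
    old = ("This is a common part of the sentence, and previously we had this "
           "and the rest is common again")
    new = ("This is a common part of the sentence, but then we changed this part "
           "and the rest is common again")
    print("\n".join(word_diff_porcelain_like(old, new)))
```

Note that a word-level comparison treats "we" and "this" inside the changed region as common words, so this sketch reports that region as several small removed/added pairs rather than one large pair.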

Hardness answered 13/8, 2023 at 11:49
