I'm using Git to version prose and have been trying git diff --word-diff to see changes within lines. I want to use the results generated in a script. But the default way that --word-diff identifies a word seems flawed, so I've been experimenting with --word-diff-regex= options.
Problem
Here are the two main flaws I'm trying to deal with:
Added whitespace seems to be ignored. But whitespace can be quite important if trying to use the results programmatically.
For example, take this header from a Markdown (.md) file:
# Test file
Now, let's add some text to the end of it:
# Test file in Markdown
If I run git diff --word-diff on this, I get:
# Test file {+in Markdown+}
But the space before the word "in" has not been included as part of the diff.
Empty lines are completely ignored.
Here's a standard git diff for the content of a file where I've removed a line and also added a couple of new lines -- one empty, the other with the text "Here's a new line."
 This is a test file to see how word diff responds in certain situations.
-
 I'll try removing lines and adding them to see what happens.
 Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
+
+Here's a new line.
But here's git diff --word-diff for the same content:
This is a test file to see how word diff responds in certain situations.
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
{+Here's a new line.+}
The removed and added empty lines are completely ignored.
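Both flaws are at least consistent with how git-diff(1) describes the default: a word is a run of non-whitespace, and whitespace only separates words. The Python sketch below is a rough model of that rule, not Git's actual code; `default_words` is a hypothetical helper and `\S+` is assumed to be a fair stand-in for "run of non-whitespace".

```python
import re

def default_words(text):
    # Assumption: approximates the documented default, where a word is a
    # run of non-whitespace and whitespace merely separates words.
    return re.findall(r"\S+", text)

# The added words are visible, but the space before "in" belongs to no word.
print(default_words("# Test file\n"))              # ['#', 'Test', 'file']
print(default_words("# Test file in Markdown\n"))  # ['#', 'Test', 'file', 'in', 'Markdown']

# An empty line contains no word, so both texts yield the same word list.
with_blank = "first\n\nsecond\n"
without_blank = "first\nsecond\n"
print(default_words(with_blank) == default_words(without_blank))  # True
```

Under this model there is simply nothing for a word-level comparison to report when only whitespace or an empty line changes.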
Desired results
Putting the two examples above together, here's what I'd like to see:
# Test file{+ in Markdown+}
This is a test file to see how word diff responds in certain situations.
{--}
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
{++}
{+Here's a new line.+}
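For comparison, this output format can be produced outside Git. The following is a minimal sketch using Python's difflib, not a way to make git diff emit it. The helper names (`mark`, `word_diff_line`, `word_diff`) are made up for illustration, and the approach (diff lines first, then split changed line pairs into tokens where every space is its own token) is an assumption about what the desired output implies.

```python
import difflib
import re

TOKEN = re.compile(r"[^ ]+|[ ]")  # a run of non-spaces, or one single space

def mark(text, sign):
    # Wrap text in {+...+} or {-...-} markers.
    return "{" + sign + text + sign + "}"

def word_diff_line(old, new):
    """Word-level diff of two lines, keeping spaces as tokens."""
    a, b = TOKEN.findall(old), TOKEN.findall(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i2 > i1:
            out.append(mark("".join(a[i1:i2]), "-"))
        if j2 > j1:
            out.append(mark("".join(b[j1:j2]), "+"))
    return "".join(out)

def word_diff(old_text, new_text):
    """Line-level diff first, then word-level diff inside changed line pairs."""
    a, b = old_text.splitlines(), new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        elif tag == "replace" and i2 - i1 == j2 - j1:
            out.extend(word_diff_line(x, y) for x, y in zip(a[i1:i2], b[j1:j2]))
        else:
            # Whole lines removed or added; an empty line becomes {--} or {++}.
            out.extend(mark(x, "-") for x in a[i1:i2])
            out.extend(mark(y, "+") for y in b[j1:j2])
    return out

print("\n".join(word_diff("# Test file\n", "# Test file in Markdown\n")))
# # Test file{+ in Markdown+}

old = "First line.\n\nSecond line.\nThird line.\n"
new = "First line.\nSecond line.\nThird line.\n\nHere's a new line.\n"
print("\n".join(word_diff(old, new)))
# First line.
# {--}
# Second line.
# Third line.
# {++}
# {+Here's a new line.+}
```

difflib's matching is not the algorithm Git uses, so on larger edits the two may group changes differently.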
Things I've tried:
git diff --word-diff-regex='.' seems too granular for when new words share characters with existing words.
git diff --word-diff-regex='[^ ]+|[ ]' seems to solve the first problem but, to be honest, I'm not actually sure why.
git diff --word-diff-regex='[^ ]+|[ ]|^$' -- I was hoping the ^$ on the end would help capture empty lines -- but it doesn't and, worse, it then seems to ignore the change that follows.
git diff --word-diff-regex='[^ ]+|[ ]|.{0}' creates the same problem as the one before.
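One possible reading of these attempts follows from the rule documented for --word-diff-regex in git-diff(1): every non-overlapping match is a word, anything between matches is ignored as whitespace, and a match is truncated at a newline. The Python sketch below is only a rough model of that rule. `regex_words` is a hypothetical helper, it uses Python's `re` rather than Git's own regex engine, it scans line by line, and it assumes an empty match contributes no word. It does not try to reproduce the lost change reported for the third attempt.

```python
import re

def regex_words(text, pattern):
    """Hypothetical model: every non-empty match is a word, the rest is dropped."""
    words = []
    for line in text.split("\n"):            # line by line, so no word spans a newline
        for match in re.finditer(pattern, line):
            if match.group(0):                # assumption: an empty match yields no word
                words.append(match.group(0))
    return words

# Each space becomes a word of its own, so an added space can appear in the diff.
print(regex_words("# Test file in Markdown\n", r"[^ ]+|[ ]"))
# ['#', ' ', 'Test', ' ', 'file', ' ', 'in', ' ', 'Markdown']

# '^$' and '.{0}' can only match the empty string, so an empty line adds no word.
for pattern in (r"[^ ]+|[ ]|^$", r"[^ ]+|[ ]|.{0}"):
    same = regex_words("first\n\nsecond\n", pattern) == regex_words("first\nsecond\n", pattern)
    print(same)
# True
# True
```

If the model holds, the second attempt works because the space is itself matched as a word, while no pattern can turn an empty line into a word because there are no characters on it to match.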
I'd be grateful if anyone could shed any light on how to do this, or at least share some knowledge on what's going on under the hood with --word-diff-regex.
[...] --word-diff-regex='\n', the last line of my example displays Here's a {+new lin+}e. Odd. Firstly, the regex flavour Git is using doesn't seem to recognise \n as a newline character (the same is true of an escaped version, \\\n). So which flavour is Git using? Secondly: is the regex here really defining a word -- note that the diff bit picked out begins and ends with a literal n. So is it looking for boundaries instead, and does that affect the way we should write regexes for --word-diff-regex=? – Guano
[...] {+new lin+}e result: the regex matched the newline, which the word diff code discarded after the whitespace was removed, so the word diff code discarded the e in line. – Idealistic
[...] \n to indicate a newline. If you want to match a newline, you have to include it literally. – Schnapps
[...] --word-diff-regex='^$' or --word-diff-regex='.{0}' would not capture empty lines, since both those patterns would be compliant with POSIX ERE. But I'm starting to assume there's something in the Git script itself which deliberately skips empty lines rather than attempting to match them to the regex pattern provided. – Guano