Format string for consistent separation between entries output by `git log --pretty`

I am trying to develop a format string to pass to git log --pretty so that each log entry ends with the full commit message, yet consecutive entries are separated by exactly one empty line. The problem is that some full commit messages end in a newline and some do not.

For example, let's say I have two commits, abc1234 and def5678, but only abc1234 contains a newline at the end of the full commit message. Outputting the raw commit contents on the command line would look something like this:

[prompt]$ git cat-file commit abc1234
(...)

Title FOO

Full commit message FOO
[prompt]$ git cat-file commit def5678
(...)

Title BAR

Full commit message BAR[prompt]$

Note how the new shell prompt appears at the end of the last line of output, demonstrating that commit def5678 does not contain a newline at the end of the full commit message.
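
Rather than watching where the prompt lands, the last byte of the commit object can be checked directly. This is a minimal sketch assuming a POSIX shell with tail and wc; ends_with_newline is a made-up helper name, not a Git command:

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the data read from stdin
# ends with a newline byte; fail otherwise, including for empty input.
ends_with_newline() {
    # tail -c1 keeps only the last byte; wc -l reports 1 only if it is '\n'
    [ "$(tail -c1 | wc -l)" -eq 1 ]
}

# Usage with the commits from the question:
#   git cat-file commit abc1234 | ends_with_newline && echo "ends with newline"
#   git cat-file commit def5678 | ends_with_newline || echo "no final newline"
```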

Let's say that def5678 is the parent of abc1234 and I want to output a simple log where each entry contains only the short commit hash, title line, and full commit message. I might try something like this:

[prompt]$ git log --graph --pretty='commit %h%n%B' abc1234
* commit abc1234
| Title FOO
| 
| Full commit message FOO
|
* commit def5678
| Title BAR
|
| Full commit message BAR
* commit <parent of def5678>
(...)

Note the spacing between the log entries. The entries for abc1234 and def5678 are separated by a blank line (save for the graph character), yet the entries for def5678 and its parent are not.

How can I construct a format string so that the spacing is consistent, even when full commit messages are terminated inconsistently? The built-in pretty formats medium, full, fuller, and email already do this, but I want to be able to construct arbitrary format strings that do the same.

I've experimented with the %+B, %-B and % B sequences (and their %b and %n equivalents), but I just can't seem to get consistent spacing.

I'm using Git 2.17.0 if that makes a difference.

Scornful asked 19/9, 2019 at 17:23. Comments:
What tool did you use to write the commit messages? Did you commit using git directly or through some alternate implementation (that is, one that does not use libgit2)? IIRC, git automatically fixes messages with missing EOLs, but some reimplementations do not. – Frankfort
I did not commit the messages myself; I am attempting to read a git history created by other developers. – Scornful
Is fixing all messages an option (it would change all hashes, etc.)? It seems much easier to do that. – Frankfort
That would be a last resort. I know I could do a filter-branch to fix the commits, but then my local repo would be unable to track upstream without grafts or some other cumbersome mechanism. – Scornful
I meant fixing them upstream. Fixing this in a copy of the repository might be an option, though. How big is the repository? – Frankfort
It would probably be nice if Git had a format modifier that meant "add a newline if the text does not end with a newline", but it doesn't. (Not an answer, just a comment.) – Interrogator
Try git log --graph --pretty='commit %h%n%B%-gd%n'; you're not using reflog selectors, so %gd will expand to an empty string and the - will eat all preceding newlines. – Tannic
@Tannic Yes, that's true. I think I've got an even better one: git log --graph --pretty='commit %h%n%B%-C()%n' (%C() being an empty color selector). It seems pretty kludgy, but it works! – Scornful
That's the kludge I was looking for; I just couldn't see it. I love the smell of a good kludge in the morning. Or the afternoon. Any time, really. – Tannic

As @jthill alluded to in the comments, and as documented in git-log(1):

If you add a - (minus sign) after % of a placeholder, all consecutive line-feeds immediately preceding the expansion are deleted if and only if the placeholder expands to an empty string.

Therefore, if we can find a format sequence %<token> that will always expand to the empty string, we can use %-<token>%n to replace zero or more consecutive newlines with a single newline. As it turns out, there is such a format sequence: %C(), the empty color selector. (Normally the parentheses enclose a non-empty string that specifies coloration to be used in the log output. See git-log(1) and git-config(1) for more details.)
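
The effect of appending %-C()%n to %B can be mimicked outside Git, which may help when reasoning about it. The function below is only an analogy under the semantics quoted above (collapse_trailing_newlines is a made-up name, and this is not how Git implements the feature): it deletes every trailing newline, as the - in %-C() does because %C() expands to nothing, then adds exactly one back, as %n does.

```sh
#!/bin/sh
# Hypothetical helper, for illustration only: mimic what "%-C()%n" does
# at the end of a %B expansion.
collapse_trailing_newlines() {
    # $(...) removes all trailing newlines from the captured output, just
    # as the "-" in %-C() deletes the line-feeds before an empty expansion
    stripped=$(printf '%s' "$1")
    # ...and this single newline plays the role of the final %n
    printf '%s\n' "$stripped"
}

# A message with a trailing newline and one without end up the same way:
collapse_trailing_newlines 'Full commit message BAR'
collapse_trailing_newlines 'Full commit message FOO
'
```

Interior newlines (such as the blank line between title and body) are left alone; only the run at the very end is collapsed.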

The fact that %C() evaluates to the empty string instead of causing an error seems like a happy accident rather than something to hang one's hat on, but at least in Git 2.17 it does the trick. At this point, I have enough information to answer my own question.

To maintain consistent separation between log entries output by git log --pretty=<tformat>, where <tformat> may or may not evaluate to a string ending in a newline, append %-C()%n to <tformat>. For example:

[prompt]$ git log --graph --pretty='commit %h%n%B%-C()%n' abc1234
* commit abc1234
| Title FOO
| 
| Full commit message FOO
|
* commit def5678
| Title BAR
|
| Full commit message BAR
|
* commit <parent of def5678>
(...)
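
To compare the two format strings on a throwaway repository, the question's history can be rebuilt with git commit-tree, which stores a message read from standard input as-is, so one message can be left without a trailing newline. This is a sketch assuming git is on PATH; the temporary directory and the demo identity are invented for the example, and --graph is left out to keep the output plain:

```sh
#!/bin/sh
set -e
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"

# Throwaway identity so commit-tree works without any user config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tree=$(git -C "$repo" mktree </dev/null)   # the empty tree

# BAR is stored without a final newline, FOO (its child) with one
bar=$(printf 'Title BAR\n\nFull commit message BAR' |
      git -C "$repo" commit-tree "$tree")
foo=$(printf 'Title FOO\n\nFull commit message FOO\n' |
      git -C "$repo" commit-tree -p "$bar" "$tree")

echo '--- commit %h%n%B (inconsistent spacing)'
git -C "$repo" log --pretty='commit %h%n%B' "$foo"
echo '--- commit %h%n%B%-C()%n (one blank line after every entry)'
git -C "$repo" log --pretty='commit %h%n%B%-C()%n' "$foo"
```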
Scornful answered 20/9, 2019 at 2:14

From the git log man page:

-z

Separate the commits with NULs instead of with new newlines.

With that, you can cleanly separate the messages. Then, for each message, add a line terminator at the end only if one is missing.

Example:

fix-eol:

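#!/bin/sh
# Usage: fix-eol MESSAGE (one message per call, e.g. via xargs -n1)

# hex code of the last byte of the message; no '\n' is appended here,
# otherwise the last byte would always be a newline
lastchar="$(printf -- '%s' "$1" | tail -c1 | od -An -tx1 | tr -d ' ')"

# output the input as is
printf -- '%s' "$1"

# print a newline only if it's missing (0a is the hex code of '\n')
if [ "$lastchar" != '0a' ]; then
    printf -- '\n'
fi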

Usage:

git log -z --pretty='commit %h%n%B' | xargs -0 -n1 ./fix-eol

Note: I was unable to save commit messages with missing line terminators (it seems that git fixes them automatically), but the script worked on such files. Also, depending on the size and content of the messages, issues may arise, as each message is passed as a command-line argument (i.e., this was a quick hack).

P.S.: The example could probably be improved to avoid needing a separate script.
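
Picking up the P.S., the separate file can be avoided by handing the same end-of-line check to sh -c through xargs. This is a sketch assuming an xargs that supports -0 (GNU and BSD xargs do); fix_eol_stream is a made-up name, and each message is still passed as a command-line argument, so the size caveat above still applies:

```sh
#!/bin/sh
# Hypothetical replacement for "xargs -0 -n1 ./fix-eol": read NUL-terminated
# records from stdin and print each one, adding a newline only when the
# record does not already end with one.
fix_eol_stream() {
    xargs -0 -n1 sh -c '
        printf "%s" "$1"
        # wc -l prints 1 only if the last byte of the record is a newline
        [ "$(printf "%s" "$1" | tail -c1 | wc -l)" -eq 1 ] || printf "\n"
    ' sh
}

# Usage:
#   git log -z --pretty='commit %h%n%B' | fix_eol_stream
```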

Frankfort answered 19/9, 2019 at 18:41. Comments:
That's a good idea, but unfortunately it doesn't work for me. Log entries start with --commit <hash> and the entries still experience the same problem with spacing. Also, a little bit of reworking of your fix-eol script might be necessary to make it compatible with the --graph option. Still, this is a good foundation to tinker with... I will come back to this in a few hours. – Scornful
"Log entries start with --commit <hash>" Fixed; I had copied the wrong version. – Frankfort
"to make it compatible with the --graph option" Does the graph drawing shift for every broken message or only on the last one? In the former case, I don't think there's a straightforward (or guaranteed) way to do that. – Frankfort
Your fix did remove the leading -- from each log entry, but I still found spacing to be an issue. I even tried a different platform (Linux instead of Cygwin) and Git 2.17.2 instead of 2.17.0. Regarding the --graph option, that was never actually an issue; I just got thrown off by the leading -- characters. Sorry for the misinformation. – Scornful
