git + LaTeX workflow

I'm writing a very long document in LaTeX. I work on both my work computer and my laptop, so I need to keep all the files synchronized between the two, and I would also like to keep a revision history. I chose git as my DVCS, and I'm hosting my repository on my own server. I use Kile + Okular for editing; Kile doesn't have an integrated git plugin. I'm not collaborating with anyone on this text. I'm also thinking about putting another private repository on codaset, in case my server is not accessible for some reason.

What is the recommended workflow practice in this case? How can branching be fitted in this working scheme? Is there a way to compare two versions of the same file? What about using a stash?

Mcdougald asked 31/5, 2011 at 14:3

Changes to your LaTeX workflow:

The first step in efficiently managing a Git+LaTeX workflow is to make a few changes to your LaTeX habits.

  • For starters, write each sentence on a separate line. Git was written to version control source code, where each line is distinct and has a specific purpose. When you write documents in LaTeX, you often think in terms of paragraphs and write them as free-flowing text. However, git diffs line by line, so if a paragraph is one long line, changing a single word in it is recorded as a change to the entire paragraph.

    One solution is to use git diff --color-words (see my answer to the similar question "How to use Mercurial for version control of text documents?", where I show an example). However, I must emphasize that splitting into separate lines is a much better option (I only mentioned it in passing in that answer), because in my experience it results in very few merge conflicts.

  • If you need to look at the diff of the source, use Git's native diff. To see the difference between two arbitrary commits (versions), pass the SHAs of both commits to git diff. See the documentation for more details, and also the question "Showing which files have changed between two revisions".

    On the other hand, if you need to look at the diff of your formatted output, use latexdiff, an excellent utility (written in Perl) that takes two LaTeX files and produces a neatly marked-up LaTeX file, which you can compile to a PDF that highlights the changes.

    You can combine git and latexdiff (plus latexpand if needed) in a single command using git-latexdiff (e.g. git latexdiff HEAD^ to view the diff between your worktree and the last-but-one commit).

  • If you're writing a long document in LaTeX, I'd suggest splitting different chapters into their own files and calling them from the main file with the \include{file} command. This way it is easier for you to edit a localized part of your work, and it is also easier for version control, as you know what changes have been made to each chapter instead of having to figure it out from the logs of one big file.
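
A minimal sketch of the first two points, assuming a Unix shell with git on the path. It builds a throwaway repository in a temporary directory with a hypothetical chapter1.tex written one sentence per line, changes a single word, and compares the two commits by their SHAs. The latexdiff step only runs if latexdiff happens to be installed.

```shell
#!/usr/bin/env bash
# Throwaway demo repository; chapter1.tex and its text are hypothetical.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# One sentence per line.
cat > chapter1.tex <<'EOF'
\chapter{Introduction}
Groundwater flow is modelled with Darcy's law.
The domain is discretized with finite differences.
EOF
git add chapter1.tex
git commit -q -m "Draft introduction"
first=$(git rev-parse HEAD)

# Change a single word in the last sentence (portable: no sed -i).
sed 's/finite differences/finite elements/' chapter1.tex > chapter1.tmp
mv chapter1.tmp chapter1.tex
git commit -q -am "Switch to finite elements"
second=$(git rev-parse HEAD)

# Only the edited sentence counts as changed: 1 line added, 1 line deleted.
git diff --numstat "$first" "$second" -- chapter1.tex
# Word-level view of the same change.
git diff --color-words "$first" "$second" -- chapter1.tex

# Rendered diff of the two versions, only if latexdiff is installed.
if command -v latexdiff >/dev/null 2>&1; then
    git show "$first:chapter1.tex" > chapter1-old.tex
    latexdiff chapter1-old.tex chapter1.tex > chapter1-diff.tex
fi
```

With the whole paragraph on one line, the same edit would still report one changed line, but that line would be the entire paragraph, which is what makes reviews and merges harder.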

Using Git efficiently:

  • Use branches! There is perhaps no better advice I can give. I've found branches to be very helpful for keeping track of "different ideas" for the text or "different states" of the work. The master branch should be your main body of work, in its most current "ready to publish" state, i.e., if there is one branch you are willing to put your name on, it should be master.

    Branches are also extremely helpful if you are a graduate student. As any grad student will attest, the advisor is bound to have numerous corrections, most of which you don't agree with. Yet you might be expected to at least make them for the time being, even if they are reverted later after discussions. In such cases, you could create a new branch advisor and make changes to their liking, while maintaining your own development branch. You can then merge the two and cherry-pick what you need.

  • I would also suggest putting each section on a different branch and focusing only on the section corresponding to the branch that you're on. Spawn a branch when you create a new section, or create dummy sections when you make your initial commit (your choice, really). Resist the urge to edit a different section (say, 3) when you're not on its branch. If you need to edit it, commit your current work and then check out that section's branch first. I find this very helpful because it keeps the history of the section in its own branch and also tells you at a glance (from the tree) how old some section is. Perhaps you've added material to section 3 that requires tweaking section 5… Of course, these will, in all probability, be noticed during a careful reading, but I find it helpful to see this at a glance so that I can shift gears if I'm getting bored of a section.

    Here's an example of my branches and merges from a recent paper (I use SourceTree on OS X and Git from the command line on Linux). You'll probably notice that I'm not the world's most frequent committer, nor do I always leave useful commit messages, but that's no reason for you not to follow those good habits. The main takeaway is that working in branches is helpful. My thoughts, ideas and development proceed non-linearly, but I can keep track of them via branches and merge them when I'm satisfied (I also had other branches that led nowhere and were later deleted). I can also "tag" commits if they mean something (e.g., initial submissions to journals, revised submissions, etc.). Here, I've tagged it "version 1", which is where the draft is as of now. The tree represents a week's worth of work.

  • Another useful habit is to make document-wide changes (such as changing \alpha to \beta everywhere) in commits of their own. That way, you can revert them without having to roll back something else along with them (there are ways to do this in git, but hey, if you can avoid it, then why not?). The same goes for additions to the preamble.

  • Use a remote repo and push your changes upstream regularly. With free service providers like GitHub and Bitbucket (both let you create private repos with a free account), there is no reason not to use them if you're working with Git/Mercurial. At the very least, consider it a secondary backup (I hope you have a primary one!) for your LaTeX files, and a service that lets you continue editing where you left off on a different machine.
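
A minimal sketch of these habits in a throwaway repository, assuming a Unix shell and git. The file names, the advisor branch, the v1 tag and the local bare repository standing in for your server are all illustrative. It cherry-picks one of the advisor's commits, keeps a document-wide change in a commit of its own so it can be reverted cleanly, tags the result, and pushes it upstream.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
git init -q --bare "$work/server.git"   # stands in for the remote server
git init -q "$work/paper"
cd "$work/paper"
git config user.name "Demo"
git config user.email "demo@example.com"

printf '%s\n' 'Intro text.' > intro.tex
printf '%s\n' 'Results use $\alpha$.' > results.tex
printf '%s\n' 'Conclusions.' > conclusions.tex
git add .
git commit -q -m "Initial draft"
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

# The advisor's corrections go on their own branch, one commit each.
git checkout -q -b advisor
printf '%s\n' 'Advisor wording for the intro.' > intro.tex
git commit -q -am "Advisor: reword intro"
printf '%s\n' 'Advisor conclusions.' > conclusions.tex
git commit -q -am "Advisor: rewrite conclusions"

# Back on the main branch, take only the intro correction (advisor~1).
git checkout -q "$main"
git cherry-pick advisor~1 >/dev/null

# A document-wide change as a commit of its own...
sed 's/\\alpha/\\beta/g' results.tex > results.tmp && mv results.tmp results.tex
git commit -q -am "Rename alpha to beta everywhere"
# ...can later be undone without touching anything else.
git revert --no-edit HEAD >/dev/null

# Mark the state you would submit, then push branch and tag upstream.
git tag -a v1 -m "version 1"
git remote add origin "$work/server.git"
git push -q origin "$main" v1
```

Tag names cannot contain spaces, which is why the sketch uses v1 with "version 1" as the tag message.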

Desired answered 31/5, 2011 at 16:9
+1 for actually addressing the LaTeX part of the question :] Also, I've never tried it so I can't really knock it, but wouldn't it be cumbersome to have every single sentence of a large document on a new line when trying to edit/read the source? (Battle)
@Diego: It did take a little getting used to at first, because your mind just wants to read it continuously. However, it's actually easier, because I (and most people) look at the neat LaTeX output to check whether sentences make sense and to proofread. Using these breaks has no effect on the output, and makes diffing a lot easier. Also, you can link the LaTeX output to the source file, so if you spot an error or a typo, all you need to do is click on it and it will take you right to the corresponding point in the source. (Desired)
@yoda Thanks, I'll try that. I also suspect it would be trivial to add new paragraphs normally and run a script to insert a line break after periods. (Battle)
@Diego: I suppose so, but you might have to be a little careful if you have math/numbers in your text, which might contain periods, or manual spacing commands like \\[0.1in]. But yeah, I'd say it's worth a shot. It didn't take me long to switch from paragraph mode to line-by-line mode. (Desired)
Thanks very much for your answer! I'm already using all the LaTeX tricks that you mention: the multi-file approach, and my lines are broken down by paragraph. I was wondering about the git part, and I have to understand (and test!) the branching better. As my thesis is on Numerical Calculus, Physics and Groundwater, most of the new ideas are first committed to the code and then to the thesis :-) (Mcdougald)
@Ivan: I found a nice tool that lets you view diffed outputs of two LaTeX documents (see updated answer). I find it very useful for viewing changes in the output (and not the source). This is immensely helpful when checking whether you've implemented all the reviewers' suggestions, and for looking at the state of the manuscript between the first submission and the second. (Desired)
I use a similar approach, but how do you handle figures or other binary files? Can git handle them as well, or is there another approach for files that should be included in the repo without version tracking? (Featurelength)
@liborw, Git handles (tracks changes to) binary files just fine. In general, though, it won't be able to give you a human-readable diff. For that, you can check out the new and old versions and compare them manually. You can also script some of that within Git if you need to. (Bromidic)
I'd add one more bit of advice: commit often! Within reason, the fewer (and more closely related) the changes in any given commit, the easier it is to find and fix things you didn't mean to do. (Bromidic)
These are handy tips, except one whose use I don't see: a branch per section. You can easily see changes on a per-file basis, so why increase workflow complexity by adding an extra layer of separation? git [log|show|add] some_file.tex all work, with no need for constant branch switching. You can still commit each file on its own if you want. (Truncation)
@Truncation If you're splitting each section into a different file, then yeah. I (and a lot of people in academic circles) usually work with only a single tex file per article. Individual files make sense for books/theses, where each chapter has a substantial chunk of material. Of course, these were only suggestions... everyone should pick and reject tips according to what suits their workflow :) (Desired)
@yoda ah I see. Yes, then that makes sense. I tend to force multiple tex files on journals anyway ;-). (Truncation)
How do I use latexdiff when I have two branches in git? The LaTeX files are broken down into multiple ones, such as thesis.tex, which includes chapter1, chapter2, chapter3, and so forth. Thank you for this great answer. (Kaleena)
You don't even need to create an account on a foreign website to have a backup. With little effort you can use a flash drive (USB/SD) as a remote repository for your local repo: create a bare repository on the drive (git init --bare /path/to/backup), register it with git remote add <backup name> /path/to/backup, and push. That way you always have a saved copy near you if something goes wrong. You don't need to set up a Raspberry Pi or something similar for this, although you can, if you want to push to a backup drive at home when you are somewhere else; it is not that hard to set one up (Pi + ssh + DynDNS, optionally + OpenVPN). (Gushy)
Very nice answer. What do you put in your .gitignore? (Allpowerful)

I have a similar workflow as well. Even though one branch is being worked on at a time, I find it beneficial to have separate branches for different states of work. For example, imagine sending a good rough draft of your paper to your advisor. Then, you get a crazy idea! You want to start changing some core concepts, re-work some major sections, etc. etc. So you branch off and start working. Your master branch is always in a “releasable” state (or as close as you are in that moment). So while your other branch is crazy and has some drastic changes, if another publisher wants to see what you have, or you’re a student submitting to a conference, the master branch is always releasable, ready to go (or ready to show your advisor). If your PhD advisor wants to see the draft first thing in the morning, yes you could stash/stage/commit your current changes, use tags or search through the log, but why not keep separate branches?!

Let's say your master branch has the "releasable" state of your work. You now want to submit it to several peer-reviewed journals, each with different formatting requirements for the same content, and you expect them to come back with several different small criticisms about how you can edit the paper to fit their readers. You could easily create a branch for each journal, make journal-specific changes, submit, and when you receive the feedback, make the changes on each separate branch.

I have also used Dropbox and git to create the system you describe above. You can create a bare repository in your Dropbox folder, and then push/pull from either computer to stay up to date on both ends. This system usually only works when the number of collaborators is small, since the repository can get corrupted if people push to the Dropbox repo at the same time.
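
A minimal sketch of that setup, assuming a Unix shell and git. A temporary directory stands in for the synced folder (in practice something like ~/Dropbox, depending on your installation), the work and laptop clones stand in for the two machines, and the remote name dropbox is arbitrary. The same steps work with any shared path, such as the USB flash drive suggested in the comments on the previous answer.

```shell
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)
shared="$root/Dropbox"                 # stand-in for the real synced folder
mkdir -p "$shared"
git init -q --bare "$shared/thesis.git"

# Work computer: create the project and push it to the shared bare repo.
git init -q "$root/work"
cd "$root/work"
git config user.name "Demo"
git config user.email "demo@example.com"
printf '%s\n' '\documentclass{report}' > thesis.tex
git add thesis.tex
git commit -q -m "Start thesis"
branch=$(git symbolic-ref --short HEAD)
git remote add dropbox "$shared/thesis.git"
git push -q -u dropbox "$branch"

# Laptop: clone from the shared folder, edit, push back.
git clone -q -o dropbox "$shared/thesis.git" "$root/laptop"
cd "$root/laptop"
git config user.name "Demo"
git config user.email "demo@example.com"
printf '%s\n' 'Written on the laptop.' >> thesis.tex
git commit -q -am "Edits from the laptop"
git push -q

# Work computer again: fast-forward to the laptop's commit.
cd "$root/work"
git pull -q --ff-only
```

Only the bare repository lives in the synced folder; the working copies stay outside it, which avoids Dropbox syncing git's internal files mid-operation.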

You could technically also keep just ONE repository inside the Dropbox folder and do all your work from there. I would discourage this, however, as people have mentioned that Dropbox has trouble synchronizing files that change constantly (such as git's internal files).

Battle answered 31/5, 2011 at 14:28
Just note that submitting a paper for peer review to several journals/conferences at the same time is usually not considered ethical! (Espadrille)

I tried to implement this as a bash function; I've included it in my ~/.bashrc to make it always available.

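function git-latexdiff {
    # Usage: git-latexdiff <file> <back-revision>
    # Compiles a latexdiff of <file> against its version <back-revision> commits ago.
    if [[ $# -ne 2 ]]; then
        printf '\tusage: git-latexdiff <file> <back-revision>\n' >&2
        return 1
    elif ! [[ $2 =~ ^[0-9]+$ ]]; then
        printf '\t<back-revision> must be a non-negative integer\n' >&2
        return 1
    fi

    local file=$1
    local base=${file%.tex}
    local outdir
    outdir=$(dirname "$file")

    # A path starting with "./" is resolved relative to the current directory.
    git show "HEAD~$2:./$file" > "${base}_old.tex" || return 1
    # latexdiff takes the OLD version first, then the NEW one.
    latexdiff "${base}_old.tex" "$file" > "${base}_diff.tex"
    pdflatex -interaction=nonstopmode -output-directory="$outdir" "${base}_diff.tex"
    okular "${base}_diff.pdf"    # your PDF viewer, e.g. evince under GNOME
    # Remove the old version and everything generated for the diff.
    rm -f "${base}_old.tex" "${base}_diff".*
}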

Note that this function needs latexdiff to be installed (and found on your PATH). It also needs to find pdflatex and okular.

pdflatex is my preferred way to process LaTeX, but you can change it to latex as well. okular is my PDF reader; I suppose you'll want to use evince under GNOME, or some other viewer.

This is a quick version, written with a single-file document in mind. Git can track the pieces of a multi-file document just as well, so if you want, you can keep using \include and run the function on each included file.

Colza answered 2/6, 2012 at 0:54
Keep in mind that LaTeX cross-references will not be resolved in the generated PDF, and that the generated files are deleted at the end of the function. As I said, it's a quick version. (Colza)
The proposal for using latexdiff called as a gif helper, in this answer to "Using latexdiff with git", is more complete. (Sodium)
What do you mean by "gif helper", @Sodium? (Colza)
Sorry, @Rafareino, I meant "git helper": a git helper is a tool that can be invoked by git for some operations. In this case, you can use the latexdiff command-line tool just by using git diff, if you configure it properly. (Sodium)
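
A close variant of the git helper idea, sketched with git difftool and a custom tool, assuming latexdiff is on your PATH. The tool name latex and the file name chapter1.tex are arbitrary. git sets $LOCAL and $REMOTE to temporary files holding the old and new versions, so latexdiff receives them in the order it expects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Register a difftool called "latex" in your global git configuration.
# Single quotes keep $LOCAL and $REMOTE literal for git to fill in later.
git config --global difftool.latex.cmd 'latexdiff "$LOCAL" "$REMOTE"'
# Do not ask for confirmation before launching the tool for each file.
git config --global difftool.prompt false

# Usage, inside the repository (compare the previous commit with HEAD):
#   git difftool -t latex HEAD~1 HEAD -- chapter1.tex > chapter1-diff.tex
#   pdflatex chapter1-diff.tex
```

Limiting the command to one .tex file keeps the redirected output a single valid LaTeX document; with several files, the outputs would be concatenated.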

If you are on Windows, you can use this for version diffs: no installation needed, just a simple batch script. It works perfectly on Windows 10 with MiKTeX 2.9:

https://github.com/redreamality/git-latexdiff

Carbarn answered 30/1, 2017 at 3:22
