How to prevent Git from committing Jupyter Notebook results?
S

5

17

I'm working on a project in Jupyter Notebook.

Whenever I make a commit, not only the changed code and markdown cells get committed, but also the output of the code cells.

That makes Git diffs unreadable, and the committed cell output makes it very hard to review pull requests and changes.

Is there a way of preventing this?

Swallow answered 16/3, 2020 at 18:51 Comment(2)
nextjournal.com/schmudde/how-to-version-control-jupyter - Untried
See github.com/github/gitignore/blob/master/Python.gitignore # Jupyter Notebook .ipynb_checkpoints - Rule
Y
8

I strongly recommend putting the following little script in .git/hooks/pre-commit. It runs nbconvert on every staged .ipynb file to strip the output, and it aborts the commit if nothing is left to commit after stripping. That last part is important because otherwise you'd be making useless empty commits. Since it only touches notebooks you have staged, it won't remove the output from other notebooks you're still working on.

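#!/bin/bash
# Strip the output from every staged notebook, then re-stage the cleaned file.
# --diff-filter=ACM skips deleted files, which nbconvert could not open.
# Note: the word-splitting loop still breaks on file names containing spaces.
for f in $(git diff --name-only --cached --diff-filter=ACM); do
    if [[ $f == *.ipynb ]]; then
        jupyter nbconvert --clear-output --inplace "$f"
        git add "$f"
    fi
done

# git diff --quiet exits 0 when nothing is staged, so abort in that case.
if git diff --cached --quiet
then
    echo "No changes detected after removing notebook output"
    exit 1
fi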

That script plus the appropriate .gitignore entries should keep your Git history clear of unwanted Jupyter output.
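
Two setup steps go with the hook. This is a sketch that assumes the commands run from the repository root. Git ignores hooks that are not executable, and the .gitignore entry meant here is presumably the .ipynb_checkpoints line from GitHub's Python.gitignore mentioned in the comments on the question.

```shell
#!/bin/sh
# Sketch only: run from the repository root.

# Git ignores hooks that are not executable.
if [ -f .git/hooks/pre-commit ]; then
    chmod +x .git/hooks/pre-commit
fi

# Ignore Jupyter's autosave directories; append the entry only once.
if ! grep -qxF '.ipynb_checkpoints' .gitignore 2>/dev/null; then
    echo '.ipynb_checkpoints' >> .gitignore
fi
```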


Here's a Husky-compatible variant; just save it in .husky/pre-commit.

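#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

# Strip the output from every staged notebook and re-stage it.
# --diff-filter=ACM skips deleted files.
for f in $(git diff --name-only --cached --diff-filter=ACM); do
    case "$f" in
        *.ipynb) jupyter nbconvert --clear-output --inplace "$f" && git add "$f" ;;
    esac
done

# Abort if nothing is left to commit once the output is gone.
if git diff --cached --quiet
then
    echo "No changes detected after removing notebook output"
    exit 1
fi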
Yseulte answered 10/12, 2022 at 14:41 Comment(0)
N
3

You have a few options:

Jupytext (https://github.com/mwouts/jupytext) lets you open .py files as Jupyter notebooks, and since those files do not store the output, the diff will be as easy to read as any other source code diff.

If you want to keep the .ipynb format, you can use nbdime (https://github.com/jupyter/nbdime) which produces nicer notebook diffs (you can integrate it with git diff).
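
As a rough sketch of how both options are usually switched on from the command line: the function names pair_notebook and enable_nbdime are hypothetical wrappers, and the sketch assumes both tools are already installed (for example with pip install jupytext nbdime).

```shell
#!/bin/sh
# Sketch only: assumes jupytext and nbdime are installed and on the PATH.

# pair_notebook FILE: keep FILE in sync with a plain .py twin (percent
# format), so the .py file is what gets diffed and reviewed.
pair_notebook() {
    jupytext --set-formats ipynb,py:percent "$1"
}

# enable_nbdime: register nbdime as the diff and merge driver for .ipynb
# files in the current repository.
enable_nbdime() {
    nbdime config-git --enable
}
```

For example, pair_notebook analysis.ipynb (a made-up file name) should create analysis.py next to the notebook, and after enable_nbdime a plain git diff on a notebook should go through nbdime.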

Nurmi answered 17/3, 2020 at 0:6 Comment(0)
R
1

I suggest setting up a pre-commit hook that strips rendered content from notebooks and writes the result back to the file. Also add .ipynb_checkpoints to .gitignore, as @Werner suggests.

Rivero answered 11/1, 2022 at 14:43 Comment(0)
L
1

I modified Simon Hyll's answer so that it does not change any files and just rejects the commit if notebook cell outputs are not empty.

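#!/bin/bash
# Requires underscore-cli. Rejects the commit without modifying any file.
# bash is needed for the [[ ... ]] pattern match below.
for f in $(git diff --name-only --cached --diff-filter=ACM); do
  if [[ $f == *.ipynb ]]; then
    # Prints "true" if any cell has a non-empty outputs array.
    has_content="$(cat "$f" | underscore select '.cells' | underscore flatten --shallow | underscore any 'value?.outputs?.length > 0')"
    if [ "$has_content" = "true" ]
    then
        echo "Notebook $f has output cells that are not clean!"
        echo "Unstage $f and clear its cell outputs"
        exit 1
    fi
  fi
done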

more from this repo
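
If underscore-cli is not available, the same check could be sketched with Python's json module instead, since Python is normally present wherever Jupyter is installed. This is a swapped-in technique, not part of the script above: has_output is a hypothetical helper name, and the sketch assumes python3 is on the PATH and that the notebook keeps its cells at the top level (nbformat 4).

```shell
#!/bin/sh
# has_output FILE: exit 0 if any cell in the notebook has stored output,
# exit 1 if all outputs are empty. Sketch only; assumes nbformat 4.
has_output() {
    python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1], encoding="utf-8") as fh:
    nb = json.load(fh)

# Markdown cells have no "outputs" key, so .get() returns None for them.
dirty = any(cell.get("outputs") for cell in nb.get("cells", []))
sys.exit(0 if dirty else 1)
PY
}
```

Inside the hook, the has_content test would then become if has_output "$f"; then ... fi.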

Lucillelucina answered 19/7 at 18:25 Comment(0)
H
0

Since no one else has posted the trivial answer, I'll add it here.

The simplest way to keep Jupyter outputs out of a commit is to clear them before you commit. This has an obvious disadvantage, since you lose the outputs locally, but depending on your workflow it might be safer and more convenient than hard-coding one of the other solutions.

To easily clear the Jupyter outputs before committing, you can click Clear All Outputs, which should be located at the top of your notebook.

Hoffer answered 24/7 at 19:50 Comment(0)
