Using IPython / Jupyter Notebooks Under Version Control

Asked 11/9, 2013 at 7:5 Answered 10/10, 2023 at 16:56

version-control ipython jupyter-notebook

646

What is a good strategy for keeping IPython notebooks under version control?

The notebook format is quite amenable to version control: if one wants to version control the notebook and the outputs then this works quite well. The annoyance comes when one wants only to version control the input, excluding the cell outputs (a.k.a. "build products") which can be large binary blobs, especially for movies and plots. In particular, I am trying to find a good workflow that:

allows me to choose between including or excluding output,
prevents me from accidentally committing output if I do not want it,
allows me to keep output in my local version,
allows me to see when I have changes in the inputs using my version control system (i.e. if I only version control the inputs but my local file has outputs, then I would like to be able to see if the inputs have changed (requiring a commit). Using the version control status command will always register a difference since the local file has outputs.)
allows me to update my working notebook (which contains the output) from an updated clean notebook. (update)

As mentioned, if I chose to include the outputs (which is desirable when using nbviewer for example), then everything is fine. The problem is when I do not want to version control the output. There are some tools and scripts for stripping the output of the notebook, but frequently I encounter the following issues:

I accidentally commit a version with the output, thereby polluting my repository.
I clear output to use version control, but would really rather keep the output in my local copy (sometimes it takes a while to reproduce for example).
Some of the scripts that strip output change the format slightly compared to the Cell/All Output/Clear menu option, thereby creating unwanted noise in the diffs. This is resolved by some of the answers.
When pulling changes to a clean version of the file, I need to find some way of incorporating those changes in my working notebook without having to rerun everything. (update)

I have considered several options that I shall discuss below, but have yet to find a good comprehensive solution. A full solution might require some changes to IPython, or may rely on some simple external scripts. I currently use mercurial, but would like a solution that also works with git: an ideal solution would be version-control agnostic.

This issue has been discussed many times, but there is no definitive or clear solution from the user's perspective. The answer to this question should provide the definitive strategy. It is fine if it requires a recent (even development) version of IPython or an easily installed extension.

Update: I have been playing with my modified notebook version which optionally saves a .clean version with every save using Gregory Crosswhite's suggestions. This satisfies most of my constraints but leaves the following unresolved:

This is not yet a standard solution (it requires a modification of the ipython source). Is there a way of achieving this behaviour with a simple extension? It needs some sort of on-save hook.
A problem I have with the current workflow is pulling changes. These will come in to the .clean file, and then need to be integrated somehow into my working version. (Of course, I can always re-execute the notebook, but this can be a pain, especially if some of the results depend on long calculations, parallel computations, etc.) I do not have a good idea about how to resolve this yet. Perhaps a workflow involving an extension like ipycache might work, but that seems a little too complicated.

Notes

Removing (stripping) Output

When the notebook is running, one can use the Cell/All Output/Clear menu option for removing the output.
There are some scripts for removing output, such as the script nbstripout.py, which removes the output but does not produce the same result as using the notebook interface. This was eventually included in the ipython/nbconvert repo, but this has since been closed, stating that the changes are now included in ipython/ipython, although the corresponding functionality seems not to have been included yet. (update) That being said, Gregory Crosswhite's solution shows that this is pretty easy to do, even without invoking ipython/nbconvert, so this approach is probably workable if it can be properly hooked in. (Attaching it to each version control system, however, does not seem like a good idea: this should somehow hook in to the notebook mechanism.)

Newsgroups

Thoughts on the notebook format for version control.

Issues

977: Notebook feature requests (Open).
1280: Clear-all on save option (Open). (Follows from this discussion.)
3295: autoexported notebooks: only export explicitly marked cells (Closed). Resolved by extension 11 Add writeandexecute magic (Merged).

Pull Requests

1621: clear In[] prompt numbers on "Clear All Output" (Merged). (See also 2519 (Merged).)
1563: clear_output improvements (Merged).
3065: diff-ability of notebooks (Closed).
3291: Add the option to skip output cells when saving. (Closed). This seems extremely relevant, but it was closed with the suggestion to use a "clean/smudge" filter. A relevant question, "what can you use if you want to strip off output before running git diff?", seems not to have been answered.
3312: WIP: Notebook save hooks (Closed).
3747: ipynb -> ipynb transformer (Closed). This is rebased in 4175.
4175: nbconvert: Jinjaless exporter base (Merged).
142: Use STDIN in nbstripout if no input is given (Open).

Eugene answered 11/9, 2013 at 7:5 Comment(19)

Sounds like a great thing to add as an issue on github.com/ipython/ipython or submit a pull request that helps you further this goal. – Livestock 12/9, 2013 at 21:25

@Kyle As you can see, there is already a plethora of PRs and issues relating to this goal. Once these are resolved (namely PR 4175), then a definitive answer should be available but will likely involve some additional scripting outside of IPython (git or hg hooks for example). Thus, I don't think there will be anything gained by adding a new PR or issue. – Eugene 13/9, 2013 at 8:52

Yeah, their development is moving fast and steadily every day. The devs are good folks though (and have probably read this posting). I know I want an easy workflow for working with git. – Livestock 13/9, 2013 at 13:23

@Kyle I did also mention this on the mailing list. It looks like PR 4175 will be resolved in a matter of hours/days so I expect this to move quickly. – Eugene 13/9, 2013 at 20:46

Once you have a working script for removing the output, you can use a Git "clean" filter to apply it automatically before committing (see clean/smudge filters). – Bring 19/9, 2013 at 12:3

All answers are contained in the question! @mforbes, it's fine to answer your own question, but better if you can put the answers in an answer. – Ress 19/10, 2013 at 16:42

@Ress The question contains unsatisfactory workarounds: each one has at least one limitation. Now that PR 4175 has been merged, a complete solution can probably be formulated, but this still needs to be done. As soon as I have some time, I will do it (as an answer) if someone else does not provide a satisfactory solution in the meantime. – Eugene 20/10, 2013 at 20:1

Fair enough. Looking forward to the solution, I'll probably use it. – Ress 21/10, 2013 at 2:58

Another partial solution: a filter for git that displays cleaner diffs, but still commits the actual notebooks whole and unmodified: gist.github.com/takluyver/bc8f3275c7d34abb68bf – Croaky 10/9, 2014 at 16:36

Very good question, but I don't see an accepted answer. Which answers did you try? Is there a recommended solution? – Quarto 4/11, 2014 at 20:56

@Quarto I have not yet found a recommended solution: I was going to go with the --script option, but that has been removed. I am waiting until post-save hooks are implemented (which are planned) at which point I think I will be able to provide an acceptable solution combining several of the techniques. – Eugene 5/11, 2014 at 22:7

It looks like IPython is getting close. Once PR 6896 is accepted, then we should be able to resolve this question through pre and post save hooks. – Eugene 7/12, 2014 at 5:32

@Eugene Looks like that PR was just merged a few days after your comment. Could you or someone more knowledgeable than me post an answer here that shows how to use the new feature? – Choong 17/12, 2014 at 14:10

@kobejohn I will eventually, but am a bit swamped right now. Maybe somebody else will beat me to it! – Eugene 19/12, 2014 at 8:26

@kobejohn: I just added an answer – Macintosh 11/3, 2015 at 15:28

Isn't the best solution a PR to github to just change the diff tool to special case notebook diffs and only show the diff of the input cells? Then you still get the output saved and rendered on GitHub, which is a big useful feature of notebooks. – Yuhas 3/2, 2019 at 20:27

Related to stackoverflow.com/questions/28908319. – Kilocycle 16/11, 2021 at 4:33

You can use our open-source framework - Ploomber (github.com/ploomber/ploomber) exactly for this task. It’s making your work with notebooks faster, helps you export it to raw python files and back to notebooks. That way you can develop production ready code. It's open sourced so most of the ideas in it came from the community and people trying to solve similar issues in the MLops domain. – Guaranty 26/2, 2022 at 4:5

Do I understand it right that you would wish to have notebooks where outputs do not necessarily correspond to input code? That sounds like a nightmare, if not dangerous. Can I ask why somebody would want that? – Leanto 12/5, 2022 at 12:4

139

Here is my solution with git. It allows you to just add and commit (and diff) as usual: those operations will not alter your working tree, and at the same time (re)running a notebook will not alter your git history.

Although this can probably be adapted to other VCSs, I know it doesn't satisfy your requirements (at least the VCS agnosticity). Still, it is perfect for me, and although it's nothing particularly brilliant, and many people probably already use it, I didn't find clear instructions about how to implement it by googling around. So it may be useful to other people.

Save a file with this content somewhere (for the following, let us assume ~/bin/ipynb_output_filter.py)
Make it executable (chmod +x ~/bin/ipynb_output_filter.py)
Create the file ~/.gitattributes, with the following content

*.ipynb filter=dropoutput_ipynb
Run the following commands:

git config --global core.attributesfile ~/.gitattributes
git config --global filter.dropoutput_ipynb.clean ~/bin/ipynb_output_filter.py
git config --global filter.dropoutput_ipynb.smudge cat

Done!
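The script referred to in the first step is not reproduced above. As a rough idea of what such a clean filter can look like, here is a minimal sketch using only the standard library. It is not the author's script: the names strip_outputs and main are chosen for illustration, and the handling of the older nbformat 3 layout (worksheets, prompt_number) and of metadata.signature is an assumption based on the fields discussed in this thread.

```python
#!/usr/bin/env python
"""Git clean filter sketch: read a notebook on stdin, write it without outputs."""
import json
import sys


def strip_outputs(nb):
    """Remove outputs and execution counters from a notebook dict, in place."""
    # nbformat 4 keeps cells at the top level; nbformat 3 nests them in worksheets.
    cell_lists = [nb.get("cells", [])]
    cell_lists += [ws.get("cells", []) for ws in nb.get("worksheets", [])]
    for cells in cell_lists:
        for cell in cells:
            if cell.get("cell_type") != "code":
                continue
            cell["outputs"] = []
            if "execution_count" in cell:    # nbformat 4 counter
                cell["execution_count"] = None
            cell.pop("prompt_number", None)  # nbformat 3 counter
    # The signature also covers the outputs, so it means nothing once they are gone.
    nb.get("metadata", {}).pop("signature", None)
    return nb


def main():
    text = sys.stdin.read()
    if not text.strip():
        return  # empty file: nothing to filter
    nb = strip_outputs(json.loads(text))
    json.dump(nb, sys.stdout, indent=1, sort_keys=True, ensure_ascii=False)
    sys.stdout.write("\n")


# Only act as a filter when a notebook is piped in, as git does for clean filters.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Because git pipes the staged content through the filter and reads the result from standard output, the file in the working tree is never touched, which is why local outputs survive an add or commit.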

Limitations:

it works only with git
in git, if you are in branch somebranch and you do git checkout otherbranch; git checkout somebranch, you usually expect the working tree to be unchanged. Here instead you will have lost the output and cells numbering of notebooks whose source differs between the two branches.
more in general, the output is not versioned at all, as with Gregory's solution. In order to not just throw it away every time you do anything involving a checkout, the approach could be changed by storing it in separate files (but notice that at the time the above code is run, the commit id is not known!), and possibly versioning them (but notice this would require something more than a git commit notebook_file.ipynb, although it would at least keep git diff notebook_file.ipynb free from base64 garbage).
that said, incidentally if you do pull code (i.e. committed by someone else not using this approach) which contains some output, the output is checked out normally. Only the locally produced output is lost.

My solution reflects the fact that I personally don't like to keep generated stuff versioned - notice that doing merges involving the output is almost guaranteed to invalidate the output or your productivity or both.

EDIT:

if you do adopt the solution as I suggested it - that is, globally - you will have trouble in case for some git repo you want to version output. So if you want to disable the output filtering for a specific git repository, simply create inside it a file .git/info/attributes, with

**.ipynb filter=

as content. Clearly, in the same way it is possible to do the opposite: enable the filtering only for a specific repository.

the code is now maintained in its own git repo
if the instructions above result in ImportErrors, try adding "ipython" before the path of the script:
```
  git config --global filter.dropoutput_ipynb.clean ipython ~/bin/ipynb_output_filter.py
```

EDIT: May 2016 (updated February 2017): there are several alternatives to my script - for completeness, here is a list of those I know: nbstripout (other variants), nbstrip, jq.

Krysta answered 30/12, 2013 at 17:35 Comment(19)

How do you deal with the issue of incorporating changes that you pull? Do you just live with having to regenerate all of the output? (I think this is a manifestation of your second limitation.) – Eugene 1/1, 2014 at 20:14

I hope I clarified that now! – Krysta 3/1, 2014 at 9:1

There is a small new issue with my approach: if you make an edit, save the notebook, revert the edit and save again, the "signature" may have changed. But it is pointless to show it in the diffs. This, as well as the "output is lost" issue, can be solved by having hidden temporary files. It is a more complex approach, but I plan to implement it sooner or later. – Krysta 27/11, 2014 at 12:21

This solution works really well for my case. I just added if 'signature' in json_in.metadata: json_in.metadata['signature'] = "" to this script to strip the signature. – Bigamous 15/1, 2015 at 4:8

@Bigamous yep, that's the right thing to do. Keeping the signature doesn't make sense, since it also validates the output. I'm editing my answer accordingly, thanks. – Krysta 15/1, 2015 at 17:38

@PietroBattiston: Is it possible that in newer versions of IPython prompt_number was replaced with execution_count? – Macintosh 12/3, 2015 at 8:14

@Macintosh : absolutely! Edited, thanks. By the way, IPython.nbformat.current was also obsoleted and prints a warning, I will replace that too when I find out how. – Krysta 12/3, 2015 at 8:28

@PietroBattiston Check this out: gist.github.com/drorata/f3beb1ae736890b049f6 Feel free to comment there and discuss the matter. – Macintosh 12/3, 2015 at 9:0

I'm getting a UserWarning: IPython.nbformat.current is deprecated since upgrading to Jupyter (ipython notebook 3.1.0). Is there an update for this? – Critique 23/4, 2015 at 16:13

@zhermes: this extended version should be OK – Krysta 24/4, 2015 at 8:56

@DaveP: (this was the same problem you also highlighted... my version is slightly more backward and forward compatible than yours) – Krysta 24/4, 2015 at 8:57

@zhermes: yes, I should have created a git since long. Done, finally. I don't have time now but next week will debug your error. – Krysta 28/4, 2015 at 6:49

Is there a way to use this git filters method with an external diff tool? The filter is applied if I use the normal command line tool but not if I'm using meld as a diff tool. https://mcmap.net/q/56600/-viewing-git-filters-output-when-using-meld-as-a-diff-tool/578770 – Smoothshaven 20/5, 2015 at 9:54

@revers No idea unfortunately... a workaround is to make a temporary commit (and compare HEAD rather than just the working dir) which you can then discard/amend. – Krysta 27/5, 2015 at 9:46

To avoid getting ImportError I had to alter the above to run using ipython: git config --global filter.dropoutput_ipynb.clean ipython ~/bin/ipynb_output_filter.py – Lippmann 12/9, 2015 at 17:55

@chris838: thanks. I'm honestly a bit confused by the recent changes in IPython, so I'm currently just reporting your suggestion at the end of the post, then when the migration Jupyter will be settled I will investigate a bit. – Krysta 13/9, 2015 at 19:7

Awesome solution Pietro, thanks :) I changed 2 things when using your script in my case: 1) I preferred declaring the filter in .gitattributes in the root of the repo as opposed to ~/.gitattributes, so that other people have the same filters as I do 2) I defined the pattern as workdir/**/*.ipynb filter=dropoutput_ipynb, and I put most of my notebooks in workdir/ => if I still want to push a notebook with the output and enjoy the bookmarkable rendering in github, I just put it outside that folder. – Perverse 24/12, 2015 at 16:37

this solution is great, thanks :) is it possible to apply this filter backwards across the whole git history? and if so, how can I do that? – Bluepoint 16/2, 2017 at 15:34

Aha, good question. I guess something like this should work, replacing the rm... command with ipynb_output_filter.py (or better: a script which modifies the notebooks in place)... but I have no idea of how well it plays with multiple branches – Krysta 16/2, 2017 at 21:52

We have a collaborative project where the product is Jupyter Notebooks, and we've used an approach for the last six months that is working great: we activate saving the .py files automatically and track both .ipynb files and the .py files.

That way if someone wants to view/download the latest notebook they can do that via github or nbviewer, and if someone wants to see how the notebook code has changed, they can just look at the changes to the .py files.

For Jupyter notebook servers, this can be accomplished by adding the lines

import os
from subprocess import check_call

def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)

c.FileContentsManager.post_save_hook = post_save

to the jupyter_notebook_config.py file and restarting the notebook server.

If you aren't sure in which directory to find your jupyter_notebook_config.py file, you can type jupyter --config-dir, and if you don't find the file there, you can create it by typing jupyter notebook --generate-config.
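The same configuration file also accepts a pre-save hook, which is one way to get the "clear all output on save" behaviour mentioned in the question. The sketch below follows the pattern shown in the Jupyter Notebook documentation for file save hooks, and it only handles nbformat 4 notebooks. Note the trade-off: unlike the post-save hook above, it removes the outputs from the saved .ipynb itself, so it does not keep outputs in your local copy.

```python
def scrub_output_pre_save(model, **kwargs):
    """Pre-save hook sketch: drop outputs from a notebook before it is written."""
    if model["type"] != "notebook":
        return  # leave plain files and directories alone
    if model["content"]["nbformat"] != 4:
        return  # this sketch only understands the nbformat 4 layout
    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None


# In jupyter_notebook_config.py the hook would be registered with:
# c.FileContentsManager.pre_save_hook = scrub_output_pre_save
```

The registration line is left as a comment because the name c only exists while Jupyter is loading the configuration file.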

For IPython 3 notebook servers, this can be accomplished by adding the lines

import os
from subprocess import check_call

def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['ipython', 'nbconvert', '--to', 'script', fname], cwd=d)

c.FileContentsManager.post_save_hook = post_save

to the ipython_notebook_config.py file and restarting the notebook server. These lines are from a github issues answer @minrk provided and @dror includes them in his SO answer as well.

For IPython 2 notebook servers, this can be accomplished by starting the server using:

ipython notebook --script

or by adding the line

c.FileNotebookManager.save_script = True

to the ipython_notebook_config.py file and restarting the notebook server.

If you aren't sure in which directory to find your ipython_notebook_config.py file, you can type ipython locate profile default, and if you don't find the file there, you can create it by typing ipython profile create.

Here's our project on github that is using this approach: and here's a github example of exploring recent changes to a notebook.

We've been very happy with this.

Panegyrize answered 10/9, 2014 at 12:13 Comment(11)

Thanks for the added evidence that using --script has worked in practice. The problem with this is that the actual notebooks might be huge if images are kept. An ideal solution along this way might use something like git-annex to keep track of only the latest full notebook. – Eugene 11/9, 2014 at 4:14

In Ipython 3.x the --script is deprecated. ipython.org/ipython-doc/3/whatsnew/version3.html – Macintosh 10/3, 2015 at 14:2

Thanks @dror, I've updated my answer to provide minrk's ipython 3.x solution as you also provided here. – Panegyrize 13/3, 2015 at 10:53

just in case anyone wondered, the '--to script' argument appears to be identical to the '--to python' argument that shows up in the link @Macintosh sent. i diff'd some output on the two conversions of a somewhat complex notebook and saw no differences – Generate 12/6, 2015 at 0:50

Here are full instructions for a post-save hook to save both .py and .html from IPython notebooks: protips.maxmasnick.com/… – Needy 20/7, 2015 at 15:45

Update: This solution is broken in iPython version 4, because of "The Big Split" of Jupyter from iPython. To adjust this solution to version 4, use the command jupyter notebook --generate-config to create a config file. The command jupyter --config-dir finds out which directory contains the config files. And the code snippet given by @Rich should be added to the file named jupyter_notebook_config.py. The rest works as before. – Drank 20/10, 2015 at 20:55

But I don't see in this answer what is done to the notebook files? Same as in favorite answer? – Fakieh 29/1, 2016 at 14:44

Would be great to have a separate menu item for Jupyter menu item: Save & commit. So it saves without output content and calls a script for (git) commit. Created github.com/jupyter/notebook/issues/1410 – Bogus 30/4, 2016 at 21:11

In addition to the point by @mobiusdumpling, replace the check_call(['ipython' with check_call(['jupyter', otherwise you will get a warning that ipython nbconvert is deprecated and you should use jupyter nbconvert instead. (Jupyter v4.1.0, iPython v4.1.2) – Monjo 14/7, 2016 at 15:53

If you want to save .py files to a different directory other than the current one, add '--output-dir', 'your_dir' to check_call. e.g., check_call(['jupyter', 'nbconvert', '--to', 'script', fname, '--output-dir', './src'], cwd=d) – Redhead 30/7, 2019 at 22:54

This also works for Jupyter Lab, you just need to edit jupyter_lab_config.py instead of jupyter_notebook_config.py – Felipa 27/10, 2021 at 15:55

I have created nbstripout, based on MinRK's gist, which supports both Git and Mercurial (thanks to mforbes). It is intended to be used either standalone on the command line or as a filter, which is easily (un)installed in the current repository via nbstripout install / nbstripout uninstall.

Get it from PyPI or simply

pip install nbstripout

Interact answered 27/2, 2016 at 13:32 Comment(4)

I am considering a workflow where I keep both .ipynb and corresponding .py automatically created using post-save hooks described above. I would like to use .py for diffs - would nbstripout be able to clear .py file from the cell execution counters (# In[1] changed to In[*]), so that they don't clutter the diffs or should I create a simple script for doing that? – Bernettabernette 22/12, 2017 at 12:9

@KrzysztofSłowiński No, nbstripout doesn't support this use case easily since it relies on the JSON format of the Notebook. You're likely better off writing a script specialized to your use case. – Interact 5/8, 2018 at 11:36
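A specialized script of the kind suggested here can be quite small. The sketch below assumes the exported .py files mark each code cell with a comment line of the form "# In[12]:", as nbconvert's script export does, and rewrites every marker to the form used for unexecuted cells. The name normalize_prompts is made up for this illustration.

```python
import re

# A cell marker is a whole line such as "# In[12]:" or "# In[ ]:".
PROMPT = re.compile(r"^# In\[[^\]]*\]:[ \t]*$", flags=re.MULTILINE)


def normalize_prompts(script_text):
    """Replace every cell marker with the form used for unexecuted cells."""
    return PROMPT.sub("# In[ ]:", script_text)
```

Only whole-line markers are rewritten, so a trailing comment that happens to contain In[5] is left alone, and running the function twice gives the same result as running it once.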

Does nbstripout have an option to work recursively on a given folder (I'm talking about the executable itself)? – Kilocycle 13/11, 2021 at 16:54

Not directly, and it doesn't need to. You can simply use find or some other standard way of recursively finding files you want to operate on. – Interact 13/11, 2021 at 17:33
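For readers who prefer to do the recursive search from Python rather than with find, here is a minimal sketch. It assumes the nbstripout command is on the PATH and that it rewrites the notebook files passed to it in place; find_notebooks and strip_all are hypothetical helper names, and skipping .ipynb_checkpoints directories is a choice made for this example.

```python
import subprocess
from pathlib import Path


def find_notebooks(root):
    """Return all .ipynb files under root, skipping Jupyter checkpoint copies."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def strip_all(root):
    """Run the nbstripout command on every notebook found under root."""
    notebooks = find_notebooks(root)
    if notebooks:
        # One call for all files; check_call raises if nbstripout reports an error.
        subprocess.check_call(["nbstripout", *map(str, notebooks)])
    return notebooks
```

Calling strip_all(".") from the top of a project would then process every notebook below the current directory.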

Since there exist so many strategies and tools to handle version control for notebooks, I tried to create a flow diagram to pick a suitable strategy (created April 2019)

Parted answered 23/4, 2019 at 9:25 Comment(1)

As of March 2023, nbdime is probably the first option you want to check if you are using Github, not ReviewNB. See the GitHub blog post, "Feature Preview: Rich Jupyter Notebook Diffs". – Attraction 4/1 at 21:16

The very popular 2016 answers above are inconsistent hacks compared with the better way to do this in 2019.

Several options exist, the best that answers the question is Jupytext.

Jupytext

Catch the Towards Data Science article on Jupytext

The way it works with version control is you put both the .py and .ipynb files in version control. Look at the .py if you want the input diff, look at the .ipynb if you want the latest rendered output.

Notable mentions: VS studio, nbconvert, nbdime, hydrogen

I think with a little more work, VS studio and/or hydrogen (or similar) will become the dominant players in the solution to this workflow.

Yuhas answered 3/2, 2019 at 21:13 Comment(1)

This should be the top answer, jupytext is the way to go. – Lasonyalasorella 7/8, 2022 at 0:16

After a few years of removing outputs in notebooks, I have tried to come up with a better solution. I now use Jupytext, an extension for both Jupyter Notebook and Jupyter Lab that I have designed.

Jupytext can convert Jupyter notebooks to various text formats (Scripts, Markdown and R Markdown). And conversely. It also offers the option to pair a notebook to one of these formats, and to automatically synchronize the two representations of the notebook (an .ipynb and a .md/.py/.R file).

Let me explain how Jupytext answers the above questions:

allows me to choose between including or excluding output,

The .md/.py/.R file only contains the input cells. You should always track this file. Version the .ipynb file only if you want to track the outputs.

prevents me from accidentally committing output if I do not want it,

Add *.ipynb to .gitignore

allows me to keep output in my local version,

Outputs are preserved in the (local) .ipynb file

allows me to see when I have changes in the inputs using my version control system (i.e. if I only version control the inputs but my local file has outputs, then I would like to be able to see if the inputs have changed (requiring a commit). Using the version control status command will always register a difference since the local file has outputs.)

The diff on the .py/.R or .md file is what you are looking for

allows me to update my working notebook (which contains the output) from an updated clean notebook. (update)

Pull the latest revision of the .py/.R or .md file and refresh your notebook in Jupyter (Ctrl+R). You will get the latest input cells from the text file, with matching outputs from the .ipynb file. The kernel is not affected, which means that your local variables are preserved - you can continue your work where you left off.
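To make the idea of "latest inputs with matching outputs" concrete, here is a small standard-library sketch that carries outputs over from a working notebook to an updated clean one. It is only an illustration of the concept and not Jupytext's actual implementation: it assumes nbformat 4 notebook dicts, matches cells purely by identical source text, and merge_outputs is a name invented for this example.

```python
import copy


def _key(cell):
    """Identify a cell by its type and source text (a list of lines or one string)."""
    source = cell.get("source", "")
    text = "".join(source) if isinstance(source, list) else source
    return (cell.get("cell_type"), text)


def merge_outputs(clean_nb, working_nb):
    """Return a copy of clean_nb whose unchanged code cells reuse working_nb's outputs."""
    saved = {}
    for cell in working_nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # If the same source appears twice, the first occurrence wins in this sketch.
            saved.setdefault(_key(cell), cell)
    merged = copy.deepcopy(clean_nb)
    for cell in merged.get("cells", []):
        old = saved.get(_key(cell))
        if cell.get("cell_type") == "code" and old is not None:
            cell["outputs"] = copy.deepcopy(old.get("outputs", []))
            cell["execution_count"] = old.get("execution_count")
    return merged
```

Cells whose source changed upstream keep their empty outputs, which signals that they still need to be re-run.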

What I love with Jupytext is that the notebook (under the form of a .py/.R or .md file) can be edited in your favorite IDE. With this approach, refactoring a notebook becomes easy. Once you're done you just need to refresh the notebook in Jupyter.

If you want to give it a try: install Jupytext with pip install jupytext and restart your Jupyter Notebook or Lab editor. Open the notebook that you want to version control, and pair it to a Markdown file (or a Script) using the Jupytext Menu in Jupyter notebook (or the Jupytext commands in Jupyter Lab). Save your notebook, and you'll get the two files: the original .ipynb, plus the promised text representation of the notebook, that is a perfect fit for version control!

For those who may be interested: Jupytext is also available on the command line.

Hurff answered 22/6, 2019 at 17:52 Comment(0)

Update: Now you can edit Jupyter Notebook files directly in Visual Studio Code. You can choose to edit the notebook or the converted python file.

I finally found a productive and simple way to make Jupyter and Git play nicely together. I'm still in the first steps, but I already think it is a lot better than all other convoluted solutions.

Visual Studio Code is a cool and open source code editor from Microsoft. It has an excellent Python extension that now allows you to import a Jupyter Notebook as python code. Now you also can directly edit Jupyter Notebooks.

After you import your notebook to a python file, all the code and markdown will be together in an ordinary python file, with special markers in comments. You can see in the image below:

Your python file just has the contents of the notebook input cells. The output will be generated in a split window. You have pure code in the notebook, it doesn't change while you just execute it. No mingled output with your code. No strange JSON incomprehensible format to analyze your diffs.

Just pure python code where you can easily identify every single diff.

I don't even need to version my .ipynb files anymore. I can put a *.ipynb line in .gitignore.

Need to generate a notebook to publish or share with someone? No problem, just click the export button in the interactive python window

If you are editing the notebook directly, there's now an icon, Convert and save to a python script.

Here is a screenshot of a notebook inside Visual Studio Code:

I've been using it just for a day, but finally I can happily use Jupyter with Git.

P.S.: VSCode code completion is a lot better than Jupyter.

Erida answered 21/11, 2018 at 0:48 Comment(2)

Do you know how this is exporting to pdf, the actual command it is using? When using vscode I can convert to a pdf and retain matplotlib plots. However, when using jupyterlab the resulting pdf doesn't keep any of the output. Ideally I'd like to use jupytext to produce pdfs with no code but with output – Gorrono 9/9, 2021 at 21:39

@bryce, I don't know. But take a look of pure Jupyter instead of JupyterLab. I think its export function works better. – Erida 10/9, 2021 at 13:50

(2017-02)

strategies

on_commit():
- strip the output > name.ipynb (nbstripout, )
- strip the output > name.clean.ipynb (nbstripout,)
- always nbconvert to python: name.ipynb.py (nbconvert)
- always convert to markdown: name.ipynb.md (nbconvert, ipymd)
vcs.configure():
- git difftool, mergetool: nbdiff and nbmerge from nbdime

tools

nbstripout: strip the outputs from a notebook
- src: https://gist.github.com/minrk/6176788
- src: https://github.com/kynan/nbstripout
  - pip install nbstripout; nbstripout install
ipynb_output_filter: strip the outputs from a notebook
- src: https://github.com/toobaz/ipynb_output_filter/blob/master/ipynb_output_filter.py
ipymd: convert between {Jupyter, Markdown, O'Reilly Atlas Markdown, OpenDocument, .py}
- src: https://github.com/rossant/ipymd
nbdime: "Tools for diffing and merging of Jupyter notebooks." (2015)
- src: https://github.com/jupyter/nbdime
- docs: http://nbdime.readthedocs.io/
  - nbdiff: compare notebooks in a terminal-friendly way
    - nbdime nbdiff works as a git diff tool: https://nbdime.readthedocs.io/en/latest/#git-integration-quickstart
  - nbmerge: three-way merge of notebooks with automatic conflict resolution
    - nbdime nbmerge works as a git merge tool
  - nbdiff-web: shows you a rich rendered diff of notebooks
  - nbmerge-web: gives you a web-based three-way merge tool for notebooks
  - nbshow: present a single notebook in a terminal-friendly way

Longsighted answered 9/2, 2017 at 4:40 Comment(0)

Here is a new solution from Cyrille Rossant for IPython 3.0, ipymd, which persists to markdown files rather than json-based ipynb files:

https://github.com/rossant/ipymd

Ecclesiasticism answered 21/2, 2015 at 22:9 Comment(2)

Not supporting Jupyter yet, it seems. – Fakieh 29/1, 2016 at 14:47

I'm using ipymd successfully with the latest Jupyter -- do you get any specific problem or error message? – Beldam 1/2, 2016 at 18:12

Just came across "jupytext" which looks like a perfect solution. It generates a .py file from the notebook and then keeps both in sync. You can version control, diff and merge inputs via the .py file without losing the outputs. When you open the notebook it uses the .py for input cells and the .ipynb for output. And if you want to include the output in git then you can just add the ipynb.

https://github.com/mwouts/jupytext

Autocade answered 25/11, 2018 at 17:30 Comment(0)

As pointed out by, the --script is deprecated in 3.x. This approach can be used by applying a post-save-hook. In particular, add the following to ipython_notebook_config.py:

import os
from subprocess import check_call

def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['ipython', 'nbconvert', '--to', 'script', fname], cwd=d)

c.FileContentsManager.post_save_hook = post_save

The code is taken from #8009.
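
If spawning nbconvert on every save is undesirable, a rough stand-in can be sketched with only the standard library. It keeps the same hook signature as above but assumes nbformat v4 JSON and simply concatenates the code cells, so unlike nbconvert it does nothing about magics or Markdown cells. The helper name export_code_cells is made up for this example.

```python
import json
import os


def export_code_cells(os_path):
    """Write the code cells of an nbformat v4 notebook next to it as a .py file."""
    with open(os_path, encoding="utf-8") as f:
        nb = json.load(f)

    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # source is either a single string or a list of lines
        chunks.append("".join(src) if isinstance(src, list) else src)

    script_path = os.path.splitext(os_path)[0] + ".py"
    with open(script_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks) + "\n")
    return script_path


def post_save(model, os_path, contents_manager):
    """post-save hook that exports code cells without calling nbconvert"""
    if model["type"] != "notebook":
        return  # only do this for notebooks
    export_code_cells(os_path)

# In the config file the hook would be registered exactly as above:
# c.FileContentsManager.post_save_hook = post_save
```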

Macintosh answered 11/3, 2015 at 15:27 Comment(2)

Thanks for demonstrating the use of a post-save hook. Unfortunately, as mentioned elsewhere, getting back from the .py file to a notebook is problematic, so this is not a complete solution. (I kind of wish it was, as it is very nice to diff .py files instead of notebooks. Perhaps the new notebook diff feature will be useful.) – Eugene 12/3, 2015 at 0:15

Thanks! I am now using this trick to reproduce the --script behavior, regardless of version control. I had some problems at first, so just in case I can save someone some time: 1) If ipython_notebook_config.py is missing from the profile folder, run ipython profile create to generate it. 2) If it seems as though the post-save hook is ignored, run ipython with --debug to diagnose the problem. 3) If the script fails with the error ImportError: No module named mistune, simply install mistune: pip install mistune. – Captor 17/3, 2015 at 12:17

Unfortunately, I do not know much about Mercurial, but I can give you a possible solution that works with Git, in the hopes that you might be able to translate my Git commands into their Mercurial equivalents.

For background, in Git the add command stores the changes that have been made to a file into a staging area. Once you have done this, any subsequent changes to the file are ignored by Git unless you tell it to stage them as well. Hence the following script: for each of the given files, it strips out all of the outputs and prompt_number sections, stages the stripped file, and then restores the original.

NOTE: If running this gets you an error message like ImportError: No module named IPython.nbformat, then use ipython to run the script instead of python.

from IPython.nbformat import current
import io
from os import remove, rename
from shutil import copyfile
from subprocess import Popen
from sys import argv

for filename in argv[1:]:
    # Backup the current file
    backup_filename = filename + ".backup"
    copyfile(filename,backup_filename)

    try:
        # Read in the notebook
        with io.open(filename,'r',encoding='utf-8') as f:
            notebook = current.reads(f.read(),format="ipynb")

        # Strip out all of the output and prompt_number sections
        for worksheet in notebook["worksheets"]:
            for cell in worksheet["cells"]:
                # Only cells that already have outputs (code cells) are cleared
                if "outputs" in cell:
                    cell["outputs"] = []
                if "prompt_number" in cell:
                    del cell["prompt_number"]

        # Write the stripped file
        with io.open(filename, 'w', encoding='utf-8') as f:
            current.write(notebook,f,format='ipynb')

        # Run git add to stage the non-output changes
        print("git add",filename)
        Popen(["git","add",filename]).wait()

    finally:
        # Restore the original file;  remove is needed in case
        # we are running in windows.
        remove(filename)
        rename(backup_filename,filename)

Once the script has been run on the files whose changes you wanted to commit, just run git commit.
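
The script above relies on IPython.nbformat.current and the v3 "worksheets" layout, and it temporarily rewrites the file on disk. A rough sketch of the same idea for nbformat v4 notebooks is below; it stages the stripped content through git's plumbing commands (git hash-object and git update-index) so the working copy is never touched. The function names are my own, and the script assumes it is run from the repository root so that the given filename is also the path inside the repo.

```python
import copy
import json
import subprocess
import sys


def strip_outputs(nb):
    """Return a copy of an nbformat v4 notebook dict with code-cell outputs removed."""
    clean = copy.deepcopy(nb)
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


def stage_stripped(filename):
    """Stage the stripped notebook without modifying the working-tree file."""
    with open(filename, encoding="utf-8") as f:
        nb = json.load(f)
    data = json.dumps(strip_outputs(nb), indent=1, ensure_ascii=False) + "\n"

    # Store the stripped content as a blob in the object database ...
    result = subprocess.run(
        ["git", "hash-object", "-w", "--stdin"],
        input=data.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    sha = result.stdout.decode().strip()

    # ... and point the index entry for this path at that blob.
    subprocess.run(
        ["git", "update-index", "--add", "--cacheinfo",
         "100644,%s,%s" % (sha, filename)],
        check=True,
    )


if __name__ == "__main__":
    for name in sys.argv[1:]:
        stage_stripped(name)
```

As with the original script, git status will afterwards report the notebook as both staged and modified, because the working copy still contains the outputs.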

Brothers answered 4/11, 2013 at 4:27 Comment(1)

Thanks for the suggestion. Mercurial does not really have a staging area like git (though one could use mercurial queues for this purpose). In the meantime, I tried adding this code to a save hook that saves a clean version with a .clean extension. Unfortunately, I could not see how to do this without directly modifying IPython (although this change was quite trivial). I will play with this for a while and see if it suits all of my needs. – Eugene 7/11, 2013 at 17:40

I use a very pragmatic approach, which works well for several notebooks, at several sites. It even enables me to 'transfer' notebooks around. It works on both Windows and Unix/MacOS.
Although it is simple, it solves the problems above...

Concept

Basically, do not track the .ipynb files, only the corresponding .py files.
By starting the notebook server with the --script option, that file is automatically created/saved when the notebook is saved.

Those .py files do contain all input; non-code is saved into comments, as are the cell borders. Those files can be read/imported (and dragged) into the notebook server to (re)create a notebook. Only the output is gone, until it is re-run.

Personally I use Mercurial to version-track the .py files, and use the normal (command-line) commands to add, check in (etc.) for that. Most other (D)VCSs will allow this too.

It's simple to track the history now; the .py files are small, textual and simple to diff. Once in a while, we need a clone (just branch; start a 2nd notebook server there), or an older version (check it out and import it into a notebook server), etc.

Tips & tricks

Add *.ipynb to '.hgignore', so Mercurial knows it can ignore those files
Create a (bash) script to start the server (with the --script option) and put it under version control
Saving a notebook does save the .py file, but does not check it in.
- This is a drawback: one can forget to do that
- It is also a feature: it is possible to save a notebook (and continue later) without cluttering the repository history.

Wishes

It would be nice to have buttons for check-in/add/etc. in the notebook Dashboard
A checkout to (for example) file@date+rev.py would be helpful. It would be too much work to add that now; maybe I will do so some day. Until then, I just do that by hand.

Vulgarism answered 22/7, 2014 at 13:35 Comment(8)

How do you go from the .py file back to a notebook? I like this approach, but because .ipynb -> .py -> .ipynb is potentially lossy, I did not consider this seriously. – Eugene 22/7, 2014 at 21:31

That is easy: load it, for example by dropping it on the Notebook dashboard. Except for "output data", nothing is lost – Vulgarism 25/7, 2014 at 13:34

If that is true, then I think this would be close to ideal, but I seem to recall that IPython made no commitment to completely preserving data in the transition from .py to .ipynb formats. There is an issue about this – so perhaps this will form the basis for a complete solution. – Eugene 27/7, 2014 at 0:10

I am having some difficulty converting from .py files to .ipynb files. nbconvert does not yet seem to support this, and I do not have a notebook dashboard since I run ipython notebook manually. Do you have any general suggestions about how to implement this backwards conversion? – Eugene 9/8, 2014 at 2:50

Surely the .py-to-notebook transformation is not intended to round-trip. So this can't really be a general solution though it's nice it works for you. – Torchbearer 14/8, 2014 at 17:40

This method now fails as --script was removed as an option. It should be reinstated when IPython 3.0 is released though. Still can't figure out a good way of getting the .py files into a notebook without writing a custom converter. – Eugene 21/9, 2014 at 1:37

I roundtrip .ipynb through Markdown (of all things!) for version control using Notedown. This gives me the option of stripping output before commits, but more importantly for my current use case (an online course), Markdown is much easier to refactor. Normally it's very hard to do reorganisation in a set of Notebooks: Moving multi-cell chunks around within a Notebook or between Notebooks, reordering, promoting or demoting sections, splitting or merging Notebooks, etc. All this is easy in Markdown (given a good text editor). – Shieh 16/12, 2014 at 19:55

@Eugene you can use my fancy script: github.com/petered/plato/blob/…, though yes, of course there are some things in the ipynb format that won't be conserved in the ipynb to py to ipynb journey – Dugout 18/2, 2015 at 15:0

To the tools others have suggested I will add https://nbdev.fast.ai/, which is a state-of-the-art "literate programming environment, as envisioned by Donald Knuth back in 1983!".

It also has some git hooks that help a little (https://nbdev.fast.ai/#Avoiding-and-handling-git-conflicts), plus other commands such as:

nbdev_read_nbs
nbdev_clean_nbs
nbdev_diff_nbs
nbdev_test_nbs

This way you can also create your documentation on the go while writing a library.

Apart from the first link, there is also a video: nbdev tutorial.

Anamorphosis answered 19/6, 2020 at 23:56 Comment(4)

I have not had a chance to look deeply, but this does not seem to support what Knuth calls "tangling", which is one of the major points of literate programming. This allows you to write the code in the order that makes sense for explanation, while retaining the appropriate order needed on disk. For example 14_callback.schedule.ipynb seems to start with the import statements - the least important part of the code. Tangling allows you to defer this until after the main concepts have been described. – Eugene 21/6, 2020 at 18:34

Well, I am not so sure whether it does indeed handle tangling or not, but the "real" Python file generated from that notebook is fastai2/callback/schedule.py. I added a YouTube video that I haven't watched. – Anamorphosis 22/6, 2020 at 2:0

As of October 2022, nbdev2 has improved the workflow using git with notebooks, it basically solves problems with git conflicts: nbdev.fast.ai/tutorials/git_friendly_jupyter.html – Peracid 14/10, 2022 at 14:38

Here is a nice explanation: fast.ai/posts/2022-08-25-jupyter-git.html – Erida 15/12, 2022 at 0:7

I've built a Python package that solves this problem:

https://github.com/brookisme/gitnb

It provides a CLI with a git-inspired syntax to track/update/diff notebooks inside your git repo.

Here's an example:

# add a notebook to be tracked
gitnb add SomeNotebook.ipynb

# check the changes before committing
gitnb diff SomeNotebook.ipynb

# commit your changes (to your git repo)
gitnb commit -am "I fixed a bug"

Note that the last step, where I'm using "gitnb commit", is committing to your git repo. It's essentially a wrapper for

# get the latest changes from your python notebooks
gitnb update

# commit your changes ** this time with the native git commit **
git commit -am "I fixed a bug"

There are several more methods, and it can be configured to require more or less user input at each stage, but that's the general idea.

Witching answered 2/6, 2017 at 15:21 Comment(0)

This is April 2020, and there are lots of strategies and tools for Jupyter notebook version control. Here's a quick overview of the tools you can use:

nbdime - Nice for local diff'ing and merging of notebooks
nbstripout - A git filter to automatically remove notebook outputs before each commit
jupytext - Keeps a .py companion file sync'ed to each notebook. You only commit .py files
nbconvert - Convert notebooks to a python script or HTML (or both) and commit these alternate file types
ReviewNB - Shows notebook diff (along with output) for any commit or pull request on GitHub. One can also write comments on notebook cells to discuss changes.

Disclaimer: I built ReviewNB.

Langley answered 11/4, 2020 at 13:56 Comment(1)

With jupytext you can also use a system committing both notebook and .py code right? – Jhelum 27/7, 2023 at 16:19

To follow up on the excellent script by Pietro Battiston, if you get a Unicode parsing error like this:

Traceback (most recent call last):
  File "/Users/kwisatz/bin/ipynb_output_filter.py", line 33, in <module>
    write(json_in, sys.stdout, NO_CONVERT)
  File "/Users/kwisatz/anaconda/lib/python2.7/site-packages/IPython/nbformat/__init__.py", line 161, in write
    fp.write(s)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 11549: ordinal not in range(128)

You can add at the beginning of the script:

reload(sys)
sys.setdefaultencoding('utf8')
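
Note that these two lines only work on Python 2: on Python 3, reload is no longer a builtin and sys.setdefaultencoding does not exist. If you write your own filter on Python 3, one way to avoid the same error is to encode explicitly and write bytes. The sketch below assumes the filter serializes the notebook itself with json; write_notebook is a hypothetical helper, not part of ipynb_output_filter.py.

```python
import json
import sys


def write_notebook(nb, stream=None):
    """Serialize a notebook dict as UTF-8 bytes, independent of the locale."""
    if stream is None:
        # The binary layer of stdout bypasses the locale's text encoding
        stream = sys.stdout.buffer
    text = json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
    stream.write(text.encode("utf-8"))
```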

Excrescence answered 31/5, 2015 at 12:26 Comment(0)

After digging around, I finally found this relatively simple pre-save hook on the Jupyter docs. It strips the cell output data. You have to paste it into the jupyter_notebook_config.py file (see below for instructions).

def scrub_output_pre_save(model, **kwargs):
    """scrub output before saving notebooks"""
    # only run on notebooks
    if model['type'] != 'notebook':
        return
    # only run on nbformat v4
    if model['content']['nbformat'] != 4:
        return

    for cell in model['content']['cells']:
        if cell['cell_type'] != 'code':
            continue
        cell['outputs'] = []
        cell['execution_count'] = None
        # Added by binaryfunt:
        if 'collapsed' in cell['metadata']:
            cell['metadata'].pop('collapsed', 0)

c.FileContentsManager.pre_save_hook = scrub_output_pre_save

From Rich Signell's answer:

If you aren't sure in which directory to find your jupyter_notebook_config.py file, you can type jupyter --config-dir [into command prompt/terminal], and if you don't find the file there, you can create it by typing jupyter notebook --generate-config.

Cabbage answered 26/7, 2017 at 11:23 Comment(1)

I would note that this solution would never save any outputs to disk, and is somewhat independent of the version control issue. – Ethiopic 30/7, 2017 at 3:51

I did what Albert & Rich did - Don't version .ipynb files (as these can contain images, which gets messy). Instead, either always run ipython notebook --script or put c.FileNotebookManager.save_script = True in your config file, so that a (versionable) .py file is always created when you save your notebook.

To regenerate notebooks (after checking out a repo or switching a branch) I put the script py_file_to_notebooks.py in the directory where I store my notebooks.

Now, after checking out a repo, just run python py_file_to_notebooks.py to generate the ipynb files. After switching branch, you may have to run python py_file_to_notebooks.py -ov to overwrite the existing ipynb files.

Just to be on the safe side, it's good to also add *.ipynb to your .gitignore file.

Edit: I no longer do this because (A) you have to regenerate your notebooks from py files every time you checkout a branch and (B) there's other stuff like markdown in notebooks that you lose. I instead strip output from notebooks using a git filter. Discussion on how to do this is here.

Dugout answered 18/2, 2015 at 14:38 Comment(2)

I liked this idea, but after testing, found that the conversion from .py files back to .ipynb is problematic, especially with version 4 notebooks for which there is not yet a converter. One would currently need to use the v3 importer then convert to v4 and I am a bit concerned about this complicated trip. Also, a .py file is not a very good choice if the notebook is primarily Julia code! Finally, --script is deprecated so I think hooks are the way to go. – Eugene 18/2, 2015 at 20:55

The git filter solution in your link is good, you should copy your answer from there here :-) – Schulz 16/3, 2015 at 12:19

Ok, so it looks like the current best solution, as per a discussion here, is to make a git filter to automatically strip output from ipynb files on commit.

Here's what I did to get it working (copied from that discussion):

I modified cfriedline's nbstripout file slightly to give an informative error when you can't import the latest IPython: https://github.com/petered/plato/blob/fb2f4e252f50c79768920d0e47b870a8d799e92b/notebooks/config/strip_notebook_output and added it to my repo, let's say in ./relative/path/to/strip_notebook_output

I also added a .gitattributes file to the root of the repo, containing:

*.ipynb filter=stripoutput

And created a setup_git_filters.sh containing

git config filter.stripoutput.clean "$(git rev-parse --show-toplevel)/relative/path/to/strip_notebook_output" 
git config filter.stripoutput.smudge cat
git config filter.stripoutput.required true

And ran source setup_git_filters.sh. The fancy $(git rev-parse...) thing is to find the local path of your repo on any (Unix) machine.
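
For anyone who would rather see what such a clean filter has to do than download one: git pipes the working-tree file to the filter's stdin and stages whatever the filter prints to stdout. A minimal sketch, assuming nbformat v4 notebooks, is below. It is not the strip_notebook_output script linked above, and the names clean and main are my own.

```python
import json
import sys


def clean(text):
    """Return notebook JSON text with outputs and execution counts removed."""
    nb = json.loads(text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # A stable key order and indentation keep the staged text reproducible
    return json.dumps(nb, indent=1, ensure_ascii=False, sort_keys=True) + "\n"


def main():
    # Read and write bytes so the result does not depend on the locale
    raw = sys.stdin.buffer.read().decode("utf-8")
    sys.stdout.buffer.write(clean(raw).encode("utf-8"))

# In the real filter script, finish with:
#     if __name__ == "__main__":
#         main()
```

Running the filter on its own output returns the same text, so notebooks that are already clean do not show up as modified.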

Dugout answered 16/3, 2015 at 14:5 Comment(0)

You can use this jupyter extension. It will enable you to directly upload your ipython notebooks to github.

https://github.com/sat28/githubcommit

I have also created a video demonstrating the steps - youtube link

Debbidebbie answered 17/1, 2018 at 11:2 Comment(2)

Can you explain what this does? The documentation is not especially clear. – Poynter 26/3, 2018 at 8:29

@AlexMonras This will directly add a button in jupyter notebook from where you can push notebooks to your GitHub repo with a commit message – Debbidebbie 16/5, 2018 at 3:41

How about the idea discussed in the post below, where the output of the notebook is kept, with the argument that it might take a long time to generate and that it is handy since GitHub can now render notebooks? Auto-save hooks are added for exporting a .py file, used for diffs, and an .html file for sharing with team members who do not use notebooks or git.

https://towardsdatascience.com/version-control-for-jupyter-notebook-3e6cef13392d

Emlyn answered 10/12, 2017 at 19:55 Comment(0)

Here's my solution: https://github.com/frankharkins/squeaky

It's a script that removes noise from the notebooks, and can be used as a pre-save hook, so that you never save the noise to disk. Also published to PyPI so everyone on your team (plus CI) can install it easily.

You can extend the instructions in the README to add a short function that strips outputs too, or I can add that as a feature if anyone requests it.

Ostmark answered 10/10, 2023 at 16:56 Comment(0)
