How exactly does subversion store files in the repository?

I read the Subversion book and it is clear to me that Subversion does not store individual files but only deltas, in order to minimize disk space. Subversion does the same with binary files (this used to be a huge weakness of CVS).

However I do not understand the exact mechanism. When I commit a file what happens?

  1. Subversion stores only the diff (and already has the old version)
  2. Subversion deletes the previous version, stores the new file intact and creates a reverse diff in order to "re-create" the old version if needed.
  3. Something else that I haven't thought of.

The first case might seem the most logical. This, however, raises another question. If a file in a Subversion repository has 1000 commits and a new developer checks out a clean copy, then Subversion would have to fetch the original version (the initial import) and apply 1000 diffs to it before returning the result. Is this correct? Is there some sort of caching for files, where the latest version is kept as well?

Basically where can I find information on the svn repository internals?

Update: Apparently the backend of Subversion plays a big role in this. At the time of writing, FSFS uses option 1 while BDB uses option 2. Thanks msemack!

Giess answered 25/2, 2010 at 9:16 Comment(3)
Minor correction: "Later versions of subversion also do the same with binary files as well". Subversion has ALWAYS done this (at least as far back as version 0.3.x).Aposiopesis
possible duplicate of SVN performance after many revisionsNag
I also find this very confusing. There are several back ends available, bdb and fsfs. Then there are documents about "bubble up", skip-delta which seem to contradict each other. What is the current way a default svn repo stores its files?Valetudinary
R
20

Because Subversion's repository format is entirely internal, the developers are free to change the representation from one release to the next. I believe the current release generally stores reverse deltas (your option 2), but also stores complete snapshots periodically so it doesn't have to resolve 1000 diffs before returning a result.

The Subversion 1.6 release notes have a section on Filesystem storage improvements that has some notes on this, and links to other sources. Suffice it to say that the details of Subversion data storage are complex and subject to change.

There is also a design document in the Subversion source tree that describes the use of skip deltas in Subversion. Generally, the /notes/ directory contains several useful documents regarding Subversion internals.
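To make the skip-delta idea concrete, here is a minimal sketch in Python. It assumes the rule described in the skip-deltas note for FSFS: the delta base of a file version is found by clearing the lowest set bit of its predecessor count (the number of earlier versions of that file, not the repository revision number). The function names are hypothetical, and real FSFS formats add further tuning that is not modelled here.

```python
def skip_delta_base(count: int) -> int:
    """Return the assumed delta base for the file version with this predecessor count.

    Assumed rule: clear the lowest set bit of the count.
    """
    if count <= 0:
        raise ValueError("version 0 has no delta base; it is stored as full text")
    return count & (count - 1)


def delta_chain(count: int) -> list[int]:
    """List the versions visited when reconstructing version `count`."""
    chain = [count]
    while chain[-1] > 0:
        chain.append(skip_delta_base(chain[-1]))
    return chain


if __name__ == "__main__":
    chain = delta_chain(1000)
    print(chain)                       # [1000, 992, 960, 896, 768, 512, 0]
    print(len(chain) - 1, "deltas")    # 6 deltas
```

Under this assumption, reading the 1000th version of a file means applying 6 deltas on top of the full text of version 0, not 1000. The chain length grows with the number of 1 bits in the count, so it stays roughly logarithmic.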

Rudelson answered 25/2, 2010 at 9:21 Comment(2)
If you still use svn, has anything significant changed with regard to the above answer in the last 12 years? I am mainly wondering about the "subject to change" part: did anything major actually change, or is the current implementation basically what existed 12 years ago?Messenger
I can answer that. They do change it regularly. I have heard they are changing completely from FSFS to a new system soon. Previous updates to FSFS have included LZ4 compression or allowed file packing as an option. Here is a list of Subversion versions and their default FSFS formats - serverfault.com/a/277591/122440 . Also, I'll note that each version of Subversion can use previous FSFS formats if set when the repository is created, and a new Subversion can open old repositories with backwards compatibility.Olnton
A
12

From the Subversion Design document (which is quite dated, though) you can get this:

Like many other revision control systems, Subversion stores changes as differences. It doesn't make complete copies of nodes; instead, it stores the latest revision as a full text, and previous revisions as a succession of reverse diffs (the word "diff" is used loosely here – for files, it means vdeltas, for directories, it means a format that expresses changes to directories).
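The scheme in that quote (latest revision as full text, older ones as a chain of reverse diffs) can be sketched roughly as follows. This is an illustration only: it uses line-based deltas from Python's `difflib` instead of the vdelta format the design document mentions, and `ReverseDeltaStore`, `make_delta` and `apply_delta` are hypothetical names, not Subversion APIs.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Describe `target` as copies from `source` plus literal inserted lines."""
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": nothing from the source is carried over
    return ops


def apply_delta(source, ops):
    """Rebuild the target lines from `source` and a delta made by make_delta."""
    result = []
    for op in ops:
        if op[0] == "copy":
            result.extend(source[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


class ReverseDeltaStore:
    """Keeps the newest revision as full text and older ones as reverse deltas."""

    def __init__(self):
        self.head = None          # full text (list of lines) of the latest revision
        self.reverse_deltas = []  # reverse_deltas[i] turns revision i+1 into revision i

    def commit(self, lines):
        if self.head is not None:
            # Store how to get back from the new text to the old head.
            self.reverse_deltas.append(make_delta(lines, self.head))
        self.head = list(lines)
        return len(self.reverse_deltas)  # revision numbers start at 0

    def get(self, rev):
        latest = len(self.reverse_deltas)
        if self.head is None or not 0 <= rev <= latest:
            raise KeyError(rev)
        text = self.head
        # Walk backwards from the head, one reverse delta per revision.
        for i in range(latest - 1, rev - 1, -1):
            text = apply_delta(text, self.reverse_deltas[i])
        return list(text)
```

In this layout a checkout of the latest revision needs no delta at all, while the oldest revision is the most expensive to rebuild, which is the opposite trade-off from option 1 in the question.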

I don't think that has changed since.

Also, see Bubble-Up Method.

Amboina answered 25/2, 2010 at 9:24 Comment(1)
The design document you linked to refers to the old Berkeley DB database format, which SVN no longer uses as the default.Aposiopesis
P
10

I believe the following link will help in understanding the FSFS architecture:

http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure

Punish answered 6/10, 2011 at 13:33 Comment(0)
B
4

The regular FSFS specification might help you.

Or if you use Berkeley DB, here's the specification for that.

FSFS uses reverse deltas to store the changes and skip-deltas to speed up some actions, if I understood everything correctly.

Broeker answered 25/2, 2010 at 9:47 Comment(0)
S
2

Each time you commit a change, the repository stores a new revision of that overall repository tree, and labels the new tree with a new revision number. Of course, most of the tree is the same as the revision before, except for the parts you changed.

The new revision number is a sequential label that applies to the entire new tree, not just to the files and directories you touched in that revision. However, colloquially, a revision number is used to refer to the change committed in that revision; for example, "the change in r588" ("r588" is shorthand for "revision 588") really means "the difference between repository trees 587 and 588", or put another way, "the change made to tree 587 to produce tree 588".
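As a rough illustration of "a new tree per revision, mostly identical to the previous one", the sketch below models directories as Python dicts and files as strings. Each commit builds a new root and new copies of only the directories on the changed path, while untouched subtrees are shared between revisions. All names here are hypothetical, and this is a toy model of the idea, not how any Subversion backend lays out data on disk.

```python
def with_change(tree, parts, content):
    """Return a new tree where the path `parts` holds `content`.

    Only directories along the path are copied; everything else is shared.
    """
    name, rest = parts[0], parts[1:]
    new_tree = dict(tree)  # shallow copy: child objects are reused, not duplicated
    if rest:
        child = tree.get(name, {})
        if not isinstance(child, dict):
            raise ValueError(f"{name!r} is a file, not a directory")
        new_tree[name] = with_change(child, rest, content)
    else:
        new_tree[name] = content
    return new_tree


def commit_revision(revisions, path, content):
    """Append a new root tree and return its revision number."""
    revisions.append(with_change(revisions[-1], path.split("/"), content))
    return len(revisions) - 1


if __name__ == "__main__":
    revisions = [{}]  # revision 0 is the empty tree
    commit_revision(revisions, "trunk/a.txt", "one")
    commit_revision(revisions, "branches/b.txt", "two")
    # The trunk directory was not touched in revision 2, so it is the same object.
    print(revisions[2]["trunk"] is revisions[1]["trunk"])
```

The revision number is just the index of a root in the list, so it labels the whole tree even though only one path changed.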

Have a look at: Subversion FAQ

Sanious answered 25/2, 2010 at 9:25 Comment(0)
