SVN performance after many revisions

My project is currently using an SVN repository which gains several hundred new revisions per day. The repository resides on a Win2k3 server and is served through Apache/mod_dav_svn.

I now fear that over time the performance will degrade due to too many revisions.
Is this fear reasonable?
We are already planning to upgrade to 1.5, so having thousands of files in one directory will not be a problem in the long term.

One of the answers below says that "Subversion only stores the delta (differences) between two revisions, so this helps save a LOT of space, especially if you only commit code (text) and no binaries (images and docs)."

Does that mean that in order to check out revision 10 of the file foo.baz, SVN will take revision 1 and then apply the deltas for revisions 2 through 10?

Mohamedmohammad answered 24/9, 2008 at 15:0 Comment(0)

What type of repo do you have? FSFS or BDB?

(Let's assume FSFS for now, since that's the default.)

In the case of FSFS, each revision is stored as a diff against the previous. So, you would think that yes, after many revisions, it would be very slow.

However, this isn't the case. FSFS uses what are called "skip deltas" to avoid having to do too many lookups on previous revs.

(So, if you are using an FSFS repo, Brad Wilson's answer is wrong.)

In the case of a BDB repo, the HEAD (latest) revision is full-text, but the earlier revisions are built as a series of diffs against the head. This means the previous revs have to be re-calculated after each commit.

For more info: http://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas
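
If I read those notes right, the base for each new delta is picked by clearing the lowest set bit of the change count, so rebuilding any version needs only about log2(n) delta applications instead of n. Here is a toy Python model of that rule; the real FSFS code is C and works per node, so treat this purely as a sketch of the idea:

    # Toy model of the skip-delta base selection described in the notes above.
    # With plain deltas, rebuilding change n means applying n deltas; with skip
    # deltas the chain length is the number of set bits in n, i.e. O(log n).

    def skip_delta_base(n: int) -> int:
        """Base for change n: n with its lowest set bit cleared (e.g. 54 -> 52)."""
        return n & (n - 1)

    def chain_length(n: int) -> int:
        """Deltas applied to rebuild change n starting from the full text at 0."""
        steps = 0
        while n > 0:
            n = skip_delta_base(n)
            steps += 1
        return steps

    for n in (10, 1_000, 100_000):
        print(f"change #{n}: {chain_length(n)} skip-delta applications "
              f"(vs {n} with a plain previous-version chain)")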

P.S. Our repo is about 20GB, with about 35,000 revisions, and we have not noticed any performance degradation.

Kolinsky answered 25/9, 2008 at 1:54 Comment(4)
In your repo of 20GB, is it stored as FSFS or BDB?Educational
It's FSFS (at least it is now). For the 1st year or so of our repo's lifespan it was BDB (FSFS didn't exist yet). At some point we did a dump/load cycle to convert to FSFS. We weren't having any specific problems with BDB, but FSFS seems architecturally better (hence FSFS is now the default).Kolinsky
That's an interesting piece of information. I have a repository with 73,000 files (roughly 350 MB) and it's unbelievably slow. I have to inquire what they are using.Sosthenna
As a side-note, the PHP repository is stored on Subversion with (at time of writing) 295,197 revisions. svn.php.net/repository/php/php-src/trunkFountain

Subversion stores the most current version as full text, with backward-looking diffs. This means that updates to head are always fast, and what you incrementally pay for is looking farther and farther back in history.
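
To make that concrete, here is a tiny hypothetical Python sketch of such a backward-delta store: HEAD is kept as full text and each older revision as a line-level reverse delta, so reading HEAD is free while reading an old revision costs one delta application per revision you walk back. (As the comments below point out, FSFS actually uses forward skip-deltas; this only models the BDB-style layout this answer describes.)

    from typing import Dict, List

    class BackwardDeltaStore:
        """HEAD stored as full text, older revisions as reverse deltas.
        Toy model: assumes commits only change lines, never add or remove them."""

        def __init__(self, initial: List[str]) -> None:
            self.head_rev = 1
            self.head_text = list(initial)
            # rev -> {line_index: old_line} needed to step from rev+1 back to rev
            self.reverse_deltas: Dict[int, Dict[int, str]] = {}

        def commit(self, new_text: List[str]) -> int:
            delta = {i: old for i, (old, new)
                     in enumerate(zip(self.head_text, new_text)) if old != new}
            self.reverse_deltas[self.head_rev] = delta
            self.head_rev += 1
            self.head_text = list(new_text)
            return self.head_rev

        def checkout(self, rev: int) -> List[str]:
            text = list(self.head_text)                   # HEAD: no deltas at all
            for r in range(self.head_rev - 1, rev - 1, -1):
                text = [self.reverse_deltas[r].get(i, line)
                        for i, line in enumerate(text)]   # one delta per step back
            return text

    store = BackwardDeltaStore(["a\n", "b\n", "c\n"])
    store.commit(["a\n", "B\n", "c\n"])
    store.commit(["a\n", "B\n", "C\n"])
    print(store.checkout(1))   # applies 2 reverse deltas -> ['a\n', 'b\n', 'c\n']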

Timi answered 24/9, 2008 at 15:14 Comment(2)
Subversion uses forward-looking deltas.Plenum
According to an answer here, you're both right: "Subversion uses forward deltas in FSFS repositories and backward deltas in BDB Repositories" #8825097Foetation

I personally haven't dealt with Subversion repositories with codebases bigger than 80K LOC for the actual project. The biggest repository I've actually had was about 1.2 gigs, but this included all of the libraries and utilities that the project uses.

I don't think the day to day usage will be affected that much, but anything that needs to look through the different revisions might slow down a tad. It may not even be noticeable.

Now, from a sys admin point of view, there are a few things that can help you minimize performance bottlenecks. Since Subversion is mostly a file-based system, you can do this:

  • Put the actual repositories on a different drive
  • Make sure that no file-locking apps other than svn are working on that drive
  • Make the drives at least 7,500 RPM. You could try 10,000 RPM, but it may be overkill
  • Update the LAN to gigabit, if everybody is in the same office.

This may be overkill for your situation, but that's what I've usually done for other file-intensive applications.

If you ever "outgrow" Subversion, then Perforce will be your next step up. It's hands down the fastest source control app for very large projects.

Declarative answered 24/9, 2008 at 15:17 Comment(0)

We're running a subversion server with gigabytes worth of code and binaries, and it's up to over twenty thousand revisions. No slowdowns yet.

Daisie answered 24/9, 2008 at 15:20 Comment(0)

Subversion only stores the delta (differences) between two revisions, so this helps save a LOT of space, especially if you only commit code (text) and no binaries (images and docs).
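
As a rough illustration of the space saving for text, compare the size of a full file with the size of a diff after a one-line edit. difflib below stands in for Subversion's real binary delta format (svndiff), so the numbers are only indicative:

    import difflib

    old = [f"line {i}\n" for i in range(1000)]
    new = list(old)
    new[500] = "line 500, now edited\n"     # a single changed line out of 1000

    delta = list(difflib.unified_diff(old, new, "foo.c (r1)", "foo.c (r2)"))

    full_chars  = sum(len(line) for line in new)
    delta_chars = sum(len(line) for line in delta)
    print(f"full text: {full_chars} characters, delta: {delta_chars} characters")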

Additionally, I've seen a lot of very big projects using svn and have never heard complaints about performance.

Maybe you are worried about checkout times? Then I guess this would really be a networking problem.

Oh, and I've worked on CVS repositories with 2 GB+ of stuff (code, images, docs) and never had a performance problem. Since svn is a great improvement on CVS, I don't think you should worry.

Hope it helps ease your mind a little ;)

Diligence answered 24/9, 2008 at 15:4 Comment(0)

I do not think that our Subversion slowed down through aging. We currently have several terabytes of data, mostly binary, and we check out/commit up to 50 gigabytes of data daily. In total we currently have 50,000 revisions. We are using FSFS as the storage type and access the server either directly over svn:// (Windows server) or via Apache mod_dav_svn (Gentoo Linux server).

I cannot confirm that svn slows down over time: we set up a clean server for performance comparison, and we could NOT measure any significant degradation.

However, I have to say that our Subversion is uncommonly slow by default, and it is obviously Subversion itself, since we tried it on another computer system.

For some unknown reason, Subversion seems to be completely limited by server CPU. Our checkout/commit rates are limited to between 15 and 30 megabytes/s per client, because at that point one server CPU core is completely used up. This is the same for an almost empty repository (1 gigabyte, 5 revisions) as for our full server (~5 terabytes, 50,000 revisions). Tuning, such as setting compression to 0 (off), did not improve this.
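
For anyone who wants to reproduce that kind of per-client figure, one simple way is to time a fresh checkout and divide by the size of the resulting working copy, while watching CPU usage on the server with your normal monitoring tools. The repository URL and path in this sketch are placeholders:

    import os
    import subprocess
    import time

    REPO_URL = "https://svn.example.com/repos/project/trunk"   # placeholder URL
    WC_PATH = "wc-benchmark"

    start = time.monotonic()
    subprocess.run(["svn", "checkout", "--quiet", REPO_URL, WC_PATH], check=True)
    elapsed = time.monotonic() - start

    total_bytes = 0
    for root, dirs, files in os.walk(WC_PATH):
        dirs[:] = [d for d in dirs if d != ".svn"]     # skip the admin area
        total_bytes += sum(os.path.getsize(os.path.join(root, f)) for f in files)

    print(f"{total_bytes / 1e6:.1f} MB in {elapsed:.1f} s "
          f"= {total_bytes / 1e6 / elapsed:.1f} MB/s")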

Our high-bandwidth FC array (delivers ~1 gigabyte/s) idles, the other cores idle, and the network (currently 1 gigabit/s for clients, 10 gigabits/s for the server) idles as well. Okay, not really idling, but if only 2-3% of the available capacity is used, I call it idling.

It is no real fun to see all components idling while we wait for our working copies to be checked out or committed. Basically, I have no idea what the server process is doing while fully consuming one CPU core all the time during checkout/commit.

However, I am still trying to find a way to tune Subversion. If this is not possible, we might need to switch to another system.

Therefore, my answer: no, SVN does not degrade in performance; it is slow to begin with.

Of course, if you do not need (high) performance you won't have a problem. Btw, all of the above applies to Subversion 1.7, the latest stable version.

Byronbyrum answered 18/11, 2013 at 16:54 Comment(1)
"We have currently several TeraBytes of data, mostly binary. We checkout/commit daily up to 50 GigaByte of data. In total we have currently 50000 revisions". That's incredible! Since you wrote this in 2013, did you see any improvement in the CPU consumption issue you mentioned by moving to the newer versions of Subversion (if you migrated; might be hell migrating such a huge repo)?Winniewinnifred

The only operations which are likely to slow down are things which read information from multiple revisions (e.g. SVN Blame).

Hercegovina answered 24/9, 2008 at 15:10 Comment(0)

I am not sure... I am using SVN with Apache on CentOS 5.2. It works OK. The revision number was 8230 or something like that. And on all client machines the commit was so slow that we had to wait at least 2 minutes for a file that is 1 KB. I am talking about one file with no big file size.

Then I made a new repository, started from rev. 1, and now it works OK and is fast. I used svnadmin create xxxxxx; I did not check whether it is FSFS or BDB...

Calceolaria answered 17/4, 2009 at 18:52 Comment(0)

Maybe you should consider improving your workflow.

I don't know if a repository will have performance issues under these conditions, but your ability to go back to a sane revision will.

In your case, you may want to include a validation process, so each team commits to a team leader repo, the team leaders commit to the team manager repo, and the team manager commits to the read-only, clean company repo. You make a clean selection at each stage of which commits must go to the top.

This way, anybody can go back to a clean copy with an easy-to-browse history. Merges are much easier, and devs can still commit their mess as much as they want.

Phares answered 26/2, 2010 at 9:7 Comment(0)
