How to split an SVN folder into its own repository when it has been renamed?
T

16

43

I want to split a directory from a large Subversion repository to a repository of its own, and keep the history of the files in that directory.

I tried the regular way of doing it first

svnadmin dump /path/to/repo > largerepo.dump
cat largerepo.dump | svndumpfilter include my/directory >mydir.dump

but that does not work, since the directory has been moved and copied over the years and files have been moved into and out of it to other parts of the repository. The result is a lot of these:

svndumpfilter: Invalid copy source path '/some/old/path'

Next I tried including each /some/old/path as it was reported. After a long, long list of files and directories had been included, svndumpfilter completed, BUT importing the resulting dump does not produce the same files as the current directory has.

So, how do I properly split the directory from that repository while keeping the history?

EDIT: I specifically want trunk/myproj to be the trunk of a new repository, AND I want the new repository to include none of the other old content, i.e. it should not be possible for anyone to update to a revision from before the split and see those files.

The svndumpfilter solution I tried would achieve exactly that; sadly, it's not doable because the paths and files have been moved around. The solution by ng isn't acceptable, since it is basically a clone plus removal of the extras, which keeps ALL the history, not just the relevant myproj history.

Terence answered 11/1, 2009 at 17:38 Comment(0)
R
14

This problem occurs when one of the directories/files included by svndumpfilter originally was copied or moved from a section of the tree that is not being included.

To solve the problem use this script: svndumpfilter3
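
To see in advance which paths would trigger the error, the dump can be scanned for copies that land inside the included tree but come from outside it. This is a minimal sketch, assuming the usual Node-path and Node-copyfrom-path headers of the dump format; the function name outside_copy_sources is made up for illustration. It only reports direct copy sources, so sources of those sources would need another pass.

```shell
# Read a dump stream on stdin and print each distinct copy source that
# feeds a node under the given prefix but lies outside that prefix.
# Paths in dump headers have no leading slash, so one is stripped here.
outside_copy_sources() {
  prefix="${1#/}"
  prefix="${prefix%/}"
  LC_ALL=C awk -v p="$prefix" '
    function inside(path) { return path == p || index(path, p "/") == 1 }
    /^Node-path: /          { node = substr($0, 12) }
    /^Node-copyfrom-path: / { src = substr($0, 21)
                              if (inside(node) && !inside(src)) print src }
  ' | sort -u
}
```

Possible usage: svnadmin dump /path/to/repo | outside_copy_sources my/directory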

Routh answered 23/7, 2009 at 8:32 Comment(3)
That script got me past the described problem in creating the new dump file, but I see a different problem when I try to load the dump into a new repository. <<< Started new transaction, based on original revision 1868 svnadmin: File not found: transaction '1867-1fv', path 'dm/dm_trunk' * adding path : dm/dm_trunk ... - Ten
I had (python-related?) problems with svndumpfilter3, but succeeded in filtering a 13,000-revision repository, dump size 1.5G, with svndumpfilter2. - Roughhew
Thanks. The script works after a small modification. For some reason it tries to pass a "srcpath" argument to "svnadmin dump" when untangling. That makes svnadmin fail. I had to remove that argument from the command, and it worked. - Hammack
K
17

I had a similar problem splitting a repository:

svndumpfilter: Invalid copy source path /dir/old_dir

What I did to get around the problem was to include the additional old directories that it was requesting, or that you know you moved. In my case I had moved 3 directories into another directory.

E.g. moved folders A, B, C into folder D:

cat project.dump | svndumpfilter include A B C D > new.dump

This seemed to solve my problem. I was able to separate Folder D from the rest of the Repo. On the flip-side, when excluding D I did not get the error, I would guess because removing D didn't require the links/history to A,B,C

Katerinekates answered 22/5, 2009 at 1:47 Comment(0)
O
6

I've tried at least 4 different applications to do that; the only one that really worked was svndumpfilterIN:

cd /usr/local/bin/
sudo wget --no-check-certificate https://raw.github.com/jasperlee108/svndumpfilterIN/master/svndumpfilter.py
sudo chmod +x svndumpfilter.py
# Work on a copy, to be sure nothing happens to the original repo:
cp -au /path/to/repo /tmp/largerepo.repo/
svnadmin dump /path/to/repo > /tmp/largerepo.dump
svndumpfilter.py /tmp/largerepo.dump --repo=/tmp/largerepo.repo --output-dump=/tmp/mydir.dump include my/directory

Here is what I tried that didn't work:

Osbourn answered 5/3, 2014 at 10:8 Comment(1)
I really appreciate you creating this tool. I eventually got it to work, but a few things are different from what you stated. 1) In order to get svnlook to work, I needed to provide --repo with a path to an actual uncompressed repo, not a .dump file as your example shows. I was getting an error that myrepo.dump/format didn't exist. I had to do an svnadmin load on the .dump file and point --repo to that. 2) In --scan-only mode, the --output-dump option is still required, otherwise you get an error. - Dysentery
P
3

This could potentially help you: Quote from http://svnbook.red-bean.com/en/1.5/svn.reposadmin.maint.html#svn.reposadmin.maint.replication

In Subversion 1.5, svnsync grew the ability to also mirror a subset of a repository rather than the whole thing. The process of setting up and maintaining such a mirror is exactly the same as when mirroring a whole repository, except that instead of specifying the source repository's root URL when running svnsync init, you specify the URL of some subdirectory within that repository. Synchronization to that mirror will now copy only the bits that changed under that source repository subdirectory. There are some limitations to this support, though. First, you can't mirror multiple disjoint subdirectories of the source repository into a single mirror repository—you'd need to instead mirror some parent directory that is common to both. Second, the filtering logic is entirely path-based, so if the subdirectory you are mirroring was renamed at some point in the past, your mirror would contain only the revisions since the directory appeared at the URL you specified. And likewise, if the source subdirectory is renamed in the future, your synchronization processes will stop mirroring data at the point that the source URL you specified is no longer valid.

The problem, of course, is that you lose the pre-rename history.
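
The mirroring described in the quote could be set up roughly as below. This is a sketch only: the helper names allow_revprop_changes and mirror_subdir are made up, and the source URL is whatever subdirectory you want to mirror. svnsync needs the mirror to accept revision-property changes, which is why a permissive pre-revprop-change hook is installed first. As noted above, history from before the rename is still not carried over.

```shell
# Install a hook that accepts every revision-property change, which
# svnsync requires on the destination repository.
allow_revprop_changes() {
  hook="$1/hooks/pre-revprop-change"
  printf '#!/bin/sh\nexit 0\n' > "$hook" && chmod +x "$hook"
}

# Create a fresh mirror repository at $1 and sync the subdirectory URL $2 into it.
mirror_subdir() {
  mirror="$1"
  source_url="$2"
  svnadmin create "$mirror" || return 1
  allow_revprop_changes "$mirror" || return 1
  mirror_url="file://$(cd "$mirror" && pwd)"
  svnsync init "$mirror_url" "$source_url" && svnsync sync "$mirror_url"
}
```

Possible usage (the URL is a placeholder): mirror_subdir /tmp/myproj-mirror https://localhost/svn/largerepo/trunk/myproj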

Photoengrave answered 21/1, 2009 at 16:46 Comment(0)
C
3

I encountered this problem and ended up using svndumpfilter2.

Specifically, this command:

sudo svnadmin dump /home/setup/svn/repos/main_repl | sudo ./svndumpfilter2.py /home/setup/svn/repos/main_repl Development QA compliance > ~/main_repl_dump.trim

I did get the out of memory error mentioned, however, since I was running svn on a VM, I just bumped the memory up to 2G. While I realize that this may not be an option for everyone, I noticed that it ran much faster than it had with 512M. (2G probably wasn't necessary).

Currently, it is processing revision 18,631.

In case anyone wonders, the reason why I needed to break out part of the repo was because we were creating tags/copies for distribution to implementation of files in another path of the repo. For some reason, this process was causing the repo to balloon to huge proportions. (We're at 17G now.)

I'm doing this on a replication repo of SVN, version 1.5.6, on Debian Lenny, 5.0.4.

Certainty answered 30/3, 2010 at 19:50 Comment(0)
T
3

I've just successfully migrated a project from an existing combined repo (at Google Code) to its own repo. The posts here were very helpful.

This is what finally worked for me...

  1. Used svnsync to make a local mirror of my Google Code repo following the directions here.
  2. svnadmin dump mymirrorrepo > dumpfile
  3. cat dumpfile | ./svndumpfilter3 --untangle mymirrorrepo trunk/foo > foo-dumpfile
  4. svnadmin create foorepo
  5. svnadmin load foorepo --ignore-uuid < foo-dumpfile

The --untangle option in step 3 managed to resolve all of the path problems that stumped svndumpfilter and svndumpfilter2.

Initially, at step 5 I was stuck on the error:

<<< Started new transaction, based on original revision 2
svnadmin: File not found: transaction '1-1', path 'trunk/foo'

But this post in Charles Calvert's blog explained that all that was required was to create the trunk dir in foorepo before doing the load.
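
Creating the missing parent before the load might look like the sketch below. The helper names parent_dirs and load_with_parents are made up for illustration, and the repository is assumed to be a local path. On Subversion 1.5 or later a single svn mkdir --parents call could replace the loop. Note that each directory created this way adds a revision to the new repository ahead of the loaded history.

```shell
# Print every ancestor directory of a repository path, shallowest first.
parent_dirs() {
  path="${1#/}"
  acc=""
  while [ "${path#*/}" != "$path" ]; do
    acc="${acc:+$acc/}${path%%/*}"
    printf '%s\n' "$acc"
    path="${path#*/}"
  done
}

# Hypothetical helper: create the ancestors of path $3 in the local
# repository $1, then load dump file $2 into it.
load_with_parents() {
  repo="$1"
  dump="$2"
  url="file://$(cd "$repo" && pwd)" || return 1
  parent_dirs "$3" | while IFS= read -r dir; do
    # </dev/null keeps svn from consuming the loop's input
    svn mkdir -m "Create $dir before load" "$url/$dir" < /dev/null
  done
  svnadmin load "$repo" --ignore-uuid < "$dump"
}
```

Possible usage: load_with_parents foorepo foo-dumpfile trunk/foo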

Theatricals answered 22/3, 2011 at 0:35 Comment(0)
F
2

Why not replicate the entire repository by dumping it into a new one? Then branch out the trunk, delete the head and merge the portions you want back into the trunk from the branch. Then you have kept the history and split out the parts you want to a new repository.

  1. Dump to /trunk
  2. Branch /trunk to /branches/trunk
  3. Delete /trunk
  4. Merge /branches/trunk/whatever back in to /trunk or /trunk/whatever

This way you have kept all the history, and selectively picked the parts you want.

Fiction answered 11/1, 2009 at 17:53 Comment(3)
I can't seem to get it to work; can you add more specific commands to do that? It just skips the non-existent files, so I'm probably doing it wrong. Btw, how is that different from replicating the repo and deleting everything else besides my dir? I also want to get rid of non-related history etc. - Terence
There is no difference from just removing what you don't want. However, if you want the new repository's /trunk to be the old repository's /trunk/whatever, then you need to copy the full /trunk of the dump to /branches, then copy back only what you want to /trunk. I'll add another answer with specifics. - Fiction
The problem is that a backup of the new repo will be even bigger than the original (if you do it that way). - Susannesusceptibility
W
2

I'm also looking for an answer on this question (having to deal with it myself). Based on Alex' answer, I found http://furius.ca/pubcode/pub/conf/common/bin/svndumpfilter3.html which claims to fix some of the svndumpfilter2 issues. I believe it is a partial solution.

The good:

A rewrite of Subversion's svndumpfilter in pure Python, that allows you to untangle move/copy operations between excluded and included sets of files/dirs, by converting them into additions. If you use this option, it fetches the original files from a given repository.

Concern:

Important

Some people have been reporting a bug with this script, that it will create an empty file on a large repository. It worked great for the split that I had to do on my repository, but I have no time to fix the problem that occurs for some other people's repositories

Ween answered 22/3, 2010 at 13:41 Comment(0)
L
2

This is a wild and crazy stab in the over-complicating-things dark, but what about importing the SVN repo into git using git-svn or tailor, splitting off the directory using git-split, then exporting it back to svn with git-svn?

Libau answered 22/3, 2010 at 13:48 Comment(0)
F
1

The specific commands are as follows. I am going to assume the repository is hosted on an http(s):// server, although the same commands will work for svn:// or file://.

svnadmin dump /path/to/repository > dumpfile  
svnadmin create /path/to/new_repository 
svnadmin load /path/to/new_repository < dumpfile 
svn co https://localhost/svn/new_repository_url new_repository_checkout 
cd new_repository_checkout 
svn move https://localhost/svn/new_repository_url/trunk  https://localhost/svn/new_repository_url/branches/head -m "Moving HEAD to branches" 
svn move https://localhost/svn/new_repository_url/branches/head/whatever https://localhost/svn/new_repository_url/trunk -m "Creating new trunk" 
svn update 
cd branches 
svn remove head
svn commit

You should now have the part you want from the old repository as the trunk of the new one.
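
One way to check that claim is to export both trees and compare them. This is a rough sketch: trees_match and verify_split are made-up helper names, and the old-repository URL in the usage line is a placeholder. Files using expanded keywords such as $Revision$ may legitimately differ between the two exports.

```shell
# Succeed only if the two directory trees have identical contents.
trees_match() {
  diff -r "$1" "$2" > /dev/null
}

# Export the old subdirectory URL ($1) and the new trunk URL ($2) into a
# scratch directory and compare the results.
verify_split() {
  work=$(mktemp -d) || return 1
  svn export -q "$1" "$work/old" &&
    svn export -q "$2" "$work/new" &&
    trees_match "$work/old" "$work/new"
  status=$?
  rm -rf "$work"
  return "$status"
}
```

Possible usage: verify_split https://localhost/svn/old_repository_url/trunk/whatever https://localhost/svn/new_repository_url/trunk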

Fiction answered 11/1, 2009 at 19:56 Comment(1)
This is still the "keep history of everything" solution. I need a solution that replicates the spirit of the svndumpfilter solution :/ - Terence
F
0

I see this is quite old now, but does adding "--skip-missing-merge-sources" help any? It seems like it might...

Fellowman answered 27/4, 2009 at 21:2 Comment(1)
Sorry, but no. I end up with either an empty dump or with 'invalid copy source path' errors like before :( - Terence
M
0

If you don't need the entire history you can pick it up from just after the error. If your error was at revision 412 then you can try picking it up right after with:

svnadmin dump /path/to/repo -r 413:HEAD > largerepo.dump

I realize this may not be a perfect solution either but it may be good enough in your case.

You may want to also just do this all in one step

svnadmin dump /path/to/repo -r 413:HEAD | svndumpfilter include my/directory > mydir.dump
Municipality answered 6/5, 2009 at 21:32 Comment(0)
A
0

Some more info about svndumpfilter and how to fix it: http://blog.rlucas.net/uncategorized/some-gotchas-with-using-svndumpfilter/

Or you can try the svndumpfilter replacement script, now called svndumpfilter2: http://cogo.wordpress.com/2009/03/10/problems-with-svndumpfilter/

I haven't tried that script yet, because I need some time to make a repo backup to test it on (I have a backup dump to play with, but it is on Windows, and the script is for Linux).

Ate answered 21/1, 2010 at 15:1 Comment(2)
That new script really helped me; the dump was as it should be, with no errors and no warnings. svnadmin load went OK, too. Our programmer told me the new repo is as it should be. So 5*. - Ate
It helped, yes, to filter the folder out of the dump with no errors, and even to load it into an empty repo. But beware: a new repo built from this kind of dump is not OK. Some of your data is lost, which can be a huge problem when using a build server (Hudson or CruiseControl, for example). You'll probably get: Could not access revision times. [500, #0] [client 10.0.0.71] or Unable to deliver content. [409, #0] [client 10.0.0.229] So think twice and test it three times before going to production. - Ate
V
0

I just ran into this problem and wrote a little script that retries the filtering until all invalid source paths are resolved.

#!/usr/bin/env ruby
# Re-run svndumpfilter, adding each reported copy source to the include
# list, until no "Invalid copy source path" error remains.

require 'open3'

paths = ["/your/path"]

loop do
  puts "Processing, please wait ..."
  cmd = "svndumpfilter include #{paths.join(' ')} < svn.original.dump > svn.result.dump"
  _out, err, _status = Open3.capture3(cmd)

  # Pick the invalid copy source reported on stderr, if there is one.
  new_path = err[/Invalid copy source path '(.*)'/, 1]
  break if new_path.nil?

  puts "Adding #{new_path}"
  paths << new_path
end
Variform answered 7/5, 2010 at 23:14 Comment(1)
Just a comment: the dumping was successful, but the reimport did not succeed. So, no luck there. (I switched to git last week with git svn clone.) - Variform
S
0

Based on the answer by ng., but with filtering and dropping empty revisions.

Step 1. Dump and filter:

svnadmin dump /path/to/repository > fulldumpfile
svndumpfilter include trunk/the/part/you/want --drop-empty-revs --renumber-revs < fulldumpfile > dumpfile

Step 2. Create new repo. (note that this can also be done e.g. with Tortoise SVN)

svnadmin create /path/to/new_repo

Remember to add whatever you need to be able to checkout (permissions and such).

Step 3. Checkout and add base folder (can also be done e.g. with Tortoise SVN)

svn checkout http://localhost/new_repo /some/checkout/path/newrepo
cd /some/checkout/path/newrepo
# to be able to create "trunk/the/part/you/want" you will need to add parent dir:
mkdir -p trunk/the/part/you
svn add trunk
svn commit -m "old base"

Step 4. Load filtered dump

svnadmin load /path/to/new_repo < dumpfile

Step 5. Move old root to new root (can also be done e.g. with Tortoise SVN)

cd /some/checkout/path/newrepo
svn update
svn move trunk/the/part/you/want/* trunk/
svn move tags/the/part/you/want/* tags/
svn move branches/the/part/you/want/* branches/
svn commit -m "re-structure base"

You should now have the part you want from the old repository as the trunk of the new one.

Susannesusceptibility answered 9/5, 2014 at 14:31 Comment(0)
S
0

We developed Subdivision, a GUI tool designed to split svn repositories.

Subdivision analyzes the repository and calculates the history of the files as they are being copied and moved throughout the repository. Using this information, your selections are intelligently augmented to avoid all "Invalid copy source path" errors.

In addition to splitting a repository, Subdivision can be used to delete files from a repository as well as extract files and folders into a new repository.

Subdivision is free for small repositories.

Stupa answered 3/2, 2016 at 22:7 Comment(0)
