REGEX for svndumptool

I have a large (30+ GB) legacy SVN repo with many externals defined that needs to be cloned to a new server. Because the repo was originally created in the pre-SVN 1.5 days, many of those externals use absolute paths that refer back to the old server name. I want to convert all the absolute paths to relative ones so that the migration will work.

I found svndumptool via this question. It works great on some of the externals, but I haven't been able to figure out a regex that works for the rest of the cases.

Here are cases of the six different types of external definitions that I found in the repo by running the command: svn propget --recursive svn:externals %REPODIR_FILE%/%REPO%

CaseA https://svn.acme.com/svn/test/branches/project.x
CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y
https://svn.acme.com/svn/test/branches/project.z CaseC
-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD
CaseE  https://svn.acme.com/svn/test/branches/project.x
CaseF -r21  https://svn.acme.com/svn/test/branches/project.y

Note that CaseE is the same as CaseA except for the double spacing before the https.

Note that CaseF is almost the same as CaseB, except that there is no space between the -r and the revision number and there is double spacing before the https.

I'm using rubular.com to test my regex. Currently I'm using the following expression:

^(\S+) (|-r ?\d* ?)https:\/\/svn.acme.com(\S+)

Which gives me:

Match 1
1.  CaseA
2.   
3.  /svn/test/branches/project.x
Match 2
1.  CaseB
2.  -r 19
3.  /svn/test/branches/project.y
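To reproduce this outside rubular.com, here is a minimal sketch using Python's re module (Python is my choice for illustration, not something from the original post). It runs the expression above, unchanged, against the six sample lines; the names `pattern`, `samples` and `results` are made up for the example. CaseE is written without the trailing label.

```python
import re

# The expression from the question, copied unchanged
pattern = re.compile(r'^(\S+) (|-r ?\d* ?)https:\/\/svn.acme.com(\S+)')

samples = [
    "CaseA https://svn.acme.com/svn/test/branches/project.x",
    "CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y",
    "https://svn.acme.com/svn/test/branches/project.z CaseC",
    "-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD",
    "CaseE  https://svn.acme.com/svn/test/branches/project.x",
    "CaseF -r21  https://svn.acme.com/svn/test/branches/project.y",
]

# Map each sample line to its captured groups, or None when there is no match
results = {}
for line in samples:
    m = pattern.match(line)
    results[line] = m.groups() if m else None
    print(line, '->', results[line])
```

Only the CaseA and CaseB lines match. The other four fail either because the URL comes first or because the spacing differs from what the pattern expects.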

I haven't been able to come up with a REGEX that would parse cases C and D into something like the following:

Match 3
1.  /svn/test/branches/project.z
2.  
3.  CaseC
Match 4
1.  -r 20
2.  /svn/test/branches/project.z@20
3.  CaseD

svndumptool does seem to require that I split out the different components of the external definition so that it can correctly reassemble it in the correct (SVN v1.5) syntax.
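To make the "split and reassemble" idea concrete, here is a rough Python sketch. It is not how svndumptool works internally; the function `normalize_external`, the two patterns `OLD_STYLE` and `NEW_STYLE`, and the output spacing are all assumptions made for illustration. It splits a definition into local directory, optional -r revision and URL, strips the old host, and writes the parts back in the order used by case D above.

```python
import re

HOST = "https://svn.acme.com"  # old server prefix taken from the examples above

# Hypothetical patterns: local dir first (cases A, B, E, F) or URL first (cases C, D)
OLD_STYLE = re.compile(r'^(?P<dir>\S+)\s+(?:-r\s*(?P<rev>\d+)\s+)?(?P<url>https?://\S+)$')
NEW_STYLE = re.compile(r'^(?:-r\s*(?P<rev>\d+)\s+)?(?P<url>https?://\S+)\s+(?P<dir>\S+)$')

def normalize_external(line, host=HOST):
    """Return '[-r REV ]PATH DIR' with the host stripped, or None if unparsed."""
    line = line.strip()
    m = OLD_STYLE.match(line) or NEW_STYLE.match(line)
    if m is None:
        return None
    url = m.group('url')
    if url.startswith(host):
        url = url[len(host):]  # keep only the server-relative part
    rev = '-r %s ' % m.group('rev') if m.group('rev') else ''
    return '%s%s %s' % (rev, url, m.group('dir'))

print(normalize_external("CaseF -r21  https://svn.acme.com/svn/test/branches/project.y"))
```

A single pass like this avoids needing one regex per spacing variant, at the cost of doing the rewrite outside svndumptool.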

Any help from the REGEX gods would be much appreciated :-)

Jumpy answered 22/1, 2014 at 19:55 Comment(2)
What are CaseA, CaseB, etc.? - Kessiah
The different cases of string type I found in the SVN dump file. Actually I have found two more since; I'll edit my question to reflect that. - Jumpy

Here is the set of commands that worked for me; hopefully this helps someone trying to fix a borked SVN repo in the future. Remember: friends don't let friends use absolute externals!

In its first six iterations, this procedure reduced the list from over 30K defined externals to just 30.

:: List of types of externals we need to deal with
CaseA https://svn.acme.com/svn/test/branches/project.x
CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y
https://svn.acme.com/svn/test/branches/project.z CaseC
-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD
CaseE  https://svn.acme.com/svn/test/branches/project.x
CaseF -r21  https://svn.acme.com/svn/test/branches/project.y

:: SVN Dump Tool
SET SVNDUMPTOOL=C:\support\svndumptool\v0.6.1\svndumptool.exe
SET REPODIR=D:\Repositories
SET REPODIR_FILE=file:///D:/Repositories
SET DUMPDIR=D:\Dumps
SET REPO=test
SET SVN="C:\Program Files (x86)\VisualSVN Server\bin\svn.exe"
SET SVNADMIN="C:\Program Files (x86)\VisualSVN Server\bin\svnadmin.exe"
SET CREATE=%SVNADMIN% create
SET LOAD=%SVNADMIN% load --ignore-uuid
SET DUMP=%SVNADMIN% dump

:: Get a list of the externals in the original repo
%SVN% propget --recursive svn:externals %REPODIR_FILE%/%REPO%>%DUMPDIR%\%REPO%.externals

:: Dump the repo
%DUMP% %REPODIR%\%REPO% > %DUMPDIR%\%REPO%.dump

:: Transform the repo
:: CaseA
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+) https://svn.acme.com(\S+)" "\2 \1" %DUMPDIR%\%REPO%.dump %DUMPDIR%\%REPO%_A.dump
:: Delete the dump to save disk space, each dump file iteration is ~300GB
DEL %DUMPDIR%\%REPO%.dump
:: CaseB
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+) (-r ?\d* ?)https://svn.acme.com(\S+)" "\2\3 \1" %DUMPDIR%\%REPO%_A.dump %DUMPDIR%\%REPO%_AB.dump
DEL %DUMPDIR%\%REPO%_A.dump
:: CaseC
%SVNDUMPTOOL% transform-prop svn:externals "^(\S*)https://svn.acme.com(\S*)" "\2\1" %DUMPDIR%\%REPO%_AB.dump %DUMPDIR%\%REPO%_ABC.dump
DEL %DUMPDIR%\%REPO%_AB.dump
:: CaseD
:: Groups: \1 = optional -r prefix, \2 = path after the host, \3 = local directory
%SVNDUMPTOOL% transform-prop svn:externals "^(-r ?\d* ?)https://svn.acme.com(\S+) (\S+)" "\1\2 \3" %DUMPDIR%\%REPO%_ABC.dump %DUMPDIR%\%REPO%_ABCD.dump
DEL %DUMPDIR%\%REPO%_ABC.dump
:: CaseE
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+)  https://svn.acme.com(\S+)" "\2 \1" %DUMPDIR%\%REPO%_ABCD.dump %DUMPDIR%\%REPO%_ABCDE.dump
DEL %DUMPDIR%\%REPO%_ABCD.dump
:: CaseF
:: \2 is captured without a trailing space here, so the replacement puts one back
%SVNDUMPTOOL% transform-prop svn:externals "^(\S+) (-r ?\d* ?)  https://svn.acme.com(\S+)" "\2 \3 \1" %DUMPDIR%\%REPO%_ABCDE.dump %DUMPDIR%\%REPO%_ABCDEF.dump
DEL %DUMPDIR%\%REPO%_ABCDE.dump

:: Delete the old repo
RMDIR /Q /S %REPODIR%\%REPO%
:: Create a new clean repo
%CREATE% %REPODIR%\%REPO%
:: Load the fixed dump
%LOAD% %REPODIR%\%REPO% < %DUMPDIR%\%REPO%_ABCDEF.dump
:: Get the new list of externals
%SVN% propget --recursive svn:externals %REPODIR_FILE%/%REPO%>%DUMPDIR%\%REPO%_ABCDEF.externals
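
Since each pass over the dump is slow, it can help to dry-run the pattern/replacement pairs on a few sample lines first. The sketch below assumes svndumptool applies them with the same semantics as Python's `re.sub` (it is a Python script, but I have not checked its source, so treat that as an assumption). The helper `preview` and the variable names are made up for the example, and only the CaseA, CaseB and CaseC rules are shown.

```python
import re

def preview(rules, lines):
    """Apply (pattern, replacement) pairs in order to each line, as a dry run."""
    out = []
    for line in lines:
        for pattern, replacement in rules:
            line = re.sub(pattern, replacement, line)
        out.append(line)
    return out

# Pattern/replacement pairs copied from the CaseA, CaseB and CaseC steps
rules = [
    (r'^(\S+) https://svn.acme.com(\S+)', r'\2 \1'),
    (r'^(\S+) (-r ?\d* ?)https://svn.acme.com(\S+)', r'\2\3 \1'),
    (r'^(\S*)https://svn.acme.com(\S*)', r'\2\1'),
]

samples = [
    "CaseA https://svn.acme.com/svn/test/branches/project.x",
    "CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y",
    "https://svn.acme.com/svn/test/branches/project.z CaseC",
]

fixed = preview(rules, samples)
for before, after in zip(samples, fixed):
    print(before, '->', after)
```

Any line that still contains the old host name after the dry run needs another rule before the dump is transformed.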
Jumpy answered 12/2, 2014 at 17:44 Comment(2)
Note that this basic procedure is valid for Linux-based SVN repos as well; you'll just need to use the appropriate bash commands instead. svndumptool is a Python script, so it will work on any platform. - Jumpy
In case you want to convert all non-retro (1.5+) externals to relative paths (migrating a 1.5+ repo to a new server), use this regex: "^(|-r ?\d* ?)http://server/repo(/\S*) (\S*)" "\1^\2 \3". It replaces externals of the form -r511 http://server/repo/something CaseG with -r511 ^/something CaseG. - Biodegradable
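
The regex from the last comment can be checked quickly with Python's `re.sub`, again assuming svndumptool applies pattern and replacement the same way (an assumption, not verified against its source). The variable names are only for illustration; `http://server/repo` is the placeholder URL used in the comment.

```python
import re

# Pattern and replacement copied from the comment above
pattern = r'^(|-r ?\d* ?)http://server/repo(/\S*) (\S*)'
replacement = r'\1^\2 \3'

before = "-r511 http://server/repo/something CaseG"
after = re.sub(pattern, replacement, before)
print(after)

# The same rule also covers a definition with no -r prefix
no_rev = re.sub(pattern, replacement, "http://server/repo/lib/foo libfoo")
print(no_rev)
```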

Here are two choices, since you're using Ruby. However, do you have any other regular expression flavor on your machine?

1st Choice (Absolute path AND 3 matches)

^(-r ?\d*|(?:https:\/\/svn.acme.com)?\S+|\S+) (-r ?\d*|\S+)(?: (\S+))?$

Demo

http://rubular.com/r/dBMVd1arVJ


2nd Choice (Relative path AND multiple matches)

^(\S+) (?:https:\/\/svn\.acme\.com)(.+)|(\S+) (-r ?\d+) (?:https:\/\/svn\.acme\.com)(.+)|(?:(-r ?\d+) )?(?:https:\/\/svn\.acme\.com)(.+) (\S+)

Demo

http://rubular.com/r/f3t3OH5Wqn

Kessiah answered 24/1, 2014 at 23:57 Comment(1)
Your regexes do indeed work quite nicely, but unfortunately the tool I'm using (svndumptool) didn't like them. After much wailing and gnashing of teeth (and time with a good regex book) I finally came up with a set of regexes that work for this purpose. I'll write up an answer shortly. - Jumpy

In case someone using Python ends up here:

import re

test_externals ="""
CaseA https://svn.acme.com/svn/test/branches/project.x
CaseB -r 19 https://svn.acme.com/svn/test/branches/project.y
https://svn.acme.com/svn/test/branches/proje_9ct.z/123 CaseC1
https://svn.acme.com/svn/test/branches/proje_9ct.z/123   CaseC2
https://svn.acme.com/svn/test/branches/proje_9ct.z/123    CaseC3
https://svn.acme.com/svn/test/branches/project.zCaseC4
-r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD1
-r27 https://svn.acme.com/svn/test/branches/project.z@27 CaseD2
-r37 https://svn.acme.com/svn/test/branches/project.z CaseD3
https://svn.acme.com/svn/test/branches/project.z@88 CaseD4
 -r 20 https://svn.acme.com/svn/test/branches/project.z@20 CaseD1
CaseE -r21  https://svn.acme.com/svn/test/branches/project.y
"""

# '-' is placed last in each character class so it is read as a literal hyphen, not a range
pat_url    = r'(?P<url>https?://(?:[a-zA-Z0-9._-]+)(?:[a-zA-Z0-9._/-]+))'
pat_folder = r'(?P<folder>[a-zA-Z0-9/._-]+)'
pat_pegrev = r'(?:@(?P<peg_revision>\d+))'
pat_oprev  = r'(?:-r\s?(?P<op_rev>\d+))'

regex_externals = {
    'CaseA': re.compile(r'^\s*{folder}\s{url}$'.format(folder=pat_folder, url=pat_url)),
    'CaseB': re.compile(r'^\s*{folder}\s{oprev}\s{url}$'.format(folder=pat_folder, oprev=pat_oprev, url=pat_url)),
    'CaseC': re.compile(r'^\s*{url}\s{folder}$'.format(folder=pat_folder, url=pat_url)),
    'CaseD': re.compile(r'^\s*{oprev}?\s{url}{pegrev}?\s*{folder}$'.format(folder=pat_folder, oprev=pat_oprev, pegrev=pat_pegrev, url=pat_url)),
}

# Show the compiled pattern for each case
for name, regex in regex_externals.items():
    print('%s: %s' % (name, regex.pattern))


# Try every pattern against every sample line and print the captured groups
for case in test_externals.split('\n'):
    for name, regex in regex_externals.items():
        match = regex.search(case)
        if match:
            print('\n\n%s: %s' % (name, case))
            for g in match.groups():
                print('\t%s' % g)
Scarabaeus answered 21/1, 2015 at 14:7 Comment(0)
