What was the purpose of the original dump? Is there a problem with the old repo?
What exactly did you do when you said "extract" and "import" into the new repo? Was this a revision-specific dump followed by a load? Or was it a checkout of some revision, copying the files into a working copy of the new repo and committing? Or something else?
Thanks for your response, Doug.
Purpose of the dump:
-Update to Subversion 1.8
-Copy the repository to a new server
-Verify the repository is in good health
-Upgrade to the latest format version
Problems with the old repository:
-The repository will not verify (numerous errors)
-I tried to dump it and ran out of disk space (1.3TB used before the crash)
-I then tried to dump small ranges of revisions, but I still run out of disk space, and I have no idea how much is needed or how long it will take
I meant exporting the repository to a local folder and then importing that local folder into the new Subversion repository.
I did not do a checkout. I just wanted to pull the latest revision from the old repository into a local folder and then import it into the new repository. I have completed this step with no problems. Now the question is: if I were able to dump using revision ranges with the --incremental option, could I then load those into the new repository and thus keep at least some of my history? And if so, would/could it cause problems with revision conflicts in the new repository in the future? Would it verify successfully?
In terms of the "svnadmin dump", consider piping the output through a compressor instead of writing it straight to a file. For instance "svnadmin dump /path/to/old/repo | bzip2 -9 - > repo_dumped.bz2".
Or, to do it all in one fell swoop:
(svnadmin dump /path/to/old/repo 2> dump.errs) | (ssh otherbox svnadmin load --force-uuid /path/to/new/empty/repo 2> load.errs > load.out)
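A minimal bash sketch of the streamed, compressed dump, assuming a Linux box with bzip2 installed. Two things here are additions rather than part of the suggestion above: `set -o pipefail`, so a failing `svnadmin dump` is not masked by `bzip2` exiting cleanly, and a `bzip2 -t` integrity check on the finished archive. The function name `dump_compressed` and the `SVNADMIN` override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream a full dump through bzip2 so that no
# uncompressed dump file is ever written. SVNADMIN can be overridden.
SVNADMIN=${SVNADMIN:-svnadmin}

dump_compressed() {
    local repo=$1 out=$2 errlog=${3:-dump.errs}
    (
        # pipefail: the pipeline fails if svnadmin fails, not only if bzip2 does
        set -o pipefail
        "$SVNADMIN" dump "$repo" 2> "$errlog" | bzip2 -9 > "$out"
    ) || { echo "dump of $repo failed; see $errlog" >&2; return 1; }
    # Test the compressed archive's integrity before trusting it
    bzip2 -t "$out"
}
```

Usage would be something like `dump_compressed /path/to/old/repo repo_dumped.bz2`; svnadmin's errors land in `dump.errs`, as in the one-liner above.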
Of course, if you have to dump in chunks (e.g. using "--revision START:FINISH") to avoid dump failures, then this could take a while.
In my experience, using the --deltas and --incremental options to dump will greatly reduce the output, BUT dump is then far more susceptible to failures.
In general, loading the "latest" revision first and then starting over at revision 1 is not a good way to preserve history.
A few things occurred to me:
1. You could load into a sub-directory created just for the history. But then all of the historical paths will differ from the originals, since everything will sit under that sub-directory.
2. You probably want to specify a large memory cache size on the load to speed things up. The default is quite small.
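A sketch of suggestions 1 and 2 combined, assuming bash and a local repository reachable through a file:// URL. `--parent-dir` and `-M` (`--memory-cache-size`, in megabytes) are existing `svnadmin load` options; the function name `load_history`, the `history` default and the 1024 MB cache value are hypothetical choices. As far as I know `--parent-dir` expects the directory to exist already, so the sketch creates it first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load one dump file under a separate directory of the
# new repository, with a larger in-memory cache for svnadmin load.
SVN=${SVN:-svn}
SVNADMIN=${SVNADMIN:-svnadmin}

load_history() {
    local repo=$1 dumpfile=$2 parent=${3:-history} cache_mb=${4:-1024}
    local url="file://$repo/$parent"
    # --parent-dir expects the directory to exist, so create it if it is missing.
    if ! "$SVN" info "$url" > /dev/null 2>&1; then
        "$SVN" mkdir -m "Create $parent for imported history" "$url" || return 1
    fi
    # -M is --memory-cache-size, given in megabytes.
    "$SVNADMIN" load --parent-dir "$parent" -M "$cache_mb" "$repo" < "$dumpfile"
}
```

Keep in mind that loading into a repository that already has commits appends the loaded revisions after the existing ones, so their revision numbers will not match the old repository's.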
This is the first error I get. Where do I go from here to resolve it?
E:\>svnadmin verify E:\csvn\data\repositories\xxxxx
* Verified revision 0.
svnadmin: E160004: Filesystem is corrupt
svnadmin: E200014: Checksum mismatch while reading representation:
It appears not to like revision 1?! Do you have a backup from before the corruption?
Sometimes a dump/load can cure such issues (or at least get past them). Given that error, I will restate that you should not use the "--deltas" or "--incremental" options for dump.
What happens with the dump/load I proposed above? Does it blow up?
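To find out whether revision 1 is the only damaged one, a bash sketch that runs `svnadmin verify` on one revision at a time and prints the numbers of the revisions that fail, rather than stopping at the first error. The function name is hypothetical, and on a repository this size one process per revision will be slow. A corrupt representation that later revisions are stored as deltas against may also make many revisions fail, not just the one that is actually damaged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify revisions FIRST..LAST one at a time and print
# the number of every revision whose verification fails.
SVNADMIN=${SVNADMIN:-svnadmin}

find_bad_revisions() {
    local repo=$1 first=$2 last=$3 rev
    for (( rev = first; rev <= last; rev++ )); do
        if ! "$SVNADMIN" verify -q -r "$rev" "$repo" > /dev/null 2>&1; then
            echo "$rev"
        fi
    done
}
```

For example: `find_bad_revisions /path/to/repo 0 "$(svnlook youngest /path/to/repo)" > bad_revs.txt`.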
This was my first attempt, before posting my issue in the forum:
svnadmin dump %%i | %SEVENZIP% ..\_svndump\%%i.dump.7z
The statement ran for about 10 hours and eventually used up 1.3TB before crashing the server. The repo's folder size is 470GB. That is when I ran svnadmin verify and saw the errors mentioned above.
Well, if this weren't Windows, that would be good news. On Linux the pipe solution does not end up storing any data; it all moves through limited kernel buffers. I'm not sure whether Windows still does what it used to do (collect all of the output before feeding it to the input of the second consumer), but it's worth a try to see if they've fixed it...
If that fails, then with the repo at 470GB it is likely storing binaries, and the dump is going to write out each version of each one in full. I've seen dumps take 10X the size of the repo. Any way you can put an array together and create a share with more than 4TB for it to dump onto?
It hit me during lunch today that you've likely already proven that the Windows you're running on has the defective pipe implementation: the fact that the dump consumed 1.3TB most likely means Windows was putting STDOUT into a file and waiting until the dump was done before pushing the data into %SEVENZIP%. If that's the case, no amount of piping will help.
I suggest that you set up a Linux server with a 10TB file system and mount it on the Windows machine via SMB. Then run the dump into a file on the shared file system.
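Before starting another multi-hour dump, it may be worth checking the target against that rule of thumb: 470GB × 10 = 4.7TB. A bash sketch, assuming `du` and `df` support `-k` and `-P` (GNU coreutils does); the function names and the default factor of 10 are hypothetical, and 10X is only an observed ratio, not a guaranteed upper bound.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before a long dump, based on the rule of
# thumb that a dump can be about 10x the size of the repository. Sizes in KiB.
dir_size_kb()   { du -sk "$1" | cut -f1; }
free_space_kb() { df -Pk "$1" | awk 'NR == 2 { print $4 }'; }

# Succeeds if FREE_KB is at least FACTOR times REPO_KB.
has_room_for_dump() {
    local repo_kb=$1 free_kb=$2 factor=${3:-10}
    (( free_kb >= repo_kb * factor ))
}
```

For example: `has_room_for_dump "$(dir_size_kb /path/to/repo)" "$(free_space_kb /mnt/dumps)" || echo "not enough room for a full dump"`.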
Just one of the reasons that I am not fond of Windows.
The only other viable mechanism is to write a script that dumps ranges of, say, 1000 revisions (0-999, 1000-1999, 2000-2999, etc.), recording each in a file and then running %SEVENZIP% on each one in turn so you never run out of disk space. You can then write the converse script to load them one after another (uncompressing each into a temp file before the load, then removing the temp file after use).
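A bash sketch of the ranges idea, assuming a Linux box and using bzip2 in place of %SEVENZIP% (as in the earlier bzip2 suggestion). Each chunk is compressed before the next one is written, so only one uncompressed chunk exists on disk at a time. One deliberate deviation from the advice earlier in the thread: every chunk after the first is dumped with `--incremental`. As far as I know, a chunk dumped without it begins with a full copy of the tree at its first revision and will not load on top of the previous chunk; the chunked-dump example in the Subversion book does the same. The function names, the chunk-START-END.dump.bz2 file names and the temp file location are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: dump a repository in fixed-size revision ranges and
# load the ranges back in order. bzip2 stands in for %SEVENZIP%.
SVNADMIN=${SVNADMIN:-svnadmin}

# Print "START END" for each range of SIZE revisions covering 0..YOUNGEST.
chunk_ranges() {
    local youngest=$1 size=$2 start end
    for (( start = 0; start <= youngest; start += size )); do
        end=$(( start + size - 1 ))
        (( end > youngest )) && end=$youngest
        echo "$start $end"
    done
}

# Dump each range to its own file, then compress it before the next range,
# so at most one uncompressed chunk is on disk at any time.
dump_in_chunks() {
    local repo=$1 youngest=$2 outdir=$3 size=${4:-1000}
    local start end file
    while read -r start end; do
        file="$outdir/chunk-$start-$end.dump"
        if (( start == 0 )); then
            "$SVNADMIN" dump -r "$start:$end" "$repo" > "$file" || return 1
        else
            # Later chunks must be incremental so they apply on top of the previous one.
            "$SVNADMIN" dump --incremental -r "$start:$end" "$repo" > "$file" || return 1
        fi
        bzip2 -9 "$file" || return 1   # replaces chunk-*.dump with chunk-*.dump.bz2
    done < <(chunk_ranges "$youngest" "$size")
}

# Converse: decompress each chunk to a temp file, load it, remove the temp file.
load_chunks() {
    local repo=$1 youngest=$2 indir=$3 size=${4:-1000}
    local start end tmp
    while read -r start end; do
        tmp=$(mktemp "$indir/load.XXXXXX") || return 1
        if ! bzip2 -dc "$indir/chunk-$start-$end.dump.bz2" > "$tmp" ||
           ! "$SVNADMIN" load "$repo" < "$tmp"; then
            rm -f "$tmp"
            return 1
        fi
        rm -f "$tmp"
    done < <(chunk_ranges "$youngest" "$size")
}
```

For example, `youngest=$(svnlook youngest /path/to/old/repo)`, then `dump_in_chunks /path/to/old/repo "$youngest" /mnt/dumps` on the old side and `load_chunks /path/to/new/repo "$youngest" /mnt/dumps` on the new side. Passing the same youngest revision and chunk size to both lets the loader rebuild the file names instead of parsing them.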
Good to learn all is well.