Finding Large Files in Mercurial Repository

Similar to this link, but for Mercurial: I'd like to find the files that contribute most to the size of my Mercurial repository.

I intend to use hg convert to create a new, smaller repository. I'm just not sure yet which files are contributing to the repository size. They could be files that have already been deleted.

What is a good way to find these anywhere in the repository history? There are over 20,000 commits. I'm thinking of a PowerShell script, but I'm not sure what the best way to go about it is.

Dunne answered 14/12, 2015 at 22:56 Comment(0)

Check hg help fileset. Something like

hg files "set:size('>1M')"

should do the trick for you. You might need to run it over all revisions, though, as it only operates on one revision at a time. In bash I'd try something like

for i in $(hg log -r "all()" "set:size('>400k')" --template "{rev}\n"); do hg files -r "$i" "set:size('>400k')"; done | sort -u

This can probably be optimized, since it repeats a lot of work and may run for quite a while; on the OpenTTD repository with 22000 commits it took just short of 10 minutes on my laptop.

(Also check hg help on templates, files and grep)
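A different angle, not part of the approach above and offered only as a rough sketch: Mercurial keeps one revlog per tracked file (including files that were later deleted) under .hg/store/data, so the sizes of those revlog files give a quick estimate of which paths take up the most space on disk. The function name largest_revlogs is made up for this example. It assumes GNU find (for -printf) and the default store layout, and the paths it prints are in Mercurial's encoded store form (for example, uppercase letters are escaped), so they may need decoding by eye.

```shell
# Hypothetical helper: list the N largest files under <repo>/.hg/store/data.
# Assumes GNU find (-printf) and the default Mercurial store layout.
# Output: "<size in bytes><TAB><store path>", largest first.
largest_revlogs() {
    repo=${1:-.}      # repository root, defaults to the current directory
    count=${2:-20}    # how many entries to show, defaults to 20
    find "$repo/.hg/store/data" -type f -printf '%s\t%p\n' \
        | sort -rn \
        | head -n "$count"
}
```

Calling largest_revlogs /path/to/repo 10 would print the ten largest store files. This measures compressed history on disk rather than the size of any single file version, so it complements the fileset query instead of replacing it.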

Wassail answered 14/12, 2015 at 23:58 Comment(1)
Thanks, that works wonders. I am using Windows. For completeness, the PowerShell script is

hg log -r"all()" "set:size('>1024k')" --template="{rev}\n" | Foreach { hg files -r $_ "set:size('>1024k')" >> results.txt; get-content results.txt | sort | get-unique > results2.txt; Remove-Item results.txt; Move-Item results2.txt results.txt }

and the bat file would be

for /F %%i in ('hg log -r"all()" "set:size('>1024k')" --template="{rev}\n"') DO hg files -r %%i "set:size('>1024k')" >> results.txt

(that doesn't sort/filter though; use %i instead of %%i when typing it directly at the command prompt)

Dunne
