How much space is needed to download the entire CRAN Repository? Keeping all the files zipped, how large would a folder holding all the packages be? I can't find a clear answer to this question. I've read about 3GB, but I've also come across 200GB.
Per my comment:
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/macosx/mavericks/contrib/3.2/ /cran/bin/macosx/mavericks/contrib/3.2/
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/macosx/mavericks/contrib/3.3/ /cran/bin/macosx/mavericks/contrib/3.3/
rsync -rtlzv --delete cran.r-project.org::CRAN/doc/ /cran/doc/
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/macosx/tools/ /cran/bin/macosx/tools/
rsync -rtlzv --delete cran.r-project.org::CRAN/web/ /cran/web/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/ /cran/src/
rsync -tlzv --delete -a --include="NEWS" --include="*.shtml" --include="*.html" --include="*.pkg" --include="*.dmg" --include="*.gz" --exclude="*" cran.r-project.org::CRAN/bin/macosx/ /cran/bin/macosx/
rsync -tlzv --delete -a --include="*.html" --include="*.shtml" --include="*.svg" --include="*.png" --exclude="*" cran.r-project.org::CRAN/ /cran/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES.gz /cran/src/contrib/PACKAGES.gz
This set of rsync statements (which is not optimized) gets me a fully functional local CRAN repo that supports all of my systems quite well. I let the sole, nigh useless Windows VM I keep for testing use RStudio's mirror since I have no use for its cruft on this system, but my Linux and macOS systems work flawlessly with this when it comes to pkgs.
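If you want something tidier, the six plain recursive syncs above share the same flags and can be folded into a loop. This is only a sketch: the function names and the `DEST` and `RSYNC` variables are mine, not part of any CRAN tooling, and the two filtered syncs and the PACKAGES.gz sync are left out because their options differ.

```shell
#!/bin/sh
# Sketch: loop over the subtrees that are mirrored with identical flags.
REMOTE="cran.r-project.org::CRAN"
DEST="${DEST:-/cran}"      # local mirror root, as used in the commands above
RSYNC="${RSYNC:-rsync}"    # hypothetical override: RSYNC=echo previews the commands

mirror_tree() {
    # $1: path relative to the CRAN root, e.g. "src"
    "$RSYNC" -rtlzv --delete "$REMOTE/$1/" "$DEST/$1/"
}

mirror_all() {
    for tree in \
        bin/macosx/mavericks/contrib/3.2 \
        bin/macosx/mavericks/contrib/3.3 \
        doc \
        bin/macosx/tools \
        web \
        src
    do
        # Stop at the first failing subtree so errors are not buried in the output.
        mirror_tree "$tree" || return 1
    done
}
```

Add a final `mirror_all` line to run it. Setting `RSYNC=echo` in the environment first prints the six commands instead of executing them.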
As I said in the comment, this is under 60GB.
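You can check the number for your own subset before committing the disk space. A rough sketch, assuming a stock rsync and du: `-n` makes rsync do a dry run, and `--stats` prints a summary that includes a "Total file size" line. The helper names are made up for illustration.

```shell
#!/bin/sh
REMOTE="cran.r-project.org::CRAN"

# Dry run (-n): nothing is written locally.
# --stats appends a summary; keep only the "Total file size" line.
estimate_tree() {
    # $1: path relative to the CRAN root, $2: local mirror root
    rsync -rtlzn --stats "$REMOTE/$1/" "$2/$1/" | grep 'Total file size'
}

# Disk usage of an existing local mirror, in human-readable units.
mirror_size() {
    # $1: local mirror root
    du -sh "$1" | cut -f1
}
```

For example, `estimate_tree src /cran` reports the size of the source tree on the server, and `mirror_size /cran` reports what the mirror occupies once it has been synced.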
To make it fully functional, you have to set up a web server, and it's a PITA to use anything but Apache given the 1990s web tech setup CRAN seems determined to maintain. Said config is an exercise left to the reader.
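Whatever server you end up with, a quick smoke test is to fetch the package index, since R looks for it under `src/contrib/` of the repo URL. A minimal sketch, assuming curl is installed; `check_mirror` and the example URL are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm a mirror exposes the package index.
check_mirror() {
    # $1: base URL of the mirror, e.g. http://cran.example.internal (hypothetical)
    if curl -fsS -o /dev/null "$1/src/contrib/PACKAGES.gz"; then
        echo "ok: $1 serves src/contrib/PACKAGES.gz"
    else
        echo "FAILED: $1/src/contrib/PACKAGES.gz not reachable" >&2
        return 1
    fi
}
```

Call it as `check_mirror http://your-server` once the web server is up. It only checks that the index is reachable, not that the .shtml pages render.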
Of note: it's worth the time doing the mirror and exploring the nuggets around the filesystem. There are many RDS files for "accounting" and other insights you won't get from staring at the 1990s HTML files on the web site.
Using your own local mirror reduces information leakage and stops you from contributing to the (IMO very inaccurate) "# downloads" package counts that show up on GitHub README.md badges. It also protects your privacy from mirrors that keep logs or mine your pkg usage.
rsync configuration (daily) and it's now <60GB on disk for the subset I've chosen to mirror, which is pkg sources, macOS binaries, full R sources, all HTML (including CRAN checks) and some other bits. – Pharisaism