How to programmatically extract / unzip a .7z (7-zip) file with R

I'm trying to automate the extraction of a number of files compressed with 7-zip. I need to automate this process, because a) there are many years of data I'd like to unlock and b) I'd like to share my code with others and prevent them from repeating the process by hand.

I have both WinRAR and 7-zip installed on my computer, and I can individually open these files easily with either program.

I've looked at the unzip, untar, and unz commands, but I don't believe any of them does what I need.

I don't know anything about compression, but if it makes any difference: each of these files only contains one file and it's just a text file.

I would strongly prefer a solution that does not require the user to install additional software (like WinRAR or 7-Zip) and execute a command with shell, although I acknowledge this task might be impossible with just R and CRAN packages. I actually believe running shell.exec on these files with additional parameters might work on computers with WinRAR installed, but again, I'd like to avoid that installation if possible. :)

Running the code below will load the files I am trying to extract -- the .7z files in files.data are what needs to be unlocked.

# create a temporary file and temporary directory, download the file, extract the file to the temporary directory
tf <- tempfile() ; td <- tempdir()
file.path <- "ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2008_2009/Microdados/Dados.zip"
download.file( file.path , tf , mode = "wb" )
files.data <- unzip( tf , exdir = td )

# how do i unzip ANY of these .7z files?
files.data

Thanks!!! :)

Pelagianism answered 19/4, 2013 at 2:22 Comment(3)
The best solution would be a package that could read and write 7z files using either the standard connection API or via temporary files on disk. But I don't think that package exists. - Nastassia
agreed. now i'm petitioning the folks at the brazilian census to follow @dirk's advice and re-post the files with a standard format :) thanks hadley! - Pelagianism
The example you're showing is a PKZIP-compressed file, not a 7z-compressed file, so the standard unzip() would work. A related question is https://mcmap.net/q/409791/-sys-glob-within-unzip - Quarterphase

If you have the 7z executable on your path, you can simply call it with the system command (note that there must be no space between -o and the output directory):

system('7z e -o<output_dir> <archive_name>')
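A slightly more defensive version of the same idea, as a sketch only: it assumes a 7z binary is on the PATH (on some systems the binary has a different name, such as 7za), and build_7z_args and extract_7z are hypothetical helper names, not part of any package. Using system2 with quoted arguments avoids hand-building a shell string and copes with paths that contain spaces.

```r
# Build the arguments for `7z e`. The -o switch and the output directory
# are joined with no space in between, then quoted as a single argument.
build_7z_args <- function(archive_path, output_dir) {
  c("e", shQuote(paste0("-o", output_dir)), shQuote(archive_path))
}

# Extract one archive with the 7z command-line tool (hypothetical helper).
# `exe` is an assumption: adjust it if your binary has a different name.
extract_7z <- function(archive_path, output_dir = tempdir(), exe = "7z") {
  exe_path <- unname(Sys.which(exe))
  if (!nzchar(exe_path)) {
    stop("could not find '", exe, "' on the PATH; 7-Zip must be installed")
  }
  # system2 returns the exit status of the command
  status <- system2(exe_path, build_7z_args(archive_path, output_dir))
  if (status != 0) {
    stop("7z exited with status ", status)
  }
  invisible(output_dir)
}
```

Calling extract_7z( files.data[1] , td ) would then extract the first archive from the question's code into the temporary directory, provided 7z is installed.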

Municipalize answered 19/4, 2013 at 6:39 Comment(4)
Completely misses the requirement of "I would strongly prefer a solution that does not require the user to install additional software". - Evulsion
@DirkEddelbuettel but short of doing everything by hand, it's the only thing that works, right? :( - Pelagianism
@DirkEddelbuettel ..from what you and hadley are saying, it's the only answer. :( why would i delete the thread? others might also benefit from knowing this task is impossible without installing external software - Pelagianism
note that the command is system('7z e -o<output_dir> <archive_name>'). With a space between -o and the directory it fails! - Labrador

This can be done with the archive package.

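library(archive)
# create a temporary file and temporary directory, then download the file
tf <- tempfile() ; td <- tempdir()
file.path <- "ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2008_2009/Microdados/Dados.zip"
download.file( file.path , tf , mode = "wb" )
# list the contents of the archive
archive(tf)
# extract the contents to the temporary directory
archive_extract( tf , dir = td )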

See https://github.com/jimhester/archive
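For the files in the question, the downloaded Dados.zip is an ordinary zip whose members are the .7z files, so a full extraction takes two steps. A minimal sketch, assuming the archive package is installed and that files.data holds the paths returned by unzip() as in the question; is_7z and extract_all_7z are hypothetical helper names.

```r
# TRUE for paths that end in .7z, ignoring case (hypothetical helper).
is_7z <- function(paths) {
  grepl("\\.7z$", paths, ignore.case = TRUE)
}

# Extract every .7z file among `paths` into `output_dir` (hypothetical helper).
extract_all_7z <- function(paths, output_dir = tempdir()) {
  if (!requireNamespace("archive", quietly = TRUE)) {
    stop("the 'archive' package is required for this sketch")
  }
  seven_z <- paths[is_7z(paths)]
  for (path in seven_z) {
    archive::archive_extract(path, dir = output_dir)
  }
  # return the archives that were processed
  invisible(seven_z)
}

# usage, continuing from the question's code:
# files.data <- unzip( tf , exdir = td )
# extract_all_7z( files.data , td )
```

Filtering on the extension keeps the loop from handing any non-7z members of the zip to archive_extract.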

Torchbearer answered 15/5, 2017 at 20:58 Comment(4)
amazing. i'll start dumping 7-zip dependencies right away. github.com/ajdamico/lodown/issues/99 thank you - Pelagianism
archive_extract for extraction. - Citrus
Thank you! For compression archive_write_dir() works effectively. - Unmindful
Any way to deal with passwords? (both for compress and decompress) - Trenatrenail
