R: unzipping large compressed .csv yields "zip file is corrupt" warning

I am downloading a 78MB zip file from the UN FAO, which contains a 2.66GB csv. I can unzip the downloaded file from a folder using WinZip, but I have been unable to unzip it using unzip() in R:

Warning - 78MB download!

url <- "http://fenixservices.fao.org/faostat/static/bulkdownloads/FoodBalanceSheets_E_All_Data_(Normalized).zip"
path <- file.path(getwd(), "zipped_data.zip")  # file.path() adds the separator itself
download.file(url, path, mode = "wb")
unzipped_data <- unzip(path)

This results in a warning and a failure to unzip the file:

Warning message:
In unzip(path) : zip file is corrupt

In the ?unzip documentation I see

"It does have some support for bzip2 compression and > 2GB zip files (but not >= 4GB files pre-compression contained in a zip file: like many builds of unzip it may truncate these, in R's case with a warning if possible)"

Since my csv is 2.66GB, which is below the 4GB limit, this makes me believe that unzip() should handle my file. The same process has also successfully downloaded, unzipped, and read several other, smaller tables from FAOSTAT. Is there a chance that the size of my csv is the source of this error? If so, what is the workaround?
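One way to check whether size is even in play is to list the archive without extracting it and compare each entry's uncompressed size with the 4GB limit quoted above. This is a sketch that assumes the archive's central directory can still be read (if the zip is badly damaged, the listing may fail too). unzip(path, list = TRUE) returns a data frame with Name, Length (uncompressed bytes) and Date columns; entries_over_limit is a hypothetical helper name.

```r
# Flag archive entries whose uncompressed size reaches the 4 GB (2^32 bytes)
# limit mentioned in ?unzip.
entries_over_limit <- function(listing, limit = 2^32) {
  # `listing` is assumed to be the data frame returned by
  # utils::unzip(zipfile, list = TRUE); Length is the uncompressed size in bytes
  listing$Name[listing$Length >= limit]
}

# Usage with the `path` from the question:
# listing <- unzip(path, list = TRUE)
# listing                       # Name, Length (bytes), Date
# entries_over_limit(listing)   # character(0): no entry reaches the 4 GB limit
```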

Dockery asked 1/8, 2017 at 20:11 · Comments (3):
You could build a shell command in R that calls WinZip on your file and execute it with R's shell() function. If you wrap it in a function, say unzip2, there won't be much difference from calling unzip(). – Communication
I've seen this idea in other posts suggesting system(), but I haven't seen an example. Could you provide one, or point me in the right direction? Thanks for the help! – Dockery
See my answer. Please give me some feedback so I can improve it, as it's untested and contains some uncertainties. – Communication

I had the same problem running unzip() on Ubuntu Server 20.04. Passing the path to the system unzip program, unzip(..., unzip = "/usr/bin/unzip"), instead of using the internal method (unzip = "internal"), did the trick.
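Rather than hard-coding /usr/bin/unzip, a sketch like the following looks the program up with Sys.which() and falls back to the internal method when no unzip binary is on the PATH. The helper names pick_unzip_method and unzip_prefer_system are hypothetical. With an external program, utils::unzip() does not return the extracted paths, so the sketch lists exdir afterwards instead.

```r
# Choose a system `unzip` program if one is on the PATH,
# otherwise R's built-in ("internal") implementation.
pick_unzip_method <- function() {
  bin <- Sys.which("unzip")   # named character; "" when not found
  if (nzchar(bin)) unname(bin) else "internal"
}

unzip_prefer_system <- function(zipfile, exdir = ".") {
  method <- pick_unzip_method()
  utils::unzip(zipfile, exdir = exdir, unzip = method)
  # The external method returns no paths, so list exdir instead
  # (this also includes any files that were already there).
  list.files(exdir, full.names = TRUE)
}

# Usage: unzip_prefer_system(path, exdir = "fao_data")
```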

Annulus answered 22/4, 2022 at 21:5

I can't test my solution, and it also depends on your installation, but hopefully it will work or at least point you to a suitable solution:

You can run WinZip from the command line; this page shows the structure of the call.

You can also run command lines from R with system() or shell() (which is just a wrapper for system()).
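shell() is Windows-only; system2() takes the command and its arguments separately and is available on all platforms. This is a minimal sketch that assumes the program signals failure through a non-zero exit status (whether winzip32 does so has not been verified here); run_checked is a hypothetical name.

```r
# Run an external program and stop if it reports failure.
# With output not captured, system2() returns the exit status.
run_checked <- function(command, args = character()) {
  status <- system2(command, args)
  if (!identical(as.integer(status), 0L)) {
    stop(sprintf("'%s' failed with exit status %s", command, status))
  }
  invisible(status)
}

# e.g. on Windows (untested):
# run_checked("winzip32", c("-e", shQuote(zipfile, type = "cmd"), shQuote(exdir, type = "cmd")))
```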

The general structure of the command to extract an archive is:

winzip32 -e [options] filename[.zip] folder

So we build a string with this structure from your input paths and wrap it in a function that mimics unzip(), with the parameters zipfile and exdir:

unzip_wz <- function(zipfile, exdir) {
  # utils::unzip() creates exdir if necessary; do the same here
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  # shell() runs the command through cmd.exe, which needs double (not single) quotes around paths
  str1 <- sprintf("winzip32 -e %s %s", shQuote(zipfile, type = "cmd"), shQuote(exdir, type = "cmd"))
  # wait = TRUE blocks R until WinZip finishes; FALSE lets your script continue while unzipping (use with caution)
  shell(str1, wait = TRUE)
}

You can try this function in place of unzip(). It assumes that winzip32 is on your system PATH; if it isn't, either add it or replace winzip32 with the full path to the executable, so you have something like:

str1 <- sprintf('"C:/probably/somewhere/in/program/files/winzip32.exe" -e %s %s', shQuote(zipfile, type = "cmd"), shQuote(exdir, type = "cmd"))

PS: use full paths! The command line doesn't know your working directory (we could implement this in the function if needed).
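If full paths should be handled inside the function, normalizePath() can turn relative paths into absolute ones before the command string is built. This is a sketch assuming the same winzip32 -e syntax as above; build_winzip_cmd and its exe argument are hypothetical, and exe can be set to the full path of winzip32.exe when it is not on the PATH.

```r
# Build the WinZip extract command from absolute paths so the result does
# not depend on the working directory.
build_winzip_cmd <- function(zipfile, exdir, exe = "winzip32") {
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  zip_full <- normalizePath(zipfile, mustWork = TRUE)  # error if the zip is missing
  dir_full <- normalizePath(exdir, mustWork = TRUE)
  # cmd.exe-style double quotes around every path
  sprintf("%s -e %s %s",
          shQuote(exe, type = "cmd"),
          shQuote(zip_full, type = "cmd"),
          shQuote(dir_full, type = "cmd"))
}

# On Windows: shell(build_winzip_cmd("zipped_data.zip", "unzipped"), wait = TRUE)
```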

Communication answered 2/8, 2017 at 14:0 · Comments (4):
Thanks for pointing me towards the WinZip command-line syntax. I haven't been able to unzip my file even from the command line yet, though. Ultimately, a system command may not be a solution for me: I'm trying to write a script that rebuilds tables hosted on a network drive. It needs to run from any user's machine with an R installation, so pointing to a local WinZip installation might be problematic, although the path is probably the same on most of our organization's machines. – Dockery
Maybe it was a bit too obvious, but this may do the trick! cran.r-project.org/web/packages/zip/index.html – Communication
Or this: rdocumentation.org/packages/memisc/versions/0.97/topics/UnZip – Communication
Giving zip a try (Linux) immediately after the "zip file is corrupt" warning shows this: * DONE (zip) The downloaded source packages are in ‘/tmp/RtmpDgwPmW/downloaded_packages’ Warning message: closing unused connection 3 (./:617d4dbbd.flac). The connection listed is the first desired file of the large zip, so perhaps the corruption has to do with connections not being cleaned up. This is in the case where files = some_character_vec_of_the_files_desired is used. Installing zip really solved this not-too-obvious problem. – Rickirickie
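With the zip package linked above, extraction no longer depends on an external program. This is a minimal sketch, assuming a version of the package that provides zip::unzip() and zip::zip_list(); unzip_with_zip_pkg is a hypothetical name, and whether it copes with this particular 2.66GB entry is untested here.

```r
# install.packages("zip")  # CRAN package linked above
unzip_with_zip_pkg <- function(zipfile, exdir = ".") {
  if (!requireNamespace("zip", quietly = TRUE)) {
    stop("the 'zip' package is needed: install.packages(\"zip\")")
  }
  zip::unzip(zipfile, exdir = exdir)
  # zip_list() reads the archive's entries; return where they were extracted to
  file.path(exdir, zip::zip_list(zipfile)$filename)
}

# Usage: csv_paths <- unzip_with_zip_pkg(path, exdir = "fao_data")
```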
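The files argument mentioned in the last comment restricts extraction to selected entries. This sketch assumes the zip package is installed; select_entries and extract_matching are hypothetical helpers that pick entries by a regular expression before passing them to zip::unzip(files = ...).

```r
# Keep only the entry names that match a regular expression.
select_entries <- function(entries, pattern) entries[grepl(pattern, entries)]

# Extract just the matching entries of a zip archive (requires the zip package).
extract_matching <- function(zipfile, pattern, exdir = ".") {
  wanted <- select_entries(zip::zip_list(zipfile)$filename, pattern)
  if (length(wanted) == 0) stop("no entries match ", pattern)
  zip::unzip(zipfile, files = wanted, exdir = exdir)
  file.path(exdir, wanted)
}

# e.g. extract_matching(path, "\\.csv$", exdir = "fao_data")
```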
