Decompress gz file using R
H

6

80

I have used ?unzip in the past to get at contents of a zipped file using R. This time around, I am having a hard time extracting the files from a .gz file which can be found here.

I have tried ?gzfile and ?gzcon but have not been able to get it to work. Any help you can provide will be greatly appreciated.

Hydrophane answered 23/4, 2011 at 13:45 Comment(0)
L
46

If you really want to uncompress the file, just use the untar function which does support gzip. E.g.:

untar('chadwick-0.5.3.tar.gz')
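
If you would rather inspect the archive first, or extract it somewhere other than the working directory, untar() also takes list= and exdir= arguments. The following is only a minimal sketch: it builds a small throwaway archive so it can run on its own, and the file and directory names (demo.tar.gz, pkg, notes.txt) are made up for illustration.

```r
# Build a tiny .tar.gz in a temporary directory so the example is
# self-contained (all names below are hypothetical).
work <- tempfile("untar_demo_")
dir.create(work)
old <- setwd(work)
dir.create("pkg")
writeLines(c("hello", "world"), file.path("pkg", "notes.txt"))
tar("demo.tar.gz", files = "pkg", compression = "gzip")
setwd(old)

archive <- file.path(work, "demo.tar.gz")

# List the archive members without extracting anything
contents <- untar(archive, list = TRUE)

# Extract into a directory of your choosing instead of getwd()
out_dir <- file.path(work, "extracted")
dir.create(out_dir)
untar(archive, exdir = out_dir)
extracted <- readLines(file.path(out_dir, "pkg", "notes.txt"))
```

With the real file you would replace archive with the path to chadwick-0.5.3.tar.gz.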
Laplante answered 24/4, 2011 at 11:57 Comment(0)
S
79

Here is a worked example that may help illustrate what gzfile() and gzcon() are for:

foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
foo
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
write.table(foo, file="/tmp/foo.csv")
system("gzip /tmp/foo.csv")             # being very explicit

Now that the file is written, instead of implicit use of file(), use gzfile():

read.table(gzfile("/tmp/foo.csv.gz"))   
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
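
The same round trip can probably be done without shelling out to gzip, and it also gives a chance to show gzcon(), which wraps an already-open binary connection. This is a sketch under the assumption that a temporary file is acceptable; the object names (path, bar, baz) are invented for the example.

```r
path <- tempfile(fileext = ".csv.gz")
foo  <- data.frame(a = LETTERS[1:3], b = c(0.5, 1.5, 2.5))

# Write through a gzfile() connection: data is compressed on the fly
con <- gzfile(path, open = "wt")
write.table(foo, con)
close(con)

# Read it back through gzfile(); read.table opens and closes the connection
bar <- read.table(gzfile(path))

# gzcon() adds gzip decompression on top of an existing binary connection
raw_con <- gzcon(file(path, open = "rb"))
txt <- readLines(raw_con)
close(raw_con)
baz <- read.table(text = txt)
```

gzfile() is the convenient choice for a file on disk; gzcon() is the one to reach for when the compressed bytes arrive through some other connection.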

The file you point to is a compressed tar archive rather than a single gzipped text file, so gzfile() alone will not unpack it; base R's untar() can extract such archives. They are commonly used to distribute source code, for example R packages and the R sources.

Spode answered 23/4, 2011 at 13:51 Comment(6)
Is it possible to do this with data.table's fread()? I have tried without success so far. - Immortalize
That is obvious yet not useful as data.table cannot consume that stream. - Spode
That's useful. Its standard heuristic tries reading in three different locations, which requires forward/backward positioning you cannot do on a stream. I looked for fread support once and think I saw 'nope', but maybe that has changed. - Spode
The current version of data.table in fact supports csv.gz natively (no need for zcat). - Spadefish
gzfile also worked for me on txt.gz where untar failed (I didn't finagle with untar too much). - Spadefish
This should work now (9+ years later): library(data.table); library(R.utils); d <- fread("file.gz", stringsAsFactors = FALSE) - Rea
N
68

To un-gz a file in R you can do

library(R.utils)
gunzip("file.gz", remove=FALSE)

or

gunzip("file.gz")

Note that the second form uses the default remove=TRUE, so the input file is deleted once the output file has been fully written and closed.
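
If installing R.utils is not an option, roughly the same thing can be sketched in base R by streaming bytes from a gzfile() connection into a plain file. The helper name gunzip_base and its arguments are hypothetical, not part of any package, and unlike gunzip() it never removes the input file.

```r
# Hypothetical helper: decompress `src` to `dest` in chunks, keeping `src`.
gunzip_base <- function(src, dest = sub("[.]gz$", "", src), chunk = 1e6) {
  inn <- gzfile(src, open = "rb")
  on.exit(close(inn))
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    bytes <- readBin(inn, what = raw(), n = chunk)  # decompressed bytes
    if (length(bytes) == 0L) break                  # end of input
    writeBin(bytes, out)
  }
  invisible(dest)
}

# Usage on a throwaway file
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(c("first line", "second line"), con)
close(con)

plain_path <- gunzip_base(gz_path)
restored <- readLines(plain_path)
```

Reading in chunks keeps memory use bounded for large files; the chunk size of 1e6 bytes is an arbitrary choice.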

Novelty answered 9/5, 2015 at 19:25 Comment(3)
That's what I was looking for. Be aware: the default (remove=TRUE) behavior is that the input file is removed after the output file is fully created and closed; see ?gunzip. - Branscum
gunzip() is now deprecated. - Unstop
@Unstop where do you see that? I can't find anything saying that gunzip is deprecated. - Mellisa
M
29

http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html

R added transparent decompression for certain kinds of compressed files in version 2.10. If you have your files compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. You should have the proper filename extensions.

The command

myData <- read.table('myFile.gz')   # gzip-compressed files have a "gz" extension

will work just as if 'myFile.gz' were the raw text file.
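
A quick way to check this on your own machine is to write a small table in two compressed formats and read both back by path alone. This is only a sketch: the temporary file names and the df, from_gz and from_bz objects are invented for the example.

```r
df <- data.frame(id = 1:3, label = c("x", "y", "z"))

gz_path <- tempfile(fileext = ".txt.gz")
bz_path <- tempfile(fileext = ".txt.bz2")

# Write one gzip-compressed and one bzip2-compressed copy
con <- gzfile(gz_path, open = "wt")
write.table(df, con, row.names = FALSE)
close(con)

con <- bzfile(bz_path, open = "wt")
write.table(df, con, row.names = FALSE)
close(con)

# Pass the paths straight to read.table, with no explicit gzfile()/bzfile()
from_gz <- read.table(gz_path, header = TRUE)
from_bz <- read.table(bz_path, header = TRUE)
```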

Multifold answered 24/9, 2013 at 17:36 Comment(1)
It does work unless you specify the colClasses argument. If you add myData <- read.table('myFile.gz', colClasses=c("character", "integer")) then you will get an error (as of R 3.2.0). Crap. - Enchase
F
4

If it's a comma- or tab-separated file, you can use data.table's fread(). It can read compressed (.zip, .gz) files directly; for .gz files the R.utils package needs to be installed:

fread('myFile.csv.gz')
Farquhar answered 11/2, 2022 at 15:14 Comment(0)
H
3
library(vroom)
columns3 = c('A', 'B',...) ## define column names
Data1<- vroom(".../XXX.tsv",col_names = columns3)

This works fine with .tsv.gz files as well.

Ham answered 5/8, 2020 at 1:57 Comment(0)
