I have been using the latest R arrow package (arrow_2.0.0.20201106) that supports reading from and writing to AWS S3 directly (which is awesome).
I don't seem to have issues when I write and read my own file (see below):
write_parquet(iris, "iris.parquet")
system("aws s3 mv iris.parquet s3://myawsbucket/iris.parquet")
df <- read_parquet("s3://myawsbucket/iris.parquet")
But when I try to read in one of the sample R arrow files, I get the following error:
df <- read_parquet("s3://ursa-labs-taxi-data/2019/06/data.parquet")
Error in parquet___arrow___FileReader__ReadTable1(self) :
IOError: NotImplemented: Support for codec 'snappy' not built
When I check if the codec is available, it looks like it is not:
codec_is_available(type="snappy")
[1] FALSE
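To see at a glance which codecs a given build supports, the same check can be run over several codec names at once. This is only a minimal sketch: `codec_report` is a hypothetical helper name, the list of codec names is an assumption about what is worth checking, and the `is_available` parameter exists only so the helper can be tried without arrow installed.

```r
# Hypothetical helper: report which compression codecs this arrow build supports.
# `is_available` defaults to arrow::codec_is_available (the call used above);
# it is a parameter only so the helper can be exercised with a stub.
codec_report <- function(codecs = c("snappy", "gzip", "zstd", "lz4", "brotli"),
                         is_available = arrow::codec_is_available) {
  # vapply over a character vector keeps the codec names as element names
  vapply(codecs, function(codec) isTRUE(is_available(codec)), logical(1))
}

# Only query the real package when it is actually installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  print(codec_report())
}
```

A FALSE next to "snappy" in that output would match the error above: the Parquet file is snappy-compressed, but the installed build cannot decode it.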
Anyone know a way to make the "snappy" codec available?
Thanks, Mike
###########
Follow up
Thanks to the answer from @Neal below, here is the code that installed all the needed dependencies for me.
Sys.setenv(ARROW_S3="ON")
Sys.setenv(NOT_CRAN="true")
install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com")
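After reinstalling, it may help to confirm the codec is present before retrying the S3 read, so a missing codec fails with a clear message instead of the IOError. The following is a rough sketch under assumptions: `read_parquet_checked` is a hypothetical wrapper, it assumes you already know which codec the file uses (snappy here), and the `is_available` and `reader` parameters are there only so it can be tried with stubs.

```r
# Hypothetical wrapper: check for the codec first, then read the Parquet file.
# Defaults point at the real arrow functions; both can be replaced with stubs.
read_parquet_checked <- function(path,
                                 codec = "snappy",
                                 is_available = arrow::codec_is_available,
                                 reader = arrow::read_parquet) {
  if (!isTRUE(is_available(codec))) {
    stop(
      sprintf("This arrow build lacks the '%s' codec; reinstall as above, then restart R.", codec),
      call. = FALSE
    )
  }
  reader(path)
}

# Usage (needs arrow built with S3 and snappy support, plus network access):
# df <- read_parquet_checked("s3://ursa-labs-taxi-data/2019/06/data.parquet")
```

Restarting R after the reinstall is suggested because a session that already loaded the old build will generally keep using it.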
Sys.setenv(ARROW_S3 = "ON") and then running install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com"). I have installed it this way on Ubuntu 18 and Ubuntu 20. The CRAN version works great on my local Mac machine. I use the nightly build to get around the CMake versioning issue I was having on Linux. – Chilli