How to read a Parquet file as an R data.frame without any other dependencies (such as Spark or Python)?

I need to read some Parquet files in R. There are a few solutions, using:

  1. sparklyr::spark_read_parquet (which requires Spark)
  2. reticulate (which needs Python)

The problem is that I am not allowed to install any tool other than R. Is there an R package that can read Parquet files without relying on any other tool?

Dukas answered 14/3, 2019 at 13:34 Comment(0)

You can use arrow for this. It is the same library that backs pyarrow in Python, but it now also comes packaged for R, with no need for Python. As it is not yet available on CRAN, you first have to install Arrow C++ manually:

git clone https://github.com/apache/arrow.git
cd arrow/cpp && mkdir release && cd release

# It is important to statically link to boost libraries
cmake .. -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install

Then you can install the R arrow package:

devtools::install_github("apache/arrow/r")

And use it to load a Parquet file:

library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
#> The following objects are masked from 'package:base':
#> 
#>     array, table
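# A tibble is returned by default; newer arrow releases renamed the
# `as_tibble` argument to `as_data_frame`, so it is omitted here.
read_parquet("somefile.parquet")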
#> # A tibble: 10 x 2
#>        x       y
#>    <int>   <dbl>
#> …

Edit (22/9/2019)

It is now available on CRAN; install it with install.packages("arrow").
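
With the CRAN release, the whole round trip can be tried from R alone. The following is only a minimal sketch, assuming arrow has been installed from CRAN; the data frame `df` and the temporary file path are made up for illustration, and `as.data.frame()` is used here just to drop the tibble class and get a plain data.frame, as the question asks.

```r
# Minimal sketch, assuming install.packages("arrow") has already been run.
library(arrow)

# Hypothetical example data; any data.frame would do.
df <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5), stringsAsFactors = FALSE)

# Write to a temporary Parquet file, then read it back.
path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# read_parquet() returns a tibble by default; convert to a plain data.frame.
result <- as.data.frame(read_parquet(path))

str(result)
```

If a tibble is acceptable, the `as.data.frame()` call can simply be dropped.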

Underclay answered 14/3, 2019 at 14:32 Comment(2)
Can't I just install it and use library(arrow)? Do I have to type the command lines? Ageratum
@ℕʘʘḆḽḘ You can now. Mathematical

Five years later, but perhaps worth noting that you can now read and write (flat) Parquet files with the nanoparquet R package, which is pretty small and easy to install:

install.packages("nanoparquet")
library(nanoparquet)
write_parquet(mtcars, "mtcars.parquet")
read_parquet("mtcars.parquet")

See more at https://r-lib.github.io/nanoparquet/
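
One practical detail: arrow and nanoparquet both export functions named read_parquet() and write_parquet(), so whichever package is attached last masks the other. A hedged sketch of a round trip that sidesteps this with namespace-qualified calls is shown here; it assumes nanoparquet is installed from CRAN, and the data frame `df` and the temporary path are hypothetical.

```r
# Minimal sketch, assuming install.packages("nanoparquet") has already been run.
# Hypothetical example data with flat (non-nested) columns.
df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

path <- tempfile(fileext = ".parquet")

# Qualified calls stay unambiguous even if arrow is attached as well,
# because both packages export read_parquet() and write_parquet().
nanoparquet::write_parquet(df, path)
back <- nanoparquet::read_parquet(path)

# Check that the columns survived the round trip.
same_id <- isTRUE(all.equal(back$id, df$id))
same_label <- isTRUE(all.equal(back$label, df$label))
print(c(id = same_id, label = same_label))
```

Calling the functions as `nanoparquet::...` also avoids attaching the package at all, which keeps the search path unchanged in scripts that only need a single read.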

Are answered 4/6 at 14:40 Comment(0)
