I need to read some Parquet files in R. There are a few solutions using …
Now the problem is that I am not allowed to install any tool other than R. Is there any package available in R that can read Parquet files without using any other tool?
You can use arrow for this (the same library as Python's pyarrow), which nowadays also comes packaged for R without the need for Python. As it is not yet available on CRAN, you first have to install Arrow C++ manually:
git clone https://github.com/apache/arrow.git
cd arrow/cpp && mkdir release && cd release
# It is important to statically link to boost libraries
cmake .. -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install
Then you can install the R arrow package:
devtools::install_github("apache/arrow/r")
And use it to load a Parquet file:
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
#> The following objects are masked from 'package:base':
#>
#> array, table
read_parquet("somefile.parquet", as_data_frame = TRUE)
#> # A tibble: 10 x 2
#>        x     y
#>    <int> <dbl>
#> …
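A self-contained variant of the read step, assuming a current CRAN release of arrow (where the reader argument is `as_data_frame` and `col_select` picks which columns to load). It writes a temporary file first so it runs without an existing `somefile.parquet`; the example data frame is made up to mirror the 10 x 2 output shape above.

```r
library(arrow)

# Write a small example data frame to a temporary Parquet file
# (stands in for an existing file such as "somefile.parquet")
path <- tempfile(fileext = ".parquet")
df <- data.frame(x = 1:10, y = seq(0.5, 5, by = 0.5))
write_parquet(df, path)

# Read the whole file back as a data frame
all_cols <- read_parquet(path)

# Read only column "y"; col_select skips loading the other columns
only_y <- read_parquet(path, col_select = "y")

print(dim(all_cols))
print(names(only_y))
```

Selecting columns at read time is where a columnar format like Parquet pays off: only the requested column chunks are decoded.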
arrow is now available on CRAN; install it with install.packages("arrow")
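With the CRAN package, the manual C++ build above should no longer be needed on common platforms. A minimal sketch that installs arrow only if it is missing and then checks that the installed build includes Parquet support via the exported `arrow_with_parquet()` helper; the mirror URL is an assumption, and any CRAN mirror works.

```r
# Install arrow from CRAN only if it is not already available.
# The repos URL is an assumption; any CRAN mirror works.
if (!requireNamespace("arrow", quietly = TRUE)) {
  install.packages("arrow", repos = "https://cloud.r-project.org")
}

# TRUE when this arrow build was compiled with Parquet support
has_parquet <- arrow::arrow_with_parquet()
print(has_parquet)
```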
Five years later, it is perhaps worth noting that you can now read and write (flat) Parquet files with the nanoparquet R package, which is small and easy to install:
install.packages("nanoparquet")
library(nanoparquet)
write_parquet(mtcars, "mtcars.parquet")
read_parquet("mtcars.parquet")
See more at https://r-lib.github.io/nanoparquet/
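A minimal nanoparquet round-trip sketch that writes to a temporary file instead of the working directory and then checks that the data survive. It compares dimensions, column names and one column's values rather than the whole object, because preservation of mtcars' row names is not assumed here.

```r
library(nanoparquet)

# Write mtcars to a temporary Parquet file
# (row names, if any, are not relied on below)
path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, path)

# Read it back into a data frame
cars <- read_parquet(path)

# Same shape, same column names, same values in a sample column
print(dim(cars))
stopifnot(isTRUE(all.equal(cars$mpg, mtcars$mpg)))
```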
install library(arrow)? Do I have to type the cmd lines? – Ageratum