google-hadoop Questions

4

I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.e. gs://mybucket/folder). When running the job locally on my Mac machine, I am getting th...
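The snippet is cut off before the actual error, but a common prerequisite for reading gs:// paths from Spark outside of Google Cloud is registering the GCS connector in the Hadoop configuration and putting its jar on the classpath. A minimal sketch of that setup follows; the property names are the connector's configuration keys, while the project ID and key-file path are hypothetical placeholders.

```python
# Hadoop properties that register the GCS connector for the gs:// scheme.
# The gcs-connector jar must also be on Spark's classpath.
# Project ID and key-file path below are hypothetical.
gcs_conf = {
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "fs.gs.project.id": "my-project",  # hypothetical project
    "google.cloud.auth.service.account.enable": "true",
    "google.cloud.auth.service.account.json.keyfile": "/path/to/key.json",  # hypothetical
}

# With a live SparkContext (not created here, since it needs a Spark install):
# for key, value in gcs_conf.items():
#     sc._jsc.hadoopConfiguration().set(key, value)
# rdd = sc.textFile("gs://mybucket/folder")
```

On Google-managed clusters (e.g. deployed with bdutil) these properties are typically preconfigured, which is why the same job can work on the cluster but fail locally.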

3

Solved

When using the BigQuery Connector to read data from BigQuery, I found that it first copies all the data to Google Cloud Storage and then reads it in parallel into Spark, but when reading a big table it ta...

2

Solved

I am trying to migrate existing data (JSON) in my Hadoop cluster to Google Cloud Storage. I have explored gsutil, and it seems to be the recommended option for moving big data sets to GCS. It see...
Bysshe asked 13/8, 2014 at 16:25
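For context on the gsutil route mentioned above, a minimal sketch of a bulk transfer to GCS follows. The `-m` (parallel transfers), `cp -r`, and `rsync -r` flags are real gsutil options; the source path and destination bucket are hypothetical placeholders, and the commands are left commented because they require credentials and a real bucket.

```shell
# Hypothetical paths; adjust to the actual data location and bucket.
SRC="/data/json"           # local staging copy of the cluster data (assumption)
DST="gs://my-bucket/json"  # hypothetical destination bucket

# One-shot parallel recursive copy:
# gsutil -m cp -r "$SRC" "$DST"

# Incremental sync, safe to re-run after a partial failure:
# gsutil -m rsync -r "$SRC" "$DST"
```

For data that lives in HDFS, `hadoop distcp` with a `gs://` destination (via the GCS connector) is a common alternative, since it copies directly from the cluster without a local staging step.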

1

I have a large dataset stored in a BigQuery table and I would like to load it into a pyspark RDD for ETL data processing. I realized that BigQuery supports the Hadoop Input/Output format https...
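The usual shape of this approach is to feed the BigQuery connector's Hadoop InputFormat to `SparkContext.newAPIHadoopRDD` with a set of `mapred.bq.*` configuration keys. A sketch follows: the property names and class names come from the BigQuery Hadoop connector, while the project, dataset, table, and bucket values are hypothetical.

```python
# mapred.bq.* keys are the BigQuery Hadoop connector's configuration names.
# Project, dataset, table, and staging-bucket values are hypothetical.
bq_conf = {
    "mapred.bq.project.id": "my-project",
    "mapred.bq.gcs.bucket": "my-staging-bucket",
    "mapred.bq.temp.gcs.path": "gs://my-staging-bucket/tmp",
    "mapred.bq.input.project.id": "my-project",
    "mapred.bq.input.dataset.id": "my_dataset",
    "mapred.bq.input.table.id": "my_table",
}

# With a live SparkContext and the connector jar on the classpath
# (not created here, since it needs a Spark install):
# rdd = sc.newAPIHadoopRDD(
#     "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
#     "org.apache.hadoop.io.LongWritable",
#     "com.google.gson.JsonObject",
#     conf=bq_conf)
```

Note that, as the earlier question in this listing observes, the connector stages the table through the GCS bucket named above before Spark reads it, so large tables pay an export step up front.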

1

Solved

With SparkR, I'm trying for a PoC to collect an RDD that I created from text files containing around 4M lines. My Spark cluster is running in Google Cloud, is deployed with bdutil, and is composed w...
Mcnew asked 4/6, 2015 at 13:45

© 2022 - 2024 — McMap. All rights reserved.