google-hadoop Questions

4

I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.e. gs://mybucket/folder). When running the job locally on my Mac machine, I am getting th...
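The snippet is cut off before the actual error, but a common prerequisite for reading gs:// paths from Spark outside of Google Cloud is registering the GCS connector in the Hadoop configuration and putting its jar on the classpath. A minimal sketch of that setup follows; the property names are the connector's configuration keys, while the project ID and key-file path are hypothetical placeholders.

```python
# Hadoop properties that register the GCS connector for the gs:// scheme.
# The gcs-connector jar must also be on Spark's classpath.
# Project ID and key-file path below are hypothetical.
gcs_conf = {
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "fs.gs.project.id": "my-project",  # hypothetical project
    "google.cloud.auth.service.account.enable": "true",
    "google.cloud.auth.service.account.json.keyfile": "/path/to/key.json",  # hypothetical
}

# With a live SparkContext (not created here, since it needs a Spark install):
# for key, value in gcs_conf.items():
#     sc._jsc.hadoopConfiguration().set(key, value)
# rdd = sc.textFile("gs://mybucket/folder")
```

On Google-managed clusters (e.g. deployed with bdutil) these properties are typically preconfigured, which is why the same job can work on the cluster but fail locally.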

3

Solved

When using the BigQuery Connector to read data from BigQuery, I found that it first copies all the data to Google Cloud Storage and then reads it in parallel into Spark, but when reading a big table it ta...

2

Solved

I am trying to migrate existing data (JSON) in my Hadoop cluster to Google Cloud Storage. I have explored gsutil, and it seems to be the recommended option for moving big data sets to GCS. It see...
Bysshe asked 13/8, 2014 at 16:25
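For context on the gsutil route mentioned above, a minimal sketch of a bulk transfer to GCS follows. The `-m` (parallel transfers), `cp -r`, and `rsync -r` flags are real gsutil options; the source path and destination bucket are hypothetical placeholders, and the commands are left commented because they require credentials and a real bucket.

```shell
# Hypothetical paths; adjust to the actual data location and bucket.
SRC="/data/json"           # local staging copy of the cluster data (assumption)
DST="gs://my-bucket/json"  # hypothetical destination bucket

# One-shot parallel recursive copy:
# gsutil -m cp -r "$SRC" "$DST"

# Incremental sync, safe to re-run after a partial failure:
# gsutil -m rsync -r "$SRC" "$DST"
```

For data that lives in HDFS, `hadoop distcp` with a `gs://` destination (via the GCS connector) is a common alternative, since it copies directly from the cluster without a local staging step.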

1

I have a large dataset stored in a BigQuery table and I would like to load it into a pyspark RDD for ETL data processing. I realized that BigQuery supports the Hadoop Input/Output format https...
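The usual shape of this approach is to feed the BigQuery connector's Hadoop InputFormat to `SparkContext.newAPIHadoopRDD` with a set of `mapred.bq.*` configuration keys. A sketch follows: the property names and class names come from the BigQuery Hadoop connector, while the project, dataset, table, and bucket values are hypothetical.

```python
# mapred.bq.* keys are the BigQuery Hadoop connector's configuration names.
# Project, dataset, table, and staging-bucket values are hypothetical.
bq_conf = {
    "mapred.bq.project.id": "my-project",
    "mapred.bq.gcs.bucket": "my-staging-bucket",
    "mapred.bq.temp.gcs.path": "gs://my-staging-bucket/tmp",
    "mapred.bq.input.project.id": "my-project",
    "mapred.bq.input.dataset.id": "my_dataset",
    "mapred.bq.input.table.id": "my_table",
}

# With a live SparkContext and the connector jar on the classpath
# (not created here, since it needs a Spark install):
# rdd = sc.newAPIHadoopRDD(
#     "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
#     "org.apache.hadoop.io.LongWritable",
#     "com.google.gson.JsonObject",
#     conf=bq_conf)
```

Note that, as the earlier question in this listing observes, the connector stages the table through the GCS bucket named above before Spark reads it, so large tables pay an export step up front.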

1

Solved

With SparkR, I'm trying for a PoC to collect an RDD that I created from text files containing around 4M lines. My Spark cluster is running in Google Cloud, is deployed with bdutil, and is composed w...
Mcnew asked 4/6, 2015 at 13:45

© 2022 - 2024 — McMap. All rights reserved.