google-cloud-dataproc Questions

0

We have a Dataproc cluster staging bucket in which all the Spark job logs are stored. eu-digi-pipe-dataproc-stage/google-cloud-dataproc-metainfo/d0decf20-21fd-4536-bbc4-5a4f829e49bf/jobs/goo...

4

Solved

What port should I use to access the Spark UI on Google Dataproc? I tried ports 4040 and 7077, as well as a bunch of other ports I found using netstat -pln. The firewall is properly configured.
Amok asked 18/10, 2015 at 0:35
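
On Dataproc, Spark runs on YARN, so the driver UI is normally reached through the YARN ResourceManager web UI on the master node (port 8088) rather than on a fixed port; the driver itself binds 4040, or 4041 and up when 4040 is taken. A minimal PySpark sketch for discovering where the UI of the current session actually bound:

    # Minimal sketch: ask the running SparkContext where its web UI is bound.
    # On Dataproc the URL is typically proxied through the YARN
    # ResourceManager UI on the master node (port 8088).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("find-ui").getOrCreate()

    # uiWebUrl reports the host:port the driver UI actually bound to
    # (4040 by default, 4041 and up if that port was already taken).
    print(spark.sparkContext.uiWebUrl)

    spark.stop()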

4

We noticed our jobs failing on the Dataproc cluster with the error below. ERROR: gcloud crashed (AttributeError): 'bool' object has no attribute 'lower' If you would like to report this ...
Ignescent asked 2/5, 2023 at 23:6

3

Solved

As described in the blog below, https://cloud.google.com/blog/big-data/2016/06/google-cloud-dataproc-the-fast-easy-and-safe-way-to-try-spark-20-preview I was trying to read a file from Google Cloud Sto...
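
On Dataproc the GCS connector is preinstalled, so a gs:// path can be read like any other Hadoop filesystem URI. A minimal PySpark sketch; the bucket name and object path are placeholders:

    # Minimal sketch: reading a text file from Google Cloud Storage.
    # The bucket name and object path below are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gcs-read").getOrCreate()

    df = spark.read.text("gs://my-bucket/path/to/file.txt")
    df.show(5, truncate=False)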

7

I am using Google Cloud Dataflow to implement an ETL data warehouse solution. Looking into the Google Cloud offerings, it seems Dataproc can also do the same thing. It also seems Dataproc is a little bit ...

0

In my Java application I have an implementation for a file-system layer, where my file class is a wrapper for Hadoop filesystem methods. I am upgrading from hadoop3-1.9.17 to hadoop3-2.2.8 and ...

4

Solved

Currently, Google Dataproc does not have Spark 3.2.0 as an image. The latest available is 3.1.2. I want to use the pandas-on-Spark functionality that Spark released with 3.2.0. I am doing the...
Tyratyrannical asked 7/12, 2021 at 2:51
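
For context, the functionality the question is after ships as the pyspark.pandas module in Spark 3.2.0 (the former Koalas project). A minimal sketch of its use, assuming a runtime with Spark 3.2 or later:

    # Minimal sketch of the pandas-on-Spark API added in Spark 3.2.0
    # (pyspark.pandas, formerly Koalas). Requires Spark >= 3.2.
    import pyspark.pandas as ps

    psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    print(psdf.describe())      # pandas-style API, executed by Spark
    sdf = psdf.to_spark()       # convert to a regular Spark DataFrame
    sdf.show()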

5

I have a (non-admin) account on one GCP project. When I start the Dataproc cluster, GCP spins up 3 VMs. When I try to access one of the VMs via SSH (in browser), I get the following error: I tri...

2

I am loading a dataset from BigQuery and after some transformations, I'd like to save the transformed DataFrame back into BigQuery. Is there a way of doing this? This is how I am loading the data:...
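
With the spark-bigquery connector (bundled on recent Dataproc images), writing back mirrors the read; so-called indirect writes stage the data in a temporary GCS bucket first. A minimal sketch in which the table and bucket names are placeholders:

    # Minimal sketch: loading from BigQuery and writing the transformed
    # result back with the spark-bigquery connector. Table and bucket
    # names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = (spark.read.format("bigquery")
          .option("table", "my_project.my_dataset.source_table")
          .load())

    # ... transformations on df ...

    (df.write.format("bigquery")
       .option("table", "my_project.my_dataset.target_table")
       # indirect writes stage rows in a temporary GCS bucket first
       .option("temporaryGcsBucket", "my-temp-bucket")
       .mode("overwrite")
       .save())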

4

I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.e. gs://mybucket/folder). When running the job locally on my Mac machine, I am getting th...
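
Outside Dataproc the GCS connector is not on Spark's classpath, which is the usual cause of gs:// failures on a local machine. A hedged sketch of the local configuration, written against a modern Spark (the question's 1.2.0 predates SparkSession); the connector jar path and the key file location are assumptions to adapt:

    # Hedged sketch: a *local* Spark session configured to read gs:// paths.
    # Assumes the GCS connector jar was downloaded separately; the jar path
    # and the service-account key file below are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("local-gcs")
             .config("spark.jars", "/path/to/gcs-connector-hadoop3-latest.jar")
             .config("spark.hadoop.fs.gs.impl",
                     "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
             .config("spark.hadoop.google.cloud.auth.service.account.enable",
                     "true")
             .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
                     "/path/to/key.json")
             .getOrCreate())

    df = spark.read.text("gs://mybucket/folder")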

2

Solved

How do you pass parameters into the Python script being called in a Dataproc pyspark job submit? Here is a cmd I've been mucking with: gcloud dataproc jobs submit pyspark --cluster my-dataproc \ ...
Breakout asked 28/11, 2017 at 20:31
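
For reference, gcloud dataproc jobs submit pyspark forwards everything after a literal -- separator to the script unchanged, where it arrives as ordinary sys.argv entries. A minimal sketch of the receiving side; the argument names are hypothetical:

    # Minimal sketch: arguments placed after the literal "--" in
    # `gcloud dataproc jobs submit pyspark script.py ... -- ...`
    # arrive as plain sys.argv entries. Argument names are hypothetical.
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--input-path")
    parser.add_argument("--run-date")
    args = parser.parse_args()

    print(args.input_path, args.run_date)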

7

Solved

I have recently performed a migration to Google Cloud Platform, and I really like it. However, I can't find a way to monitor the memory usage of the Dataproc VM instances. As you can see in the attac...
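
Compute Engine does not report guest memory on its own, so the metric has to come from an agent running on the VMs. A hedged sketch of reading such a metric from Cloud Monitoring in Python, assuming the Monitoring/Ops agent is installed on the cluster nodes and therefore agent.googleapis.com/memory/percent_used exists; the project ID is a placeholder:

    # Hedged sketch: querying an agent-reported memory metric from Cloud
    # Monitoring. Assumes the Monitoring/Ops agent runs on the cluster VMs;
    # the project ID is a placeholder.
    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now)},
         "start_time": {"seconds": int(now - 3600)}})

    results = client.list_time_series(
        request={
            "name": "projects/my-project",
            "filter": 'metric.type = "agent.googleapis.com/memory/percent_used"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        })
    for series in results:
        print(series.resource.labels["instance_id"], series.points[0].value)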

3

Solved

I'm trying to load data from Google BigQuery into Spark running on Google Dataproc (I'm using Java). I tried to follow the instructions here: https://cloud.google.com/dataproc/docs/tutorials/bigquer...

3

I have a Python project whose folder has the structure

    main_directory
    - lib
      - lib.py
    - run
      - script.py

script.py starts with

    from lib.lib import add_two
    spark = SparkSession \
        .builder \
        .master('y...
Hobbes asked 23/4, 2020 at 11:45
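
On yarn the executors do not share the driver's working directory, so the lib package has to be shipped with the job, either via spark-submit --py-files or programmatically. A minimal sketch using addPyFile with a zip of the package; paths are placeholders:

    # Minimal sketch: shipping a local package to YARN executors. Zipping
    # lib/ and calling addPyFile has the same effect as passing
    # `--py-files lib.zip` to spark-submit. Paths are placeholders.
    import shutil
    from pyspark.sql import SparkSession

    shutil.make_archive("lib", "zip", ".", "lib")    # creates lib.zip

    spark = SparkSession.builder.master("yarn").getOrCreate()
    spark.sparkContext.addPyFile("lib.zip")          # distribute to executors

    from lib.lib import add_two                      # import after addPyFile
    print(spark.sparkContext.parallelize([1, 2, 3]).map(add_two).collect())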

3

I have the following folder structure

    - libfolder
      - lib1.py
      - lib2.py
    - main.py

main.py calls libfolder.lib1.py which then calls libfolder.lib2.py and others. It all works perfectly fine i...
Dispirit asked 20/12, 2018 at 6:38

2

Solved

In GCP it is fairly simple to install and run a JupyterHub component from the UI or the gcloud command. I'm trying to script the process through Airflow and the DataprocClusterCreateOperator; her...
Breastwork asked 2/1, 2020 at 18:11
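
In recent apache-airflow-providers-google releases the operator is named DataprocCreateClusterOperator, and optional components are requested through the cluster_config dict that follows the Dataproc API. A hedged sketch under that assumption; the project, region, and cluster names are placeholders:

    # Hedged sketch, assuming a recent apache-airflow-providers-google:
    # the JUPYTER optional component and the Component Gateway are enabled
    # through cluster_config. Project/region/cluster names are placeholders.
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
    )

    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id="my-project",
        region="europe-west1",
        cluster_name="jupyter-cluster",
        cluster_config={
            "software_config": {"optional_components": ["JUPYTER"]},
            "endpoint_config": {"enable_http_port_access": True},
        },
    )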

3

Solved

I am trying to run a Spark job on a Google Dataproc cluster, but get the following error: Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop...
Overcoat asked 21/12, 2017 at 16:43

1

Solved

I am running a Spark job on a self-managed cluster (like a local environment) while accessing buckets on Google Cloud Storage. ❯ spark-submit --version prints the usual Spark ASCII-art welcome banner, ...
Parrott asked 14/9, 2021 at 6:42

2

Solved

I may be searching with the wrong terms, but Google is not telling me how to do this. The question is: how can I restart Hadoop services on Dataproc after changing some configuration files (yarn pro...
Bernetta asked 3/4, 2017 at 20:25

1

Solved

I'm trying to monitor local disk usage (percentage) on Dataproc 2.0 using cloud metrics. This would be useful for monitoring situations where Spark temporary files fill up the disk. By default Dataproc...
Philps asked 16/7, 2021 at 3:47

3

Solved

When using the BigQuery connector to read data from BigQuery, I found that it first copies all the data to Google Cloud Storage and then reads it in parallel into Spark; but when reading a big table it ta...
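
The staging-through-GCS behaviour belongs to the older Hadoop BigQuery connector; the newer spark-bigquery connector reads through the BigQuery Storage API and pushes column and filter selections down, which usually fixes the big-table case. A minimal sketch; the table, columns, and filter are placeholders:

    # Minimal sketch: reading via the spark-bigquery connector, which streams
    # rows through the BigQuery Storage API instead of staging an export in
    # GCS. Table name, columns, and filter are placeholders; select() and the
    # filter option are pushed down so only the needed data is read.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = (spark.read
          .format("bigquery")
          .option("table", "my_project.my_dataset.big_table")
          .option("filter", "event_date >= '2020-01-01'")
          .load()
          .select("user_id", "event_date"))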

4

Solved

I want to run a pyspark job through Google Cloud Platform Dataproc, but I can't figure out how to set up pyspark to run Python 3 instead of 2.7 by default. The best I've been able to find is adding ...
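
The usual fix is to create the cluster with the Spark properties spark.pyspark.python and spark.pyspark.driver.python pointing at python3. A small sketch for verifying which interpreter the driver and the executors actually run once the job is submitted:

    # Small sketch: verify which Python interpreter the job runs on. If the
    # cluster properties spark.pyspark.python=python3 and
    # spark.pyspark.driver.python=python3 took effect, both lines report 3.x.
    import sys
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    print("driver:  ", sys.version)
    print("executor:",
          spark.sparkContext.parallelize([0], 1)
               .map(lambda _: sys.version).collect()[0])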

1

The Dataproc cluster is created with image 2.0.x and the Delta Lake package io.delta:delta-core_2.12:0.7.0. The Spark version is 3.1.1. The Spark shell is initiated with: pyspark --conf "spark.sql.extensions=io.del...
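
For reference, with delta-core 0.7.0 on Spark 3.x the two session extensions are the essential configuration. A minimal sketch of a session plus a write/read round trip, where the gs:// path is a placeholder:

    # Minimal sketch: a SparkSession configured for Delta Lake 0.7.0 on
    # Spark 3.x, with a write/read round trip. The gs:// path is a
    # placeholder.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.jars.packages", "io.delta:delta-core_2.12:0.7.0")
             .config("spark.sql.extensions",
                     "io.delta.sql.DeltaSparkSessionExtension")
             .config("spark.sql.catalog.spark_catalog",
                     "org.apache.spark.sql.delta.catalog.DeltaCatalog")
             .getOrCreate())

    spark.range(5).write.format("delta").mode("overwrite").save("gs://my-bucket/tbl")
    spark.read.format("delta").load("gs://my-bucket/tbl").show()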

2

Solved

I set up Dataproc using the steps in the link here, https://cloud.google.com/dataproc/docs/tutorials/jupyter-notebook, but my Jupyter keeps asking for a password. I didn't set any password. I tried my go...

4

The Dataproc clusters I created always show status as "running" on the web portal. Is there a way to stop/deprovision a cluster when it is not in use so that it does not burn resources and $$...
Primipara asked 2/2, 2018 at 2:5
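
Dataproc later gained first-class stop/start for clusters (the same operation as gcloud dataproc clusters stop), which pauses the VMs without deleting the cluster; scheduled deletion (--max-idle) is the alternative when the cluster can be thrown away. A hedged sketch with the Python client, assuming a google-cloud-dataproc version that exposes stop_cluster; project, region, and cluster names are placeholders:

    # Hedged sketch: stopping an idle cluster with the Dataproc Python
    # client. Assumes a google-cloud-dataproc release that exposes
    # stop_cluster. Project, region, and cluster names are placeholders.
    from google.cloud import dataproc_v1

    region = "us-central1"
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"})

    operation = client.stop_cluster(
        request={
            "project_id": "my-project",
            "region": region,
            "cluster_name": "my-cluster",
        })
    operation.result()   # blocks until the VMs are stopped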
