google-cloud-dataproc Questions
0
We have a Dataproc cluster staging bucket where all the Spark job logs are stored.
eu-digi-pipe-dataproc-stage/google-cloud-dataproc-metainfo/d0decf20-21fd-4536-bbc4-5a4f829e49bf/jobs/goo...
Garbage asked 31/5, 2024 at 4:39
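If the goal is to browse these logs, a minimal sketch with gsutil (the bucket name is from the question; the cluster UUID and job path stand in for the truncated ones):
    # List job driver output under the staging bucket
    gsutil ls gs://eu-digi-pipe-dataproc-stage/google-cloud-dataproc-metainfo/<cluster-uuid>/jobs/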
4
Solved
What port should I use to access the Spark UI on Google Dataproc?
I tried ports 4040 and 7077, as well as a bunch of other ports I found using netstat -pln.
Firewall is properly configured.
Amok asked 18/10, 2015 at 0:35
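On Dataproc the Spark UI is typically reached through the YARN ResourceManager on the master node (port 8088) via an SSH tunnel rather than a directly exposed port. A minimal sketch, assuming a cluster named my-cluster in us-central1-a:
    # Open a SOCKS tunnel to the master, then browse http://my-cluster-m:8088 through it
    gcloud compute ssh my-cluster-m --zone=us-central1-a -- -D 1080 -N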
4
We noticed our jobs failing with the below error on the Dataproc cluster.
ERROR: gcloud crashed (AttributeError): 'bool' object has no attribute 'lower'
If you would like to report this ...
Ignescent asked 2/5, 2023 at 23:6
3
Solved
As described in the blog post below,
https://cloud.google.com/blog/big-data/2016/06/google-cloud-dataproc-the-fast-easy-and-safe-way-to-try-spark-20-preview
I was trying to read a file from Google Cloud Sto...
Aneurysm asked 1/3, 2017 at 14:50
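For reference, reading from Cloud Storage on Dataproc generally needs no extra setup, since the GCS connector is preinstalled; a minimal PySpark sketch with a hypothetical path:
    # gs:// URIs resolve through the preinstalled GCS connector
    rdd = sc.textFile("gs://my-bucket/path/to/file.txt")
    print(rdd.count())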
7
I am using Google Cloud Dataflow to implement an ETL data warehouse solution.
Looking into the Google Cloud offerings, it seems Dataproc can also do the same thing.
It also seems Dataproc is a little bit ...
Hildegaard asked 26/9, 2017 at 22:36
0
In my Java application I have an implementation for a file-system layer, where my file class is a wrapper for Hadoop filesystem methods. I am upgrading from hadoop3-1.9.17 to hadoop3-2.2.8 and ...
Becka asked 18/10, 2022 at 14:16
4
Solved
Currently, Google Dataproc does not offer Spark 3.2.0 in any image. The latest available is 3.1.2. I want to use the pandas-on-Spark functionality that Spark released with 3.2.0.
I am doing the...
Tyratyrannical asked 7/12, 2021 at 2:51
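Once a Spark 3.2+ runtime is available, the pandas API ships inside PySpark itself; a minimal sketch:
    # pyspark.pandas is bundled with Spark >= 3.2 (earlier versions used the separate Koalas package)
    import pyspark.pandas as ps
    psdf = ps.DataFrame({"a": [1, 2, 3]})
    print(psdf["a"].mean())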
5
I have a (non-admin) account on one GCP project.
When I start the Dataproc cluster, GCP spins up 3 VMs. When I try to access one of the VMs via SSH (in the browser) I get the following error:
I tri...
Hypermeter asked 20/3, 2018 at 12:40
2
I am loading a dataset from BigQuery and after some transformations, I'd like to save the transformed DataFrame back into BigQuery. Is there a way of doing this?
This is how I am loading the data:...
Allpowerful asked 30/8, 2019 at 15:28
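The spark-bigquery connector writes much like it reads; a hedged sketch, with a hypothetical table and staging bucket:
    # Writes are staged through a GCS bucket before loading into BigQuery
    df.write.format("bigquery") \
        .option("table", "my_dataset.my_table") \
        .option("temporaryGcsBucket", "my-staging-bucket") \
        .mode("overwrite") \
        .save()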
4
I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.e. gs://mybucket/folder).
When running the job locally on my Mac machine, I am getting th...
Dihedron asked 5/1, 2015 at 15:41
2
Solved
How do you pass parameters into the Python script being called in a Dataproc pyspark job submit? Here is a command I've been mucking with:
gcloud dataproc jobs submit pyspark --cluster my-dataproc \ ...
Breakout asked 28/11, 2017 at 20:31
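Arguments placed after a bare -- are forwarded to the script, where they arrive in sys.argv; a sketch with hypothetical values:
    gcloud dataproc jobs submit pyspark my_script.py --cluster=my-dataproc -- alpha 42
    # inside my_script.py
    import sys
    first_arg, second_arg = sys.argv[1], sys.argv[2]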
7
Solved
I have recently performed a migration to Google Cloud Platform, and I really like it.
However, I can't find a way to monitor the memory usage of the Dataproc VM instances. As you can see in the attac...
Expedition asked 16/5, 2017 at 1:40
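Memory is not among the metrics Dataproc exports by default; one route on older images was enabling the Cloud Monitoring agent at cluster creation. Treat the property below as an assumption to verify against current docs:
    gcloud dataproc clusters create my-cluster \
        --properties "dataproc:dataproc.monitoring.stackdriver.enable=true"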
3
Solved
I'm trying to load data from Google BigQuery into Spark running on Google Dataproc (I'm using Java). I tried to follow the instructions here: https://cloud.google.com/dataproc/docs/tutorials/bigquer...
Kitchener asked 3/11, 2019 at 6:51
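The question uses Java, but the connector call shape is analogous across languages; a hedged PySpark sketch against a public table:
    # Read via the spark-bigquery connector
    df = spark.read.format("bigquery") \
        .option("table", "bigquery-public-data.samples.shakespeare") \
        .load()
    df.show(5)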
3
I have a Python project whose folder has the structure
main_directory
  - lib
    - lib.py
  - run
    - script.py
script.py is
from lib.lib import add_two
spark = SparkSession \
    .builder \
    .master('y...
Hobbes asked 23/4, 2020 at 11:45
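The usual fix is to ship the lib package alongside the job, e.g. zipped and passed via --py-files; a sketch using the question's layout (cluster name hypothetical):
    cd main_directory && zip -r lib.zip lib
    gcloud dataproc jobs submit pyspark run/script.py \
        --cluster=my-cluster --py-files=lib.zip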
3
I have the following folder structure
- libfolder
  - lib1.py
  - lib2.py
- main.py
main.py calls libfolder.lib1.py which then calls libfolder.lib2.py and others.
It all works perfectly fine i...
Dispirit asked 20/12, 2018 at 6:38
2
Solved
In GCP it is fairly simple to install and run a JupyterHub component from the UI or the gcloud command. I'm trying to script the process through Airflow and the DataprocClusterCreateOperator, her...
Breastwork asked 2/1, 2020 at 18:11
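A minimal sketch using the current Google provider's operator and the Dataproc API's cluster_config shape (project, region, and names are placeholder assumptions, not the asker's code):
    from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator

    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id="my-project",
        region="us-central1",
        cluster_name="jupyter-cluster",
        cluster_config={
            # Jupyter as an optional component, reachable via the Component Gateway
            "software_config": {"optional_components": ["JUPYTER"]},
            "endpoint_config": {"enable_http_port_access": True},
        },
    )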
3
Solved
I am trying to run a Spark job on a Google Dataproc cluster, but get the following error:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop...
Overcoat asked 21/12, 2017 at 16:43
1
Solved
I am running a Spark job on a self-managed cluster (like a local environment) while accessing buckets on Google Cloud Storage.
❯ spark-submit --version
[Spark ASCII-art version banner truncated]
Parrott asked 14/9, 2021 at 6:42
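Off Dataproc, the GCS connector jar and its Hadoop properties have to be supplied by hand; a hedged sketch (jar version and key path hypothetical):
    spark-submit \
      --jars gcs-connector-hadoop3-latest.jar \
      --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
      --conf spark.hadoop.google.cloud.auth.service.account.enable=true \
      --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/key.json \
      my_job.py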
2
Solved
I may be searching with the wrong terms, but Google is not telling me how to do this. The question is: how can I restart Hadoop services on Dataproc after changing some configuration files (yarn pro...
Bernetta asked 3/4, 2017 at 20:25
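Dataproc images run the Hadoop daemons under systemd, so a restart is a systemctl call; the service names below are as seen on recent images and worth verifying with systemctl list-units:
    # On the master node
    sudo systemctl restart hadoop-yarn-resourcemanager
    # On each worker node
    sudo systemctl restart hadoop-yarn-nodemanager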
1
Solved
I'm trying to monitor local disk usage (percentage) on Dataproc 2.0 using cloud metrics. This would be useful for monitoring situations where Spark temporary files fill up the disk.
By default Dataproc...
Philps asked 16/7, 2021 at 3:47
3
Solved
When using the BigQuery connector to read data from BigQuery, I found that it first copies all the data to Google Cloud Storage, then reads it in parallel into Spark; but when reading a big table it ta...
Humber asked 4/1, 2017 at 10:57
4
Solved
I want to run a PySpark job through Google Cloud Dataproc, but I can't figure out how to set up PySpark to run Python 3 instead of 2.7 by default.
The best I've been able to find is adding ...
Saurian asked 23/8, 2017 at 15:33
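On image versions 1.4+ Python 3 is already the default; on older images the interpreter can be pinned through Spark properties at cluster creation, assuming python3 exists on the image:
    gcloud dataproc clusters create my-cluster \
        --properties "spark:spark.pyspark.python=python3,spark:spark.pyspark.driver.python=python3"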
1
The Dataproc cluster is created with image 2.0.x and the Delta Lake package io.delta:delta-core_2.12:0.7.0.
Spark version is 3.1.1.
The Spark shell is initiated with:
pyspark --conf "spark.sql.extensions=io.del...
Wolfgang asked 8/2, 2021 at 17:7
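For reference, the Delta Lake 0.7.0 docs pair that extension setting with a catalog setting; the documented invocation looks like:
    pyspark --packages io.delta:delta-core_2.12:0.7.0 \
      --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
      --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"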
2
Solved
I set up Dataproc using the steps in the link here
https://cloud.google.com/dataproc/docs/tutorials/jupyter-notebook
But my Jupyter keeps asking for a password.
I didn't set any password.
I tried my go...
Brooklime asked 13/12, 2016 at 9:29
4
The Dataproc clusters I created always show status as "running" on the web portal. Is there a way to stop/deprovision a cluster when it is not in use so that it does not burn resources and $$...
Primipara asked 2/2, 2018 at 2:5
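These days gcloud can stop and later restart a cluster, or delete it outright; a minimal sketch (name and region hypothetical):
    gcloud dataproc clusters stop my-cluster --region=us-central1
    gcloud dataproc clusters start my-cluster --region=us-central1
    # or remove it entirely
    gcloud dataproc clusters delete my-cluster --region=us-central1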