spark-submit Questions

2

I am able to run pyspark and run a script in a Jupyter notebook. But when I try to run the file from the terminal using spark-submit, I get this error: Error executing Jupyter command file path [Errn...
Wallaby asked 30/9, 2017 at 23:16
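
A common cause of this error, assuming the notebook workflow was set up by exporting PYSPARK_DRIVER_PYTHON=jupyter: spark-submit then hands the script to Jupyter instead of Python. A minimal sketch of the usual workaround (my_script.py is a placeholder):

    # Clear the Jupyter driver settings for this shell, then submit normally
    unset PYSPARK_DRIVER_PYTHON
    unset PYSPARK_DRIVER_PYTHON_OPTS
    spark-submit my_script.py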

22

Solved

I'd like to stop the various messages that appear on the spark shell. I tried to edit the log4j.properties file in order to stop these messages. Here are the contents of log4j.properties: # Define the...
Kevin asked 5/1, 2015 at 14:4
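
For reference, the usual way to quiet the shell is to lower the root logger level in conf/log4j.properties; a minimal sketch based on Spark's bundled log4j.properties.template:

    # Change the root logger from INFO to ERROR to silence shell chatter
    log4j.rootCategory=ERROR, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n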

2

Solved

I submitted my code to the cluster to run, but I encountered the following error: java.lang.IllegalArgumentException: Too large frame: 5211883372140375593 at org.sparkproject.guava.base.Precond...
Fritzfritze asked 25/9, 2020 at 9:40

1

We have a PySpark-based application and we are doing a spark-submit as shown below. The application is working as expected; however, we are seeing a weird warning message. Is there any way to handle this, or why ...
Contractor asked 13/7, 2021 at 8:57

3

~/spark/spark-2.1.1-bin-hadoop2.7/bin$ ./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar Runnin...
Skimmer asked 20/6, 2017 at 20:49

7

Solved

True... it has been discussed quite a lot. However, there is a lot of ambiguity and some of the answers provided ... including duplicating JAR references in the jars/executor/driver configuration o...
Mcnutt asked 10/5, 2016 at 8:3
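
As a reference point for this question: --jars ships a comma-separated list of JARs to both the driver and the executors, which avoids duplicating entries across the driver/executor classpath options. A minimal sketch (the class name and paths are placeholders):

    spark-submit \
      --class com.example.Main \
      --master yarn \
      --deploy-mode cluster \
      --jars /opt/libs/dep1.jar,/opt/libs/dep2.jar \
      app.jar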

5

Solved

I am following the Scala tutorial at https://spark.apache.org/docs/2.1.0/quick-start.html My Scala file: /* SimpleApp.scala */ import org.apache.spark.SparkContext import org.apache.spark.SparkContext._...
Nanette asked 8/11, 2017 at 5:23
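
For context, the SimpleApp from the linked quick-start guide is roughly the following; the log file path is a placeholder the tutorial asks you to fill in:

    /* SimpleApp.scala */
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf

    object SimpleApp {
      def main(args: Array[String]) {
        val logFile = "YOUR_SPARK_HOME/README.md" // any text file on your system
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println(s"Lines with a: $numAs, Lines with b: $numBs")
        sc.stop()
      }
    }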

2

Solved

I've created a Spark cluster with one master and two slaves, each in a Docker container. I launch it with the command start-all.sh. I can reach the UI from my local machine at localhost:8080 an...
Ishii asked 26/1, 2022 at 8:57

4

Solved

I have been fighting with this the whole day. I am able to install and use a package (graphframes) with the spark shell or a connected Jupyter notebook, but I would like to move it to the Kubernetes-based spark e...
Trometer asked 20/3, 2021 at 14:40
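
One commonly suggested route is to pass the package coordinates to spark-submit with --packages; a minimal sketch for a Kubernetes submission (the API server address, image name, and version coordinates are illustrative):

    spark-submit \
      --master k8s://https://<k8s-apiserver>:6443 \
      --deploy-mode cluster \
      --packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 \
      --conf spark.kubernetes.container.image=my-registry/spark-py:latest \
      app.py

Note that --packages resolves dependencies via Ivy inside the driver, so on Kubernetes the container needs a writable Ivy cache directory (configurable via spark.jars.ivy).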

1

Solved

I'm trying to submit my PySpark application to a Kubernetes cluster (Minikube) using spark-submit: ./bin/spark-submit \ --master k8s://https://192.168.64.4:8443 \ --deploy-mode cluster \ --packa...
Lemal asked 24/2, 2021 at 20:7
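
For comparison, a complete Minikube submission typically looks like the sketch below; the app name, container image, service account, and local:// application path are assumptions:

    ./bin/spark-submit \
      --master k8s://https://192.168.64.4:8443 \
      --deploy-mode cluster \
      --name my-pyspark-app \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.container.image=my-repo/spark-py:latest \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
      local:///opt/spark/work-dir/app.py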

2

I am running into some problems in (Py)Spark on EMR (release 5.32.0). Approximately a year ago I ran the same program on an EMR cluster (I think the release must have been 5.29.0). Then I was able ...

3

I wrote a Spark Streaming application built with sbt. It works perfectly fine locally, but after deploying on the cluster, it complains about a class I wrote which is clearly in the fat jar (checked u...
Heterocyclic asked 26/4, 2017 at 3:17

1

I have 4 Python scripts and one .txt configuration file. Of the 4 Python files, one has the entry point for the Spark application and also imports functions from the other Python files. But the config...
Manouch asked 24/9, 2020 at 9:3
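
For reference, extra Python modules are normally shipped with --py-files and plain data/config files with --files; files passed via --files land in the working directory of the driver and executors, so they can be opened by bare name in cluster mode. A minimal sketch (file names are placeholders):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --py-files utils.py,jobs.py,schema.py \
      --files config.txt \
      main.py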

1

I am new to Spark. I have started ZooKeeper and Kafka (0.10.1.1) locally, as well as Spark standalone (2.2.0) with one master and 2 workers. My local Scala version is 2.12.3. I was able to run wordcount on s...

2

Solved

I am trying to deploy a Spark job using spark-submit, which has a bunch of parameters, like: spark-submit --class Eventhub --master yarn --deploy-mode cluster --executor-memory 1024m --executor-cores...
Possess asked 16/3, 2017 at 13:49
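
One way to tame a long parameter list is to move the settings into a properties file and pass it with --properties-file; a minimal sketch (the file name and values are illustrative):

    # eventhub.conf
    spark.master                yarn
    spark.submit.deployMode     cluster
    spark.executor.memory       1024m
    spark.executor.cores        2

    spark-submit --properties-file eventhub.conf --class Eventhub app.jar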

1

I have Spark running in a remote cluster. How do I submit an application to the remote cluster using spark-submit in the following scenario: spark-submit is executed as a command via Camel, and the application r...
Wellman asked 28/11, 2019 at 14:9
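
For reference, submitting to a remote standalone master in cluster mode looks roughly like the sketch below; per the Spark docs, in standalone cluster mode the application JAR must be reachable from the cluster itself (for example via HDFS or an HTTP URL). The host, class, and JAR location are placeholders:

    spark-submit \
      --master spark://<remote-master-host>:7077 \
      --deploy-mode cluster \
      --class com.example.Main \
      hdfs:///jobs/app.jar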

2

Solved

In Spark 2.0, how do you set spark.yarn.executor.memoryOverhead when you run spark-submit? I know that for things like spark.executor.cores you can set --executor-cores 2. Is it the same pattern fo...
Fairly asked 1/8, 2018 at 13:45
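
For reference: properties without a dedicated flag are passed with --conf, so the same pattern does not apply directly; a minimal sketch (the class and JAR names are placeholders):

    spark-submit \
      --executor-cores 2 \
      --conf spark.yarn.executor.memoryOverhead=1024 \
      --class com.example.Main \
      app.jar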

2

I'm writing a Spark application and run it using the spark-submit shell script (using yarn-cluster/yarn-client). As I see it now, the exit code of spark-submit is decided according to the related YARN applica...
Merrymerryandrew asked 31/1, 2017 at 15:24
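
For context, a wrapper script can still branch on that exit code; a minimal sketch, assuming a YARN cluster-mode submission where spark-submit returns non-zero when the YARN application ends in a failed state:

    spark-submit --master yarn --deploy-mode cluster --class com.example.Main app.jar
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "spark-submit failed with exit code $status" >&2
      exit "$status"
    fi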

3

Can specifying num-executors in the spark-submit command override already enabled dynamic allocation (spark.dynamicAllocation.enabled true)?
Slype asked 20/1, 2018 at 5:9
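
A sketch of the combination the question asks about (class and JAR are placeholders). The observed behavior depends on the Spark version: 1.x disabled dynamic allocation with a warning when both were set, while since Spark 2.0 (SPARK-13723) --num-executors is treated as the initial executor count and dynamic allocation stays enabled:

    spark-submit \
      --conf spark.dynamicAllocation.enabled=true \
      --num-executors 4 \
      --class com.example.Main \
      app.jar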

1

Solved

I am running the below code in Spark using Java. Code: Test.java package com.sample; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import org.apache.spark.sql.Datase...
Armentrout asked 22/11, 2018 at 7:37

2

I want to execute a spark-submit job on an AWS EMR cluster based on a file upload event on S3. I am using an AWS Lambda function to capture the event, but I have no idea how to submit a spark-submit job on ...
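
A common pattern is to have the Lambda add a step to the running cluster via the EMR API; a minimal boto3 sketch, where the cluster ID, bucket, and script path are placeholders:

    import boto3

    def lambda_handler(event, context):
        # Fired by the S3 upload event; add a spark-submit step to the EMR cluster
        emr = boto3.client('emr')
        emr.add_job_flow_steps(
            JobFlowId='j-XXXXXXXXXXXXX',  # placeholder cluster id
            Steps=[{
                'Name': 'spark-job-from-lambda',
                'ActionOnFailure': 'CONTINUE',
                'HadoopJarStep': {
                    'Jar': 'command-runner.jar',
                    'Args': ['spark-submit', '--deploy-mode', 'cluster',
                             's3://my-bucket/jobs/job.py'],
                },
            }],
        )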

4

I have tried to write a transform method from DataFrame to DataFrame, and I also want to test it with scalatest. As you know, in Spark 2.x with the Scala API, you can create a SparkSession object as follo...
Christology asked 31/7, 2017 at 4:20
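
For context, a local SparkSession is usually enough for such a test; a minimal scalatest sketch (the suite name and the identity "transform" are placeholders, and the AnyFunSuite import assumes scalatest 3.1+):

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    class TransformSpec extends AnyFunSuite {
      // Local session shared by the tests in this suite
      lazy val spark: SparkSession = SparkSession.builder()
        .master("local[2]")
        .appName("transform-test")
        .getOrCreate()

      test("transform preserves the schema") {
        import spark.implicits._
        val in = Seq(("a", 1), ("b", 2)).toDF("key", "value")
        val out = in // replace with the transform under test
        assert(out.schema == in.schema)
      }
    }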

1

I use Spark to read from Elasticsearch, like select col from index limit 10; The problem is that the index is very large: it contains 100 billion rows. And Spark generates thousands of tasks to fi...
Supermundane asked 30/11, 2017 at 2:50

0

I have a test.py file: import pandas as pd import numpy as np import tensorflow as tf from sklearn.externals import joblib import tqdm import time print("Successful import") I have followed this...
Mcelrath asked 16/5, 2018 at 2:9

3

Solved

To submit a Spark application to a cluster, the documentation notes: To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assem...
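
For reference, with sbt this is typically done with the sbt-assembly plugin, marking Spark itself as provided so it is not bundled into the fat jar; a minimal sketch (plugin and Spark versions are illustrative):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

    // build.sbt
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.8" % "provided"

    // build the fat jar with:  sbt assembly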
