I have a Scala program that I want to execute in the Spark shell. When I copy and paste the whole program into spark-shell it doesn't work; I have to paste it in line by line.
How can I paste the entire program into the shell?
Thanks.
Just save your code to a text file and use :load <path_to_your_script>
in spark-shell.
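For example, assuming you saved the following one-statement-per-line script to a hypothetical path /tmp/my_script.scala:

// /tmp/my_script.scala
val squares = spark.range(5).toDF("n").selectExpr("n", "n * n as n_squared")
squares.show()

you can then run the whole file from inside the shell:

scala> :load /tmp/my_script.scala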
In spark-shell, you just need to use the ":paste" command:
scala> :paste
// Entering paste mode (ctrl-D to finish)
// Salary is not defined by default; define it in the same paste block
case class Salary(depName: String, empNo: Long, salary: Long)

val empsalary = Seq(
  Salary("sales", 1, 5000),
  Salary("personnel", 2, 3900),
  Salary("sales", 3, 4800),
  Salary("sales", 4, 4800),
  Salary("personnel", 5, 3500),
  Salary("develop", 7, 4200),
  Salary("develop", 8, 6000),
  Salary("develop", 9, 4500),
  Salary("develop", 10, 5200),
  Salary("develop", 11, 5200))
  .toDS.toDF
Then press Ctrl-D to exit paste mode, and you will see output like:
// Exiting paste mode, now interpreting.

defined class Salary
empsalary: org.apache.spark.sql.DataFrame = [depName: string, empNo: bigint ... 1 more field]
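As a side note, :paste also accepts a file path. If I remember correctly, the file is then compiled as a single block (unlike :load, which replays the file line by line), so multi-line chained expressions survive intact:

scala> :paste <path_to_your_script>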
In the Spark shell you can wrap multi-line Spark code in parentheses to execute it. Because the expression stays syntactically open until the closing parenthesis, the shell keeps reading instead of evaluating each line as it is entered, so you can paste multi-line code in one go or type it line by line. See the examples below for usage.
scala> val adult_cat_df = (spark.read.format("csv")
| .option("sep", ",")
| .option("inferSchema", "true")
| .option("header", "false")
| .load("hdfs://…/adult/adult_data.csv")
| .toDF("age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "class")
| .drop("fnlwgt", "education-num", "capital-gain", "capital-loss")
| )
scala> val clean_df = (adult_cat_df.dropDuplicates
| .na.replace("*", Map("?" -> null))
| .na.drop(minNonNulls = 9)
| )
I would need more detail from you, but I guess you are trying to do something like this:
spark.read.parquet(X)
.filter("ll")
.groupBy("iii")
.agg("kkk")
And it does not work, because the shell evaluates spark.read.parquet(X) as soon as the first line is complete and then trips over the leading .filter. Instead you can do:
spark.read.parquet(X).
filter("ll").
groupBy("iii").
agg("kkk")
Put the dot at the end of each line: a trailing dot leaves the expression incomplete, so the shell waits for the next line instead of evaluating immediately.
I hope this is what you are looking for.
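As a concrete sketch (the file path and column names here are made up), the following pastes cleanly into spark-shell line by line:

// assumes /tmp/events.parquet exists with status and country columns
spark.read.parquet("/tmp/events.parquet").
  filter($"status" === "ok").
  groupBy($"country").
  count().
  show()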