Spark shell: How to copy multi-line code inside?
I have a Scala program that I want to execute in the Spark shell. When I copy and paste it into spark-shell it doesn't work; I have to enter it line by line.

How can I paste the whole program into the shell?

Thanks.

Quintero answered 19/9, 2019 at 10:19 Comment(0)

Just save your code to a text file and use :load <path_to_your_script> in spark-shell.
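For example (the file path below is hypothetical), save the program to a file:

```scala
// Contents of /tmp/my_script.scala (hypothetical path)
val nums = Seq(1, 2, 3)
val doubled = nums.map(_ * 2)
println(doubled)
```

Then, from the spark-shell prompt, run :load /tmp/my_script.scala and the whole file is compiled and executed as one unit instead of line by line.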

Munich answered 19/9, 2019 at 10:59 Comment(1)
I found that I needed to put my code in parentheses like Craig said (https://mcmap.net/q/955555/-spark-shell-how-to-copy-multiline-inside), even when it's in a file. – Habakkuk

In spark-shell, you just need to use the command ":paste":

scala> :paste
// Entering paste mode (ctrl-D to finish)

// define the case class so the block is self-contained
case class Salary(depName: String, empNo: Long, salary: Long)

val empsalary = Seq(
  Salary("sales", 1, 5000),
  Salary("personnel", 2, 3900),
  Salary("sales", 3, 4800),
  Salary("sales", 4, 4800),
  Salary("personnel", 5, 3500),
  Salary("develop", 7, 4200),
  Salary("develop", 8, 6000),
  Salary("develop", 9, 4500),
  Salary("develop", 10, 5200),
  Salary("develop", 11, 5200))
  .toDS.toDF

Then press Ctrl-D to exit paste mode. You will see the output:

// Exiting paste mode, now interpreting.

empsalary: org.apache.spark.sql.DataFrame = [depName: string, empNo: bigint ... 1 more field]
Inchoate answered 19/9, 2019 at 17:40 Comment(2)
This saves the day. Thanks! – Stonework
This should be the accepted answer, as it's simpler and it works. – Thurmond

In the Spark shell you can wrap your multi-line Spark code in parentheses to execute it. Wrapping in parentheses lets you copy multi-line code into the shell or write it line by line. See the examples below for usage.

scala> val adult_cat_df = (spark.read.format("csv")
 |   .option("sep", ",")
 |   .option("inferSchema", "true")
 |   .option("header", "false")
 |   .load("hdfs://…/adult/adult_data.csv")
 |   .toDF("age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "class")
 |   .drop("fnlwgt", "education-num", "capital-gain", "capital-loss")
 | )
scala> val clean_df = (adult_cat_df.dropDuplicates
 |   .na.replace("*", Map("?" -> null))
 |   .na.drop(minNonNulls = 9)
 | )
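The same trick works for any multi-line Scala expression, not only Spark calls; a minimal sketch using plain collections (no Spark or input file needed) that can be pasted into the shell as a whole:

```scala
// The open parenthesis keeps the REPL from evaluating the
// expression until the matching closing parenthesis is typed.
val evens = (List(1, 2, 3, 4)
  .map(_ * 2)
  .filter(_ % 4 == 0))

println(evens)  // List(4, 8)
```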
Indigestive answered 21/12, 2019 at 21:43 Comment(0)

I would need more explanation from you, but I guess you are trying to do something like this:

spark.read.parquet(X)
.filter("ll")
.groupBy("iii")
.agg("kkk")

And it does not work. Instead, you can do this:

spark.read.parquet(X).
    filter("ll").
    groupBy("iii").
    agg("kkk")

Put the dot at the end of the line.
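This works because a line ending in a dot cannot be a complete statement, so the REPL keeps reading instead of evaluating; a minimal sketch with plain collections (no Spark needed):

```scala
// A trailing dot signals that the expression continues on
// the next line, so each line is not evaluated on its own.
val result = List(1, 2, 3).
  map(_ * 2).
  filter(_ > 2)

println(result)  // List(4, 6)
```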

I hope this is what you are looking for.

Dreadfully answered 19/9, 2019 at 13:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.