Spark shell: How to copy multi-line code inside?
I have a Scala program that I want to execute in the Spark shell. When I copy and paste it into spark-shell it doesn't work; I have to enter it line by line.

How can I paste the whole program into the shell?

Thanks.

Quintero answered 19/9, 2019 at 10:19 Comment(0)

Just save your code to a text file and use :load <path_to_your_script> in spark-shell.
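For example (the file path below is hypothetical), save the program to a file:

```scala
// Contents of /tmp/my_script.scala (hypothetical path)
val nums = Seq(1, 2, 3)
val doubled = nums.map(_ * 2)
println(doubled)
```

Then, from the spark-shell prompt, run :load /tmp/my_script.scala and the whole file is compiled and executed as one unit instead of line by line.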

Munich answered 19/9, 2019 at 10:59 Comment(1)
I found that I needed to put my code in parentheses like Craig said (https://mcmap.net/q/955555/-spark-shell-how-to-copy-multiline-inside), even when it's in a file. – Habakkuk

In spark-shell, you just need to use the command ":paste":

scala> :paste
// Entering paste mode (ctrl-D to finish)

// define the case class so the block is self-contained
case class Salary(depName: String, empNo: Long, salary: Long)

val empsalary = Seq(
  Salary("sales", 1, 5000),
  Salary("personnel", 2, 3900),
  Salary("sales", 3, 4800),
  Salary("sales", 4, 4800),
  Salary("personnel", 5, 3500),
  Salary("develop", 7, 4200),
  Salary("develop", 8, 6000),
  Salary("develop", 9, 4500),
  Salary("develop", 10, 5200),
  Salary("develop", 11, 5200))
  .toDS.toDF

Then press Ctrl-D to exit paste mode. You will see the output:

// Exiting paste mode, now interpreting.

empsalary: org.apache.spark.sql.DataFrame = [depName: string, empNo: bigint ... 1 more field]
Inchoate answered 19/9, 2019 at 17:40 Comment(2)
This saves the day. Thanks! – Stonework
This should be the accepted answer, as it's simpler and it works. – Thurmond

In the Spark shell you can wrap your multi-line Spark code in parentheses to execute it. Wrapping in parentheses lets you copy multi-line code into the shell or write it line by line. See the examples below for usage.

scala> val adult_cat_df = (spark.read.format("csv")
 |   .option("sep", ",")
 |   .option("inferSchema", "true")
 |   .option("header", "false")
 |   .load("hdfs://…/adult/adult_data.csv")
 |   .toDF("age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "class")
 |   .drop("fnlwgt", "education-num", "capital-gain", "capital-loss")
 | )
scala> val clean_df = (adult_cat_df.dropDuplicates
 |   .na.replace("*", Map("?" -> null))
 |   .na.drop(minNonNulls = 9)
 | )
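The same trick works for any multi-line Scala expression, not only Spark calls; a minimal sketch using plain collections (no Spark or input file needed) that can be pasted into the shell as a whole:

```scala
// The open parenthesis keeps the REPL from evaluating the
// expression until the matching closing parenthesis is typed.
val evens = (List(1, 2, 3, 4)
  .map(_ * 2)
  .filter(_ % 4 == 0))

println(evens)  // List(4, 8)
```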
Indigestive answered 21/12, 2019 at 21:43 Comment(0)

I would need more explanation from you, but I guess you are trying to do something like this:

spark.read.parquet(X)
.filter("ll")
.groupBy("iii")
.agg("kkk")

And it does not work. Instead, you can do this:

spark.read.parquet(X).
    filter("ll").
    groupBy("iii").
    agg("kkk")

Put the dot at the end of the line.
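This works because a line ending in a dot cannot be a complete statement, so the REPL keeps reading instead of evaluating; a minimal sketch with plain collections (no Spark needed):

```scala
// A trailing dot signals that the expression continues on
// the next line, so each line is not evaluated on its own.
val result = List(1, 2, 3).
  map(_ * 2).
  filter(_ > 2)

println(result)  // List(4, 6)
```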

I hope this is what you are looking for.

Dreadfully answered 19/9, 2019 at 13:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.