apache-spark-sql Questions

6

Solved

As mentioned in many other locations on the web, adding a new column to an existing DataFrame is not straightforward. Unfortunately it is important to have this functionality (even though it is ine...
Potluck asked 9/10, 2015 at 12:45
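A minimal PySpark sketch of the usual pattern (the column names here are illustrative, not from the question):

    from pyspark.sql import functions as F

    # add a constant column, or derive a new column from an existing one
    df = df.withColumn("constant_col", F.lit(1))
    df = df.withColumn("doubled", F.col("existing_col") * 2)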

4

Solved

I know how to read a CSV file into Apache Spark using spark-csv, but I already have the CSV file represented as a string and would like to convert this string directly to dataframe. Is this possibl...
Foliage asked 23/8, 2016 at 22:53
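One possible sketch, assuming Spark 2.2+ where spark.read.csv also accepts an RDD of strings (the sample string is made up):

    csv_string = "a,b,c\n1,2,3\n4,5,6"   # hypothetical CSV content
    rdd = spark.sparkContext.parallelize(csv_string.splitlines())
    df = spark.read.csv(rdd, header=True, inferSchema=True)
    df.show()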

7

Solved

How do I get current_date - 1 day in Spark SQL, the same as curdate() - 1 in MySQL?
Haase asked 13/12, 2016 at 6:28
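A sketch of the Spark SQL equivalent, using the built-in date_sub function:

    # yesterday's date, analogous to MySQL's CURDATE() - 1
    spark.sql("SELECT date_sub(current_date(), 1) AS yesterday").show()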

6

I am reading text files and converting them to parquet files. I am doing this with Spark code, but when I try to run it I get the following exception: org.apache.spark.SparkException: Job aborted ...
Revolt asked 16/3, 2016 at 11:52
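The exception is truncated, so the root cause can't be diagnosed here; this is only a sketch of the text-to-parquet flow the question describes (paths are hypothetical):

    df = spark.read.text("/path/to/input")
    df.write.mode("overwrite").parquet("/path/to/output")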

7

Solved

I am interested in being able to retrieve the location value of a Hive table given a Spark object (SparkSession). One way to obtain this value is by parsing the output of the location via the follo...
Typebar asked 6/1, 2019 at 10:27
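A hedged sketch of the DESCRIBE FORMATTED approach the question alludes to (the table name is hypothetical):

    rows = spark.sql("DESCRIBE FORMATTED mydb.mytable").collect()
    # in Spark 2.x the path appears in the data_type field of the
    # row whose col_name is "Location"
    location = next(r.data_type for r in rows if r.col_name.strip() == "Location")
    print(location)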

15

Solved

With a PySpark dataframe, how do you do the equivalent of Pandas df['col'].unique()? I want to list all the unique values in a PySpark dataframe column. Not the SQL-type way (registertemplate the...
Alphorn asked 8/9, 2016 at 6:3
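A short sketch of two common ways to get the distinct values (the column name is illustrative):

    from pyspark.sql import functions as F

    # collect distinct rows of one column to the driver
    values = [r["col"] for r in df.select("col").distinct().collect()]

    # or aggregate into a single array (element order not guaranteed)
    values = df.select(F.collect_set("col")).first()[0]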

5

Does anyone know how to do pagination in a Spark SQL query? I need to use Spark SQL but don't know how to do pagination. Tried: select * from person limit 10, 10
Wilmer asked 24/3, 2015 at 8:29
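Older Spark SQL has no OFFSET clause, so the MySQL-style limit 10, 10 fails; one workaround sketch is a row_number slice (the ordering column is an assumption):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # a global window funnels all rows through one partition: fine for
    # small tables, costly for large ones
    w = Window.orderBy("id")
    numbered = df.withColumn("rn", F.row_number().over(w))
    page2 = numbered.filter((F.col("rn") > 10) & (F.col("rn") <= 20)).drop("rn")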

2

Solved

I have a multi-column pyspark dataframe, and I need to convert the string types to the correct types. For example, currently I'm doing it like this: df = df.withColumn(col_name, col(col_name).cast('flo...
Coronograph asked 9/7, 2021 at 21:1
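A sketch that loops a name-to-type mapping over withColumn/cast (the mapping itself is made up):

    from pyspark.sql.functions import col

    type_map = {"price": "float", "qty": "int", "ts": "timestamp"}
    for name, dtype in type_map.items():
        df = df.withColumn(name, col(name).cast(dtype))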

3

Checkpoint version:
val savePath = "/some/path"
spark.sparkContext.setCheckpointDir(savePath)
df.checkpoint()
Write to disk version:
df.write.parquet(savePath)
val df = spark.read.parque...
Dav asked 9/8, 2018 at 17:25
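For comparison, a PySpark sketch of both variants from the question:

    # checkpoint: truncates lineage; the files are managed by Spark
    spark.sparkContext.setCheckpointDir("/some/path")
    df2 = df.checkpoint()   # returns the checkpointed DataFrame

    # parquet round-trip: explicit files you write and reload yourself
    df.write.mode("overwrite").parquet("/some/path/df.parquet")
    df3 = spark.read.parquet("/some/path/df.parquet")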

6

Solved

I want to access the first 100 rows of a spark data frame and write the result back to a CSV file. Why is take(100) basically instant, whereas
df.limit(100)
  .repartition(1)
  .write
  .mode(SaveMode...
Warrenwarrener asked 19/10, 2017 at 14:31
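A sketch contrasting the two calls; the usual explanation is that take streams rows from the first partitions, while the write plan still scans and shuffles before the limit is fully applied:

    rows = df.take(100)   # fetches incrementally, often near-instant

    # executes a full job: limit, then a repartition(1) shuffle, then write
    df.limit(100).repartition(1).write.mode("overwrite").csv("/tmp/out")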

5

Solved

I have my timestamp in UTC and ISO8601, but using Structured Streaming, it gets automatically converted into the local time. Is there a way to stop this conversion? I would like to have it in UTC. ...
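One knob that addresses this (Spark 2.2+): the session time zone, which controls how timestamps are rendered:

    # render timestamps in UTC instead of the JVM-local zone
    spark.conf.set("spark.sql.session.timeZone", "UTC")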

10

I have the following JSON format: {"Request": {"TrancheList": {"Tranche": [{"TrancheId": "500192163","OwnedAmt": "26500000", "Curr": "USD" }, { "TrancheId": "500213369", "OwnedAmt": "41000000","C...
Bottle asked 17/11, 2016 at 17:56
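A sketch that reads the document and explodes the Tranche array (the path is hypothetical; a pretty-printed file may also need .option("multiLine", True)):

    from pyspark.sql import functions as F

    df = spark.read.json("/path/to/request.json")
    tranches = (df
        .select(F.explode("Request.TrancheList.Tranche").alias("t"))
        .select("t.TrancheId", "t.OwnedAmt", "t.Curr"))
    tranches.show()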

4

Solved

Given Table 1 with one column "x" of type String. I want to create Table 2 with a column "y" that is an integer representation of the date strings given in "x". Essential is to keep null values in...
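A sketch assuming Spark 2.2+, that the strings look like yyyy-MM-dd, and that the integer form is yyyyMMdd; to_date returns null for null input, so nulls survive (the DataFrame names are placeholders for the two tables):

    from pyspark.sql import functions as F

    table2 = table1.withColumn(
        "y", F.date_format(F.to_date("x", "yyyy-MM-dd"), "yyyyMMdd").cast("int"))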

9

Solved

I read data from a CSV file, but it doesn't have an index column. I want to add a column numbering the rows from 1 to the row count. What should I do? Thanks. (Scala)
Smash asked 14/4, 2017 at 7:9
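The question asks for Scala; here is the same idea sketched in PySpark, using a 1-based row_number (note the global window moves everything through one partition):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    w = Window.orderBy(F.monotonically_increasing_id())
    df = df.withColumn("index", F.row_number().over(w))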

4

I was looking at the DataFrame API, and I can see two different methods providing the same functionality for removing duplicates from a data set. I can understand that dropDuplicates(colNames) will remove dupl...
Defamatory asked 27/2, 2016 at 7:22
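A quick illustration of the overlap:

    df.distinct()                 # deduplicates on all columns
    df.dropDuplicates()           # equivalent to distinct()
    df.dropDuplicates(["col1"])   # deduplicates on the listed columns only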

12

Solved

I'm trying to filter a PySpark dataframe that has None as a row value:
df.select('dt_mvmt').distinct().collect()
[Row(dt_mvmt=u'2016-03-27'), Row(dt_mvmt=u'2016-03-28'), Row(dt_mvmt=u'2016-03-2...
Bookman asked 16/5, 2016 at 20:31
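A sketch of the null-safe operators; a plain == None comparison evaluates to SQL null rather than True/False:

    from pyspark.sql.functions import col

    df.filter(col("dt_mvmt").isNotNull())   # keep rows with a value
    df.filter(col("dt_mvmt").isNull())      # keep only the null rows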

4

Solved

I want to add a column in a DataFrame with some arbitrary value (that is the same for each row). I get an error when I use withColumn as follows: dt.withColumn('new_column', 10).head(5) --------...
Holcomb asked 25/9, 2015 at 18:17
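A sketch of the likely fix: withColumn expects a Column, so a bare literal must be wrapped in lit:

    from pyspark.sql.functions import lit

    dt = dt.withColumn('new_column', lit(10))
    dt.head(5)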

2

Solved

I'd like to pass a string to spark.sql. Here is my query: mydf = spark.sql("SELECT * FROM MYTABLE WHERE TIMESTAMP BETWEEN '2020-04-01' AND '2020-04-08'") I'd like to pass a string for the date. ...
Karlotte asked 15/5, 2020 at 20:22
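One way to sketch it with Python string interpolation (the variable names are illustrative):

    start, end = "2020-04-01", "2020-04-08"
    mydf = spark.sql(
        f"SELECT * FROM MYTABLE WHERE TIMESTAMP BETWEEN '{start}' AND '{end}'")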

4

Solved

I have a JSON file:
{
  "a": {
    "b": 1
  }
}
I am trying to read it:
val path = "D:/playground/input.json"
val df = spark.read.json(path)
df.show()
But getting an erro...
Aldredge asked 11/8, 2019 at 16:27
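A plausible cause is the pretty-printed (multi-line) layout, since spark.read.json defaults to one JSON object per line; a PySpark sketch of the fix:

    df = spark.read.option("multiLine", True).json("D:/playground/input.json")
    df.show()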

4

Solved

Suppose I have an array column group_ids:
+-------+----------+
|user_id|group_ids |
+-------+----------+
|1      |[5, 8]    |
|3      |[1, 2, 3] |
|2      |[1, 4]    |
+-------+----------+
Schema:
root
 |-- user_id: in...

7

Solved

I'm sorry if this is a stupid question, but I can't seem to get my head around it. I'm fairly new to SQL and this behavior would be strange in R or Pandas or other things that I'm used to using. B...
Deformity asked 25/7, 2017 at 19:18

4

Solved

I have JSON data in various JSON files, and the keys can differ between lines, e.g.:
{"a":1 , "b":"abc", "c":"abc2", "d":"abc3"}
{"a":1 , "b":"abc2", "d":"abc"}
{"a":1 ,"b":"abc", "c":"abc2", "...
Evelinevelina asked 1/3, 2017 at 8:16
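For what it's worth, spark.read.json already merges the schemas of all records, so keys missing on a given line come back as null; a sketch (the path is hypothetical):

    df = spark.read.json("/path/to/files/*.json")
    df.show()   # rows lacking "c" simply show null in that column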

3

I have a Spark dataframe which has 1 row and 3 columns, namely start_date, end_date, end_month_id. I want to retrieve the value from the first cell into a variable and use that variable to filter anoth...
Pasargadae asked 2/3, 2019 at 0:9
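A minimal sketch of pulling the cell to the driver and reusing it (the second DataFrame and its column are assumptions):

    start_date = df.first()["start_date"]
    filtered = other_df.filter(other_df["some_date"] >= start_date)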

5

Solved

I would like to modify the cell values of a dataframe column (Age) where it is currently blank, and I would only do it if another column (Survived) has the value 0 for the corresponding row where it...
Trutko asked 8/6, 2016 at 15:51
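A sketch with when/otherwise; the replacement value and the blank-vs-null check are assumptions:

    from pyspark.sql import functions as F

    fill_value = 30.0   # hypothetical replacement
    df = df.withColumn(
        "Age",
        F.when((F.col("Survived") == 0) & (F.col("Age") == ""), F.lit(fill_value))
         .otherwise(F.col("Age")))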

2

Solved

I'm trying to use Spark 1.4 window functions in pyspark 1.4.1 but getting mostly errors or unexpected results. Here is a very simple example that I think should work: from pyspark.sql.window impo...
Cardiganshire asked 3/9, 2015 at 13:14
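In Spark 1.4, window functions generally required a HiveContext rather than a plain SQLContext, which may explain the errors; with a modern SparkSession the basic pattern is:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    w = Window.partitionBy("grp").orderBy("value")   # hypothetical columns
    df.withColumn("rn", F.row_number().over(w)).show()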
