apache-spark-sql Questions
6
Solved
As mentioned in many other locations on the web, adding a new column to an existing DataFrame is not straightforward. Unfortunately it is important to have this functionality (even though it is ine...
Potluck asked 9/10, 2015 at 12:45
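A minimal PySpark sketch of the usual approach: withColumn derives a new column from existing ones (the DataFrame and column names here are made up).

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2.0), (2, 3.0)], ["id", "value"])

# withColumn returns a new DataFrame with the extra column;
# a column from an unrelated DataFrame has to come in via a join instead.
df2 = df.withColumn("value_doubled", F.col("value") * 2)
df2.show()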
4
Solved
I know how to read a CSV file into Apache Spark using spark-csv, but I already have the CSV file represented as a string and would like to convert this string directly to a DataFrame. Is this possibl...
Foliage asked 23/8, 2016 at 22:53
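Since Spark 2.2, DataFrameReader.csv can consume an RDD of lines, which avoids writing the string to a temporary file; a sketch (the CSV content is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
csv_string = "name,age\nalice,30\nbob,25"  # hypothetical CSV content

# Parallelize the lines and hand the RDD straight to the CSV reader.
rdd = spark.sparkContext.parallelize(csv_string.split("\n"))
df = spark.read.csv(rdd, header=True, inferSchema=True)
df.show()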
7
Solved
How do I get current_date - 1 day in Spark SQL, the same as curdate() - 1 in MySQL?
Haase asked 13/12, 2016 at 6:28
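In Spark SQL this is usually date_sub over current_date(); a quick sketch:

# "Yesterday" relative to the session's current date.
spark.sql("SELECT date_sub(current_date(), 1) AS yesterday").show()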
6
I am reading text files and converting them to Parquet files. I am doing it using Spark code. But when I try to run the code I get the following exception:
org.apache.spark.SparkException: Job aborted ...
Revolt asked 16/3, 2016 at 11:52
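The teaser cuts off before the root cause, so only the basic flow can be sketched here; both paths are placeholders. With this kind of failure, the underlying task error is usually further down the stack trace than the generic "Job aborted" line.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read raw text lines and persist them as Parquet.
df = spark.read.text("/data/input/*.txt")        # hypothetical input path
df.write.mode("overwrite").parquet("/data/out")  # hypothetical output path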
7
Solved
I am interested in being able to retrieve the location value of a Hive table given a Spark object (SparkSession). One way to obtain this value is by parsing the output of the location via the follo...
Typebar asked 6/1, 2019 at 10:27
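One approach that avoids scraping free-form output: DESCRIBE FORMATTED returns the location as a regular row. A sketch, with a made-up table name:

# The result has col_name/data_type/comment columns; the row whose
# col_name is "Location" carries the table's storage path.
desc = spark.sql("DESCRIBE FORMATTED mydb.mytable")  # hypothetical table
location = desc.filter(desc.col_name == "Location").select("data_type").first()[0]
print(location)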
15
Solved
With a pyspark dataframe, how do you do the equivalent of Pandas df['col'].unique()?
I want to list out all the unique values in a pyspark dataframe column.
Not the SQL type way (registertemplate the...
Alphorn asked 8/9, 2016 at 6:3
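A minimal DataFrame-API sketch (the column name is a placeholder): distinct() on a single-column projection plays the role of Pandas unique().

# Collect the distinct values back to the driver as plain Python objects.
unique_vals = [row[0] for row in df.select("col").distinct().collect()]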
5
Does anyone know how to do pagination in a Spark SQL query?
I need to use Spark SQL but don't know how to do pagination.
Tried:
select * from person limit 10, 10
Wilmer asked 24/3, 2015 at 8:29
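Spark SQL's LIMIT historically takes no offset, so the MySQL-style "limit 10, 10" fails; one common workaround numbers the rows with a window function and filters on the range. A sketch, assuming a deterministic ordering column named id:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Stable pages require an explicit ORDER BY; rn 11..20 is "page 2".
w = Window.orderBy("id")  # hypothetical ordering column
page = (df.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn").between(11, 20))
          .drop("rn"))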
2
Solved
I have a multi-column pyspark dataframe, and I need to convert the string types to the correct types, for example:
I'm currently doing it like this:
df = df.withColumn(col_name, col(col_name).cast('flo...
Coronograph asked 9/7, 2021 at 21:1
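A sketch of the same cast-per-column idea driven by a mapping, so it scales to many columns; the column names and types are made up:

from pyspark.sql import functions as F

type_map = {"price": "float", "qty": "int", "ts": "timestamp"}  # hypothetical columns
for col_name, dtype in type_map.items():
    df = df.withColumn(col_name, F.col(col_name).cast(dtype))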
3
Checkpoint version:
val savePath = "/some/path"
spark.sparkContext.setCheckpointDir(savePath)
df.checkpoint()
Write to disk version:
df.write.parquet(savePath)
val df = spark.read.parque...
Dav asked 9/8, 2018 at 17:25
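For contrast, both variants side by side in PySpark, as a sketch with a placeholder path. Checkpointing truncates the lineage but the result stays tied to the running application, while a Parquet round-trip persists data that any later job can read back.

spark.sparkContext.setCheckpointDir("/some/path")
df_cp = df.checkpoint()  # returns a new DataFrame with truncated lineage

df.write.mode("overwrite").parquet("/some/path/df.parquet")
df_pq = spark.read.parquet("/some/path/df.parquet")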
6
Solved
I want to access the first 100 rows of a spark data frame and write the result back to a CSV file.
Why is take(100) basically instant, whereas
df.limit(100)
.repartition(1)
.write
.mode(SaveMode...
Warrenwarrener asked 19/10, 2017 at 14:31
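take(100) can stop after scanning just enough partitions, whereas the limit/repartition(1) write plan may evaluate much more of the input. One possible workaround, sketched here with a placeholder output path, is to take the rows on the driver and rebuild a small DataFrame to write:

rows = df.take(100)                             # fast: short-circuits the scan
small = spark.createDataFrame(rows, df.schema)  # tiny driver-side DataFrame
small.coalesce(1).write.mode("overwrite").csv("/tmp/first100", header=True)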
5
Solved
I have my timestamp in UTC and ISO8601, but using Structured Streaming, it gets automatically converted into the local time. Is there a way to stop this conversion? I would like to have it in UTC.
...
Dice asked 13/2, 2018 at 12:37
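If the display conversion is the issue, setting the session time zone usually stops it; a one-line sketch:

# Render timestamps in UTC instead of the JVM's local zone (Spark 2.2+).
spark.conf.set("spark.sql.session.timeZone", "UTC")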
10
I have the following JSON format:
{"Request": {"TrancheList": {"Tranche": [{"TrancheId": "500192163","OwnedAmt": "26500000", "Curr": "USD" }, { "TrancheId": "500213369", "OwnedAmt": "41000000","C...
Bottle asked 17/11, 2016 at 17:56
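A sketch of reading such a nested document and flattening the Tranche array with explode; the file path is a placeholder, and multiLine is only needed if each document spans several lines:

from pyspark.sql import functions as F

df = spark.read.option("multiLine", True).json("/data/request.json")
tranches = (df.select(F.explode("Request.TrancheList.Tranche").alias("t"))
              .select("t.TrancheId", "t.OwnedAmt", "t.Curr"))
tranches.show()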
4
Solved
Given Table 1 with one column "x" of type String.
I want to create Table 2 with a column "y" that is an integer representation of the date strings given in "x".
Essential is to keep null values in...
Rewarding asked 2/9, 2015 at 15:25
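A sketch assuming the strings in x look like yyyy-MM-dd: built-in functions propagate nulls, so a null x stays null in y without extra handling.

from pyspark.sql import functions as F

# e.g. "2015-09-02" -> 20150902; nulls pass straight through.
table2 = table1.withColumn(
    "y",
    F.date_format(F.to_date("x", "yyyy-MM-dd"), "yyyyMMdd").cast("int"),
)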
9
Solved
I read data from a CSV file, but it doesn't have an index.
I want to add a column numbered from 1 to the number of rows.
What should I do? Thanks. (Scala)
Smash asked 14/4, 2017 at 7:9
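The question asks for Scala, but here is the same idea in PySpark for consistency with the other sketches: a row_number window gives a dense 1..N index. Note that without a meaningful ORDER BY column the numbering is not guaranteed to match file order.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.orderBy(F.monotonically_increasing_id())
df_indexed = df.withColumn("index", F.row_number().over(w))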
4
I was looking at the DataFrame API and I can see two different methods providing the same functionality for removing duplicates from a data set.
I can understand that dropDuplicates(colNames) will remove dupl...
Defamatory asked 27/2, 2016 at 7:22
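The short version, as a sketch: with no arguments the two are interchangeable, while dropDuplicates additionally accepts a column subset.

df.distinct()                # deduplicates on all columns
df.dropDuplicates()          # same as distinct()
df.dropDuplicates(["colA"])  # keeps one row per colA value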
12
Solved
I'm trying to filter a PySpark dataframe that has None as a row value:
df.select('dt_mvmt').distinct().collect()
[Row(dt_mvmt=u'2016-03-27'),
Row(dt_mvmt=u'2016-03-28'),
Row(dt_mvmt=u'2016-03-2...
Bookman asked 16/5, 2016 at 20:31
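Null comparisons with == evaluate to null rather than True/False, so the usual fix is isNull/isNotNull; a sketch:

from pyspark.sql import functions as F

df.filter(F.col("dt_mvmt").isNotNull())  # rows with a value
df.filter(F.col("dt_mvmt").isNull())     # rows with None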
4
Solved
I want to add a column in a DataFrame with some arbitrary value (that is the same for each row). I get an error when I use withColumn as follows:
dt.withColumn('new_column', 10).head(5)
--------...
Holcomb asked 25/9, 2015 at 18:17
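withColumn expects a Column, not a bare Python value, which is what the error is pointing at; wrapping the constant in lit() is the usual fix:

from pyspark.sql import functions as F

dt = dt.withColumn('new_column', F.lit(10))
dt.head(5)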
2
Solved
I'd like to pass a string to spark.sql.
Here is my query:
mydf = spark.sql("SELECT * FROM MYTABLE WHERE TIMESTAMP BETWEEN '2020-04-01' AND '2020-04-08'")
I'd like to pass a string for the date.
...
Karlotte asked 15/5, 2020 at 20:22
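A sketch using an f-string to splice the dates in; the quoting stays inside the SQL text, and the variable names here are made up.

start, end = "2020-04-01", "2020-04-08"
mydf = spark.sql(
    f"SELECT * FROM MYTABLE WHERE TIMESTAMP BETWEEN '{start}' AND '{end}'"
)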
4
Solved
I have a JSON file:
{
  "a": {
    "b": 1
  }
}
I am trying to read it:
val path = "D:/playground/input.json"
val df = spark.read.json(path)
df.show()
But getting an erro...
Aldredge asked 11/8, 2019 at 16:27
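Spark's JSON reader expects one document per line by default, and a pretty-printed file like the one above trips it up; the multiLine option is the usual fix:

df = spark.read.option("multiLine", True).json("D:/playground/input.json")
df.show()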
4
Solved
Suppose I have an array column group_ids
+-------+----------+
|user_id|group_ids |
+-------+----------+
|1      |[5, 8]    |
|3      |[1, 2, 3] |
|2      |[1, 4]    |
+-------+----------+
Schema:
root
|-- user_id: in...
Fanfani asked 19/3, 2021 at 22:53
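The teaser is cut off, so the intended operation is unclear; a common next step with an array column like this is to explode it into one row per (user_id, group_id) pair, sketched here:

from pyspark.sql import functions as F

exploded = df.select("user_id", F.explode("group_ids").alias("group_id"))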
7
Solved
I'm sorry if this is a stupid question, but I can't seem to get my head around it. I'm fairly new to SQL and this behavior would be strange in R or Pandas or other things that I'm used to using.
B...
Deformity asked 25/7, 2017 at 19:18
4
Solved
I have JSON data in various JSON files, and the keys can differ between lines. For example:
{"a":1 , "b":"abc", "c":"abc2", "d":"abc3"}
{"a":1 , "b":"abc2", "d":"abc"}
{"a":1 ,"b":"abc", "c":"abc2", "...
Evelinevelina asked 1/3, 2017 at 8:16
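spark.read.json merges the schemas it sees across records, so keys missing from a given line simply come back as null in that row; a sketch with a placeholder path:

df = spark.read.json("/data/*.json")
df.show()  # rows lacking "c" will show null in that column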
3
I have a Spark dataframe which has 1 row and 3 columns, namely start_date, end_date, end_month_id. I want to retrieve the value from the first cell into a variable and use that variable to filter anoth...
Pasargadae asked 2/3, 2019 at 0:9
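first() pulls a single Row to the driver, and indexing it by name yields a plain Python value that can parameterize the second query; the other DataFrame and its column are made up:

start_date = df.first()["start_date"]
filtered = other_df.filter(other_df.event_date >= start_date)  # hypothetical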
5
Solved
I would like to modify the cell values of a DataFrame column (Age) where they are currently blank, and I would only do it if another column (Survived) has the value 0 for the corresponding row where it...
Trutko asked 8/6, 2016 at 15:51
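A sketch with when/otherwise, assuming "blank" means null and using a made-up fill value: only rows matching both conditions are rewritten, and everything else keeps its original Age.

from pyspark.sql import functions as F

df = df.withColumn(
    "Age",
    F.when(F.col("Age").isNull() & (F.col("Survived") == 0), F.lit(999))
     .otherwise(F.col("Age")),
)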
2
Solved
Why do Window functions fail with "Window function X does not take a frame specification"?
I'm trying to use Spark 1.4 window functions in pyspark 1.4.1
but getting mostly errors or unexpected results.
Here is a very simple example that I think should work:
from pyspark.sql.window impo...
Cardiganshire asked 3/9, 2015 at 13:14
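Ranking functions such as rank() and row_number() define their own frame, so attaching rowsBetween(...) to them raises exactly this error; frames belong with aggregates. A sketch with made-up columns k and v:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w_rank = Window.partitionBy("k").orderBy("v")  # no frame: fine for rank()
w_agg = (Window.partitionBy("k").orderBy("v")
         .rowsBetween(Window.unboundedPreceding, Window.currentRow))
df = (df.withColumn("r", F.rank().over(w_rank))
        .withColumn("running", F.sum("v").over(w_agg)))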