apache-spark-1.3 Questions
3
I have a dataframe such as the following:
In [94]: prova_df.show()
order_item_order_id  order_item_subtotal
1                    299.98
2                    199.99
2                    250.0
2                    129.99
4                    49.98
4                    299.95
4                    150.0
4                    199.92
5                    299.98
5...
Bedcover asked 27/11, 2015 at 16:57
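The excerpt cuts off before stating the goal, but data shaped like this usually calls for a per-order aggregation. A minimal PySpark 1.3 sketch, assuming the task is summing subtotals per order id; the dataframe is a hand-typed recreation of the one shown above:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="order-subtotals")
sqlContext = SQLContext(sc)

# Recreation of the truncated dataframe shown in the question
rows = [(1, 299.98), (2, 199.99), (2, 250.0), (2, 129.99),
        (4, 49.98), (4, 299.95), (4, 150.0), (4, 199.92), (5, 299.98)]
prova_df = sqlContext.createDataFrame(rows, ["order_item_order_id", "order_item_subtotal"])

# Sum subtotals per order -- an assumed goal; the excerpt ends before stating it
prova_df.groupBy("order_item_order_id").sum("order_item_subtotal").show()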
3
Solved
I'm using Spark 1.3 to do an aggregation on a lot of data. The job consists of 4 steps:
1. Read a big (1TB) sequence file (corresponding to 1 day of data)
2. Filter out most of it and get about 1GB of ...
Dorindadorine asked 9/9, 2015 at 18:51
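Only the first two of the four steps survive the truncation. A sketch of how such a job typically looks in Spark 1.3's RDD API, with the paths, the filter predicate, and the last two steps all assumptions:

from pyspark import SparkContext

sc = SparkContext(appName="daily-aggregation")

# Step 1: read one day of data; path and record types are placeholders
raw = sc.sequenceFile("hdfs:///data/2015-09-08")

# Step 2: filter out most records, keeping roughly 1 GB; the real
# predicate is cut off in the excerpt
kept = raw.filter(lambda kv: kv[1] is not None)

# Steps 3-4 (assumed): aggregate by key, then persist the result
totals = kept.reduceByKey(lambda a, b: a + b)
totals.saveAsTextFile("hdfs:///out/2015-09-08")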
1
I'm running a Spark job to aggregate data. I have a custom data structure called a Profile, which basically contains a mutable.HashMap[Zone, Double]. I want to merge all profiles that share a given...
Monohydric asked 11/9, 2015 at 18:47
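The Profile class in the question is Scala (a wrapper around mutable.HashMap[Zone, Double]); the same merge-by-key pattern is sketched here in PySpark, with a plain dict standing in for Profile and made-up keys and zone names:

from pyspark import SparkContext

sc = SparkContext(appName="profile-merge")

def merge_profiles(p1, p2):
    # Combine two zone->value maps; summing overlapping zones is an
    # assumption, the excerpt does not say how values should merge
    merged = dict(p1)
    for zone, value in p2.items():
        merged[zone] = merged.get(zone, 0.0) + value
    return merged

# Hypothetical (key, profile) pairs
profiles = sc.parallelize([
    ("u1", {"zoneA": 1.0, "zoneB": 2.0}),
    ("u1", {"zoneA": 0.5}),
    ("u2", {"zoneC": 3.0}),
])

# Merge all profiles that share the same key
print(profiles.reduceByKey(merge_profiles).collect())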
1
Solved
I am running pyspark, spark 1.3, standalone mode, client mode.
I am trying to investigate my spark job by looking at the jobs from the past and comparing them. I want to view their logs, the confi...
Hedvah asked 15/7, 2016 at 21:48
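Inspecting finished jobs in standalone mode hinges on event logging having been enabled before the job ran; completed applications then show up in the history server. A sketch of the relevant configuration, with the log directory a placeholder:

from pyspark import SparkConf, SparkContext

# Write event logs so the job can be inspected after it finishes;
# the directory is an assumption and must already exist
conf = (SparkConf()
        .setAppName("logged-job")
        .set("spark.eventLog.enabled", "true")
        .set("spark.eventLog.dir", "hdfs:///spark-events"))
sc = SparkContext(conf=conf)

A history server started with spark.history.fs.logDirectory pointing at the same directory then serves the logs and configuration of past runs.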
0
This question is a spin-off from an earlier one, "Saving a list of rows to a Hive table in pyspark".
EDIT: please see my updates at the bottom of this post.
I have used both Scala and now Pyspark t...
Parget asked 28/4, 2016 at 19:44
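The linked question is about persisting a Python list of rows into a Hive table. A minimal PySpark 1.3 sketch, assuming a Hive-enabled build and using placeholder rows, column names, and table name:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="rows-to-hive")
sqlContext = HiveContext(sc)

# Hypothetical rows and schema
rows = [(1, "a"), (2, "b")]
df = sqlContext.createDataFrame(rows, ["id", "label"])

# With a HiveContext, saveAsTable registers the table in the Hive metastore
df.saveAsTable("mytable")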
1
Solved
GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table
I have a Hive table in parquet format that was generated using
create table myTable (var1 int, var2 string, var3 int, var4 string, var5 array<struct<a:int,b:string>>) stored as parquet;...
Mccully asked 22/9, 2015 at 21:48
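The cast fails because an array<struct<...>> column does not come back as a native Scala collection: each element materializes as a Row, so it has to be unpacked field by field rather than cast to a HashSet. The same shape appears in PySpark as a list of Row objects; a sketch of unpacking it, assuming the table from the question:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="array-of-struct")
sqlContext = HiveContext(sc)

df = sqlContext.table("myTable")

# var5 is array<struct<a:int,b:string>>, i.e. a list of Row objects here;
# build a set of (a, b) tuples instead of casting the column wholesale
pairs = df.rdd.map(lambda row: (row.var1, set((e.a, e.b) for e in row.var5)))
print(pairs.take(1))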