apache-spark-1.3 Questions

3

I have a dataframe such as the following:

    In [94]: prova_df.show()
    order_item_order_id order_item_subtotal
    1                   299.98
    2                   199.99
    2                   250.0
    2                   129.99
    4                   49.98
    4                   299.95
    4                   150.0
    4                   199.92
    5                   299.98
    5 ...
Bedcover asked 27/11, 2015 at 16:57
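The excerpt cuts off before the actual question, but the usual follow-up for a frame shaped like this is a per-order aggregation. A minimal PySpark 1.3 sketch, assuming the data above; the app name and the order_total alias are made up:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql import functions as F

    sc = SparkContext(appName="order-subtotals")
    sqlContext = SQLContext(sc)

    # Rows recreated from the excerpt above.
    rows = [(1, 299.98), (2, 199.99), (2, 250.0), (2, 129.99),
            (4, 49.98), (4, 299.95), (4, 150.0), (4, 199.92),
            (5, 299.98)]
    prova_df = sqlContext.createDataFrame(
        rows, ["order_item_order_id", "order_item_subtotal"])

    # Sum of the subtotals per order (F.avg would give the average).
    totals = prova_df.groupBy("order_item_order_id").agg(
        F.sum("order_item_subtotal").alias("order_total"))
    totals.show()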

3

Solved

I'm using Spark 1.3 to do an aggregation on a lot of data. The job consists of 4 steps:

1. Read a big (1TB) sequence file (corresponding to 1 day of data)
2. Filter out most of it and get about 1GB of ...
Dorindadorine asked 9/9, 2015 at 18:51
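The excerpt ends before the problem statement, so the sketch below only illustrates one common pitfall in this read-filter-aggregate shape: the ~1GB of surviving data keeps the partitioning that was sized for the 1TB input. The path, predicate, and partition count are all placeholders:

    from pyspark import SparkContext

    sc = SparkContext(appName="daily-aggregation")

    def keep_record(kv):
        # Stand-in predicate; the real job discards most records.
        return kv[1] is not None

    # Step 1: read the day's 1TB sequence file (path illustrative).
    raw = sc.sequenceFile("hdfs:///data/events/2015-09-09")

    # Step 2: filter most of it away, leaving roughly 1GB.
    kept = raw.filter(keep_record)

    # The filtered RDD inherits thousands of partitions sized for 1TB;
    # coalescing avoids scheduling mostly empty tasks in later stages,
    # and caching avoids re-reading the big file.
    small = kept.coalesce(64).cache()

    # Steps 3 and 4 (aggregation and output) would follow from here.
    print(small.count())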

1

I'm running a Spark job to aggregate data. I have a custom data structure called a Profile, which basically contains a mutable.HashMap[Zone, Double]. I want to merge all profiles that share a given...
Monohydric asked 11/9, 2015 at 18:47
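The question is in Scala, but the merge itself translates directly; a sketch in PySpark with plain dicts standing in for mutable.HashMap[Zone, Double], and invented keys and zone names:

    from pyspark import SparkContext

    sc = SparkContext(appName="profile-merge")

    # Each profile is a dict of zone -> double, standing in for the
    # Scala mutable.HashMap[Zone, Double] in the question.
    profiles = sc.parallelize([
        ("user1", {"zoneA": 1.0, "zoneB": 2.0}),
        ("user1", {"zoneB": 0.5}),
        ("user2", {"zoneC": 3.0}),
    ])

    def merge_profiles(a, b):
        # Sum zone weights; copy rather than mutating a shuffled value.
        merged = dict(a)
        for zone, weight in b.items():
            merged[zone] = merged.get(zone, 0.0) + weight
        return merged

    # reduceByKey merges map-side before the shuffle, unlike groupByKey.
    merged = profiles.reduceByKey(merge_profiles)
    print(merged.collect())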

1

Solved

I am running PySpark (Spark 1.3) in standalone mode, client mode. I am trying to investigate my Spark jobs by looking at the jobs from the past and comparing them. I want to view their logs, the confi...
Hedvah asked 15/7, 2016 at 21:48
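Inspecting finished applications in standalone mode requires the event log to be written somewhere persistent; a sketch, with an illustrative HDFS path:

    from pyspark import SparkConf, SparkContext

    # Persist event logs so completed jobs stay inspectable after the
    # application's own web UI goes away.
    conf = (SparkConf()
            .setAppName("logged-job")
            .set("spark.eventLog.enabled", "true")
            .set("spark.eventLog.dir", "hdfs:///spark-event-logs"))
    sc = SparkContext(conf=conf)

With spark.history.fs.logDirectory pointed at the same directory, sbin/start-history-server.sh then serves the UI (jobs, stages, environment/config) of past applications.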

0

This question is a spin-off from another one ("saving a list of rows to a Hive table in pyspark"). EDIT: please see my updated edits at the bottom of this post. I have used both Scala and now PySpark t...
Parget asked 28/4, 2016 at 19:44
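A minimal PySpark 1.3 sketch of the linked question's task; the rows and table name are invented, and note that DataFrame.saveAsTable is the 1.3 spelling (the df.write builder only arrives in 1.4):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext, Row

    sc = SparkContext(appName="rows-to-hive")
    hc = HiveContext(sc)  # a plain SQLContext cannot see the Hive metastore

    # Hypothetical rows standing in for the poster's data.
    rows = [Row(id=1, name="a"), Row(id=2, name="b")]
    df = hc.createDataFrame(rows)

    # Writes the frame as a metastore table, creating it if absent
    # (parquet is the default data source format in 1.3).
    df.saveAsTable("my_rows", mode="append")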

1

Solved

I have a Hive table in parquet format that was generated using:

    create table myTable (
        var1 int,
        var2 string,
        var3 int,
        var4 string,
        var5 array<struct<a:int,b:string>>
    ) stored as parquet;

...
Mccully asked 22/9, 2015 at 21:48
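For the nested column, one reliable way to query it on Spark 1.3 is HiveQL through a HiveContext; a sketch using the table and columns from the DDL above (the exploded and item aliases are made up):

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="nested-parquet")
    hc = HiveContext(sc)

    # LATERAL VIEW explode flattens var5 (array<struct<a:int,b:string>>)
    # into one row per struct, whose fields then become addressable.
    flat = hc.sql("""
        SELECT var1, item.a, item.b
        FROM myTable
        LATERAL VIEW explode(var5) exploded AS item
    """)
    flat.show()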
