apache-spark-1.3 Questions
3
I have a dataframe such as the following:
In [94]: prova_df.show()
order_item_order_id  order_item_subtotal
1                    299.98
2                    199.99
2                    250.0
2                    129.99
4                    49.98
4                    299.95
4                    150.0
4                    199.92
5                    299.98
5...
Bedcover asked 27/11, 2015 at 16:57
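The excerpt cuts off before stating the goal, but data shaped like this usually calls for a per-order aggregation. A minimal PySpark 1.3 sketch, assuming the task is summing subtotals per order id; the dataframe is a hand-typed recreation of the one shown above:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="order-subtotals")
sqlContext = SQLContext(sc)

# Recreation of the truncated dataframe shown in the question
rows = [(1, 299.98), (2, 199.99), (2, 250.0), (2, 129.99),
        (4, 49.98), (4, 299.95), (4, 150.0), (4, 199.92), (5, 299.98)]
prova_df = sqlContext.createDataFrame(rows, ["order_item_order_id", "order_item_subtotal"])

# Sum subtotals per order -- an assumed goal; the excerpt ends before stating it
prova_df.groupBy("order_item_order_id").sum("order_item_subtotal").show()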
3
Solved
I'm using Spark 1.3 to do an aggregation on a lot of data. The job consists of 4 steps:
1. Read a big (1TB) sequence file (corresponding to 1 day of data)
2. Filter out most of it and get about 1GB of ...
Dorindadorine asked 9/9, 2015 at 18:51
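Only the first two of the four steps survive the truncation. A sketch of how such a job typically looks in Spark 1.3's RDD API, with the paths, the filter predicate, and the last two steps all assumptions:

from pyspark import SparkContext

sc = SparkContext(appName="daily-aggregation")

# Step 1: read one day of data; path and record types are placeholders
raw = sc.sequenceFile("hdfs:///data/2015-09-08")

# Step 2: filter out most records, keeping roughly 1 GB; the real
# predicate is cut off in the excerpt
kept = raw.filter(lambda kv: kv[1] is not None)

# Steps 3-4 (assumed): aggregate by key, then persist the result
totals = kept.reduceByKey(lambda a, b: a + b)
totals.saveAsTextFile("hdfs:///out/2015-09-08")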
1
I'm running a Spark job to aggregate data. I have a custom data structure called a Profile, which basically contains a mutable.HashMap[Zone, Double]. I want to merge all profiles that share a given...
Monohydric asked 11/9, 2015 at 18:47
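The Profile class in the question is Scala (a wrapper around mutable.HashMap[Zone, Double]); the same merge-by-key pattern is sketched here in PySpark, with a plain dict standing in for Profile and made-up keys and zone names:

from pyspark import SparkContext

sc = SparkContext(appName="profile-merge")

def merge_profiles(p1, p2):
    # Combine two zone->value maps; summing overlapping zones is an
    # assumption, the excerpt does not say how values should merge
    merged = dict(p1)
    for zone, value in p2.items():
        merged[zone] = merged.get(zone, 0.0) + value
    return merged

# Hypothetical (key, profile) pairs
profiles = sc.parallelize([
    ("u1", {"zoneA": 1.0, "zoneB": 2.0}),
    ("u1", {"zoneA": 0.5}),
    ("u2", {"zoneC": 3.0}),
])

# Merge all profiles that share the same key
print(profiles.reduceByKey(merge_profiles).collect())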
1
Solved
I am running pyspark, spark 1.3, standalone mode, client mode.
I am trying to investigate my spark job by looking at the jobs from the past and comparing them. I want to view their logs, the confi...
Hedvah asked 15/7, 2016 at 21:48
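Inspecting finished jobs in standalone mode hinges on event logging having been enabled before the job ran; completed applications then show up in the history server. A sketch of the relevant configuration, with the log directory a placeholder:

from pyspark import SparkConf, SparkContext

# Write event logs so the job can be inspected after it finishes;
# the directory is an assumption and must already exist
conf = (SparkConf()
        .setAppName("logged-job")
        .set("spark.eventLog.enabled", "true")
        .set("spark.eventLog.dir", "hdfs:///spark-events"))
sc = SparkContext(conf=conf)

A history server started with spark.history.fs.logDirectory pointing at the same directory then serves the logs and configuration of past runs.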
0
This question is a spin-off from an earlier one, "Saving a list of rows to a Hive table in pyspark".
EDIT: please see my updates at the bottom of this post.
I have used both Scala and now Pyspark t...
Parget asked 28/4, 2016 at 19:44
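The linked question is about persisting a Python list of rows into a Hive table. A minimal PySpark 1.3 sketch, assuming a Hive-enabled build and using placeholder rows, column names, and table name:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="rows-to-hive")
sqlContext = HiveContext(sc)

# Hypothetical rows and schema
rows = [(1, "a"), (2, "b")]
df = sqlContext.createDataFrame(rows, ["id", "label"])

# With a HiveContext, saveAsTable registers the table in the Hive metastore
df.saveAsTable("mytable")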
1
Solved
GenericRowWithSchema exception in casting ArrayBuffer to HashSet in DataFrame to RDD from Hive table
I have a Hive table in parquet format that was generated using
create table myTable (var1 int, var2 string, var3 int, var4 string, var5 array<struct<a:int,b:string>>) stored as parquet;...
Mccully asked 22/9, 2015 at 21:48
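The cast fails because an array<struct<...>> column does not come back as a native Scala collection: each element materializes as a Row, so it has to be unpacked field by field rather than cast to a HashSet. The same shape appears in PySpark as a list of Row objects; a sketch of unpacking it, assuming the table from the question:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="array-of-struct")
sqlContext = HiveContext(sc)

df = sqlContext.table("myTable")

# var5 is array<struct<a:int,b:string>>, i.e. a list of Row objects here;
# build a set of (a, b) tuples instead of casting the column wholesale
pairs = df.rdd.map(lambda row: (row.var1, set((e.a, e.b) for e in row.var5)))
print(pairs.take(1))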