apache-spark-1.4 Questions

6

I am trying to efficiently join two DataFrames, one of which is large and the other a bit smaller. Is there a way to avoid all this shuffling? I cannot set autoBroadCastJoinThreshold, because...

2

Solved

I am running a Spark Streaming application with 2 workers. The application has a join and a union operation. All the batches are completing successfully, but I noticed that shuffle spill metrics are no...
Amari asked 12/6, 2015 at 7:36

3

I am running Spark Streaming 1.4.0 on YARN (Apache distribution 2.6.0) with Java 1.8.0_45 and a Kafka direct stream. I am also using Spark with Scala 2.11 support. The issue I am seeing is that...
Coparcenary asked 13/7, 2015 at 18:1

3

I am using Spark 1.4.1. I can use spark-submit without problems, but when I ran ~/spark/bin/spark-shell I got the error below. I have configured SPARK_HOME and JAVA_HOME. However, it was OK with Spa...
Malposition asked 8/10, 2015 at 2:45

2

Solved

I'm trying to install Spark on my local machine. I have been following this guide. I have installed JDK-7 (also have JDK-8) and Scala 2.11.7. A problem occurs when I try to use sbt to build Spark 1...
Floatplane asked 26/7, 2015 at 13:53

0

I wrote a custom transformer as described here. When creating a pipeline with my transformer as the first step, I am able to train a (Logistic Regression) model for classification. However, wh...

3

Solved

I am new to Apache Spark (version 1.4.1). I wrote a small program to read a text file and store its data in an RDD. Is there a way to get the size of the data in the RDD? This is my code: im...
Unwitnessed asked 24/8, 2015 at 9:52

1

Solved

My project has unit tests for different HiveContext configurations (sometimes they are in one file, as they are grouped by feature). After upgrading to Spark 1.4, I encounter a lot of 'java.sql.SQL...
Countermand asked 24/8, 2015 at 23:49

2

Solved

I have a Spark SQL DataFrame. Some entries in this data are empty, but they don't behave like NULL or NA. How can I remove them? Any ideas? In R I can easily remove them, but in SparkR it says tha...
Fetterlock asked 23/7, 2015 at 21:46