spark-csv Questions

3

How to determine a dataframe size? Right now I estimate the real size of a dataframe as follows: headers_size = key for key in df.first().asDict() rows_size = df.map(lambda row: len(value for key...
Veratridine asked 6/5, 2016 at 16:38
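The estimation approach sketched in that question (sum the header name lengths plus the string length of every cell) can be illustrated without Spark at all. A minimal sketch in plain Python, using a hypothetical `rows` list in place of `df.first().asDict()` / `df.map(...)`:

```python
# Rough estimate of a table's in-memory text size: the string lengths of
# the header keys plus the string lengths of every cell value. This mirrors
# the headers_size / rows_size idea from the question; `rows` is
# hypothetical sample data, not a real DataFrame.

def estimate_size(rows):
    if not rows:
        return 0
    headers_size = sum(len(key) for key in rows[0])  # column names once
    rows_size = sum(len(str(v)) for row in rows for v in row.values())
    return headers_size + rows_size

rows = [{"name": "ann", "age": 30}, {"name": "bob", "age": 7}]
print(estimate_size(rows))  # → 16
```

In Spark the per-row part would run as a `map` over the DataFrame's rows and be summed with a `reduce`; this is only the counting logic, not a measure of Spark's actual serialized size.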

3

Solved

I'm getting the error message java.lang.IllegalArgumentException: Schema must be specified when creating a streaming source DataFrame. If some files already exist in the directory, then depending o...
Postfree asked 17/10, 2021 at 19:56

4

Solved

I know how to read a CSV file into Apache Spark using spark-csv, but I already have the CSV file represented as a string and would like to convert this string directly to dataframe. Is this possibl...
Foliage asked 23/8, 2016 at 22:53
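The usual trick for this is to split the string into lines and hand those to the reader instead of a file path (in Spark, e.g. by parallelizing the lines into a dataset first). A minimal sketch of the parsing step in plain Python, with a hypothetical `csv_string`:

```python
import csv
import io

# Parse a CSV held entirely in a string, without touching the filesystem.
# `csv_string` is hypothetical sample input.
csv_string = "name,age\nann,30\nbob,7"

# io.StringIO makes the string behave like a file object for csv.DictReader.
reader = csv.DictReader(io.StringIO(csv_string))
records = list(reader)
print(records[0]["name"])  # → ann
```

The same shape applies in pyspark: wrap the string's lines in an in-memory collection, then let the CSV reader consume that instead of a path.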

16

I am using https://github.com/databricks/spark-csv and I am trying to write a single CSV, but I am not able to; it creates a folder instead. I need a Scala function that takes parameters like path and file n...
Garver asked 28/7, 2015 at 11:8
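Spark's writer always produces a directory of part files; the common workaround asked about here is to write with `coalesce(1)` and then move the single part file to the desired name. A sketch of that rename step in plain Python, assuming the output directory already contains exactly one `part-*.csv` (the directory below is simulated, not a real Spark output):

```python
import glob
import os
import shutil
import tempfile

def promote_single_csv(out_dir, final_path):
    """Move the lone part-*.csv file Spark leaves in `out_dir` to `final_path`.

    Assumes the DataFrame was written with coalesce(1), so exactly one
    part file exists; the unpacking below fails loudly otherwise."""
    part, = glob.glob(os.path.join(out_dir, "part-*.csv"))
    shutil.move(part, final_path)
    return final_path

# Simulate a Spark output directory for illustration.
d = tempfile.mkdtemp()
with open(os.path.join(d, "part-00000-abc.csv"), "w") as f:
    f.write("a,b\n1,2\n")

target = os.path.join(d, "result.csv")
promote_single_csv(d, target)
with open(target) as f:
    content = f.read()
print(content)
```

On a real cluster the part file lives on HDFS/S3 rather than a local disk, so the move would go through the Hadoop FileSystem API instead of `shutil`.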

1

Solved

Hope everyone is doing well. While going through the Spark CSV data source options for the question, I am quite confused about the difference between the various quote-related options available. Do we...
Bortz asked 14/11, 2022 at 8:7

17

Solved

I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content: val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("...
Rutkowski asked 16/11, 2015 at 19:17

7

Solved

I have a big distributed file on HDFS and each time I use sqlContext with spark-csv package, it first loads the entire file which takes quite some time. df = sqlContext.read.format('com.databricks...
Larva asked 31/5, 2017 at 6:15

13

I am trying to read a CSV file into a dataframe. I know what the schema of my dataframe should be since I know my CSV file. Also, I am using the spark-csv package to read the file. I am trying to specify t...

3

Solved

dataFrame.coalesce(1).write().save("path") sometimes writes only _SUCCESS and ._SUCCESS.crc files without the expected *.csv.gz, even for a non-empty input DataFrame. File save code: private static voi...
Redress asked 16/10, 2019 at 5:45

2

Solved

I'm terribly new to Spark and Hive and big data and Scala and all. I'm trying to write a simple function that takes an sqlContext, loads a CSV file from S3 and returns a DataFrame. The problem is that ...
Townes asked 15/3, 2016 at 9:47

2

Solved

I have a CSV that looks like this: +-----------------+-----------------+-----------------+ | Column One | Column Two | Column Three | +-----------------+-----------------+-----------------+ | This...
Sealed asked 22/5, 2018 at 23:33

1

Solved

I am trying to add a UUID column to my dataset. getDataset(Transaction.class)).withColumn("uniqueId", functions.lit(UUID.randomUUID().toString())).show(false); But the result is all the ro...
Garry asked 9/4, 2018 at 14:57
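The behavior in that question comes from `functions.lit(UUID.randomUUID().toString())` being evaluated once on the driver, so every row gets the same literal; a per-row value needs an expression evaluated for each row (in recent Spark versions, for example, the SQL `uuid()` function). The difference can be shown in plain Python:

```python
import uuid

rows = range(3)

# lit()-style: the UUID is generated once, then the same literal is
# attached to every row.
constant = str(uuid.uuid4())
same_for_all = [constant for _ in rows]

# Per-row: a fresh UUID is generated as each row is produced, which is
# what a row-level expression gives you.
fresh_each_row = [str(uuid.uuid4()) for _ in rows]

print(len(set(same_for_all)), len(set(fresh_each_row)))  # → 1 3
```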

2

I am only trying to read a text file into a pyspark RDD, and I am noticing huge differences between sqlContext.read.load and sqlContext.read.text. s3_single_file_inpath='s3a://bucket-name/file_nam...
Uro asked 5/12, 2017 at 2:11

1

Solved

I have a data frame where I am replacing the default delimiter , with |^|. It is working fine and I am getting the expected result, except where , is found within the records. For example, I have one s...
Joan asked 29/10, 2017 at 16:15
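The failure mode described there is the classic one: a blind string replace of `,` also rewrites commas that sit inside quoted fields. A CSV-aware pass parses the fields first and only then joins them with the new delimiter. A minimal sketch with a hypothetical record:

```python
import csv
import io

line = 'a,"hello, world",c'  # hypothetical record with a comma inside quotes

# Naive replace corrupts the quoted field:
naive = line.replace(",", "|^|")

# CSV-aware: parse the fields first, then join with the new delimiter.
fields = next(csv.reader(io.StringIO(line)))
aware = "|^|".join(fields)

print(naive)  # → a|^|"hello|^| world"|^|c
print(aware)  # → a|^|hello, world|^|c
```

In Spark the equivalent is to let the CSV reader do the parsing (it honors quoting) and only change the delimiter when writing back out, rather than string-replacing raw lines.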

2

Solved

I use Spark 2.2.0 I am reading a csv file as follows: val dataFrame = spark.read.option("inferSchema", "true") .option("header", true) .option("dateFormat", "yyyyMMdd") .csv(pathToCSVFile) T...
Palua asked 2/10, 2017 at 16:8
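The `dateFormat` option tells Spark's CSV reader which pattern to use when parsing date columns; the `yyyyMMdd` pattern from that question corresponds to Python's `%Y%m%d`. A quick plain-Python illustration of what the parse does:

```python
from datetime import datetime

# Spark's dateFormat "yyyyMMdd" is the equivalent of strptime's "%Y%m%d":
raw = "20161025"  # hypothetical cell value
parsed = datetime.strptime(raw, "%Y%m%d").date()
print(parsed.isoformat())  # → 2016-10-25
```

Note that with some Spark versions the option only takes effect when the column is actually read (or cast) as a date type; columns inferred as integers by `inferSchema` won't be touched by `dateFormat`.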

2

Running through the spark-csv README there's sample Java code like this import org.apache.spark.sql.SQLContext; import org.apache.spark.sql.types.*; SQLContext sqlContext = new SQLContext(sc); St...
Gynecic asked 21/12, 2015 at 3:50

2

Solved

The below code does not add the double quotes which is the default. I also tried adding # and single quote using option quote with no success. I also used quoteMode with ALL and NON_NUMERIC options...
Papke asked 26/4, 2017 at 20:31

1

Solved

For a custom Estimator's transformSchema method, I need to be able to compare the schema of an input data frame to the schema defined in a case class. Usually this could be performed like: generate a ...
Winterwinterbottom asked 27/11, 2016 at 14:43

1

Solved

Spark Version: spark-2.0.1-bin-hadoop2.7 Scala: 2.11.8 I am loading a raw CSV into a DataFrame. In the CSV, although the column is supposed to be in date format, it is written as 20161025 instead of...
Roldan asked 4/11, 2016 at 23:24

3

Solved

I am importing a CSV file (using spark-csv) into a DataFrame which has empty String values. When the OneHotEncoder is applied, the application crashes with the error requirement failed: Cannot have an emp...

1

Solved

When a CSV is read as a dataframe in Spark, all the columns are read as strings. Is there any way to get the actual type of a column? I have the following CSV file: Name,Department,years_of_experience,DO...
Faulk asked 30/7, 2015 at 9:8
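Without `inferSchema`, everything does come back as a string; the inference Spark performs is, roughly, "try narrower types per column and fall back to string". A crude sketch of that idea in plain Python (this is an illustration of the principle, not Spark's actual algorithm):

```python
def infer_type(values):
    """Crude column-type inference: try int, then float, else string."""
    for caster, name in ((int, "int"), (float, "double")):
        try:
            for v in values:
                caster(v)  # raises ValueError if any value doesn't fit
            return name
        except ValueError:
            continue
    return "string"

print(infer_type(["1", "2"]))        # → int
print(infer_type(["1.5", "2"]))      # → double
print(infer_type(["Alice", "Bob"]))  # → string
```

In spark-csv itself the switch is simply `.option("inferSchema", "true")`, at the cost of an extra pass over the data.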

© 2022 - 2024 — McMap. All rights reserved.