spark-csv Questions
3
How to determine a dataframe size?
Right now I estimate the real size of a dataframe as follows:
headers_size = key for key in df.first().asDict()
rows_size = df.map(lambda row: len(value for key...
Veratridine asked 6/5, 2016 at 16:38
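The idea behind the asker's estimate can be sketched without Spark: approximate the table's footprint as the UTF-8 byte length of the header keys plus the byte length of every cell value. `rows` below is a hypothetical stand-in for the output of `df.collect()`; in real Spark you would aggregate the per-row sum distributedly (e.g. with a map over `df.rdd`) rather than collecting.

```python
# Minimal sketch of the size estimate: header bytes + cell bytes.
# `rows` stands in for collected DataFrame rows as dicts (hypothetical data).
def estimate_size_bytes(rows):
    if not rows:
        return 0
    # Header names are counted once, from the first row's keys.
    header_size = sum(len(key.encode("utf-8")) for key in rows[0])
    # Every cell value is stringified and counted in UTF-8 bytes.
    cell_size = sum(
        len(str(value).encode("utf-8"))
        for row in rows
        for value in row.values()
    )
    return header_size + cell_size

rows = [{"name": "Ann", "age": 31}, {"name": "Bo", "age": 7}]
size = estimate_size_bytes(rows)
```

This only estimates serialized text size; the in-memory representation Spark uses is larger, so treat the number as a lower bound.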
3
Solved
I'm getting the error message
java.lang.IllegalArgumentException: Schema must be specified when creating a streaming source DataFrame. If some files already exist in the directory, then depending o...
Postfree asked 17/10, 2021 at 19:56
4
Solved
I know how to read a CSV file into Apache Spark using spark-csv, but I already have the CSV file represented as a string and would like to convert this string directly to dataframe. Is this possibl...
Foliage asked 23/8, 2016 at 22:53
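The string-to-rows half of this can be done with the stdlib `csv` module, as a minimal sketch; in PySpark the parsed-or-raw lines can then be handed to Spark (newer versions accept an RDD of lines in `spark.read.csv`, though that detail is outside this sketch).

```python
import csv
import io

# Parse a CSV held entirely in a string into a header plus row dicts.
# This is the plain-Python half of "string -> dataframe"; the Spark half
# would wrap these rows (or the raw lines) into a DataFrame.
def csv_string_to_rows(csv_string):
    reader = csv.reader(io.StringIO(csv_string))
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]

data = "name,age\nAnn,31\nBo,7\n"
parsed = csv_string_to_rows(data)
```

Note that every value comes back as a string, mirroring what spark-csv does without `inferSchema`.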
16
I am using https://github.com/databricks/spark-csv . I am trying to write a single CSV file, but I can't: Spark produces a folder instead.
I need a Scala function which will take parameters like path and file n...
Garver asked 28/7, 2015 at 11:8
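The usual workaround is `coalesce(1)` (as in a later question on this page) followed by promoting the single `part-*` file out of the output folder to the exact path the caller asked for. The folder-handling step can be sketched in plain Python; the directory layout (`part-*` plus `_SUCCESS`) matches what Spark writes, but the function name here is hypothetical.

```python
import glob
import os
import shutil

# After df.coalesce(1).write.csv(tmp_dir), the tmp_dir contains exactly one
# part-* file plus marker files like _SUCCESS. This moves the part file to
# dest_path and removes the now-unneeded folder.
def promote_single_part(tmp_dir, dest_path):
    parts = glob.glob(os.path.join(tmp_dir, "part-*"))
    if len(parts) != 1:
        raise ValueError("expected exactly one part file, found %d" % len(parts))
    shutil.move(parts[0], dest_path)
    shutil.rmtree(tmp_dir)
```

On HDFS or S3 the same rename would go through the corresponding filesystem API rather than `shutil`.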
1
Solved
Hope everyone is doing well.
While going through the Spark CSV datasource options for the question, I am quite confused about the difference between the various quote-related options available.
Do we...
Bortz asked 14/11, 2022 at 8:7
17
Solved
I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("...
Rutkowski asked 16/11, 2015 at 19:17
7
Solved
I have a big distributed file on HDFS and each time I use sqlContext with spark-csv package, it first loads the entire file which takes quite some time.
df = sqlContext.read.format('com.databricks...
Larva asked 31/5, 2017 at 6:15
13
I am trying to read a csv file into a dataframe. I know what the schema of my dataframe should be, since I know my csv file. Also, I am using the spark-csv package to read the file. I am trying to specify t...
Gord asked 7/10, 2016 at 22:2
3
Solved
dataFrame.coalesce(1).write().save("path") sometimes writes only _SUCCESS and ._SUCCESS.crc files without an expected *.csv.gz even on non-empty input DataFrame
file save code:
private static voi...
Redress asked 16/10, 2019 at 5:45
2
Solved
Terribly new to Spark and Hive and big data and Scala and all. I'm trying to write a simple function that takes an sqlContext, loads a csv file from S3 and returns a DataFrame. The problem is that ...
Townes asked 15/3, 2016 at 9:47
2
Solved
I have a CSV that looks like this:
+-----------------+-----------------+-----------------+
| Column One | Column Two | Column Three |
+-----------------+-----------------+-----------------+
| This...
Sealed asked 22/5, 2018 at 23:33
1
Solved
I am trying to add a UUID column to my dataset.
getDataset(Transaction.class)).withColumn("uniqueId", functions.lit(UUID.randomUUID().toString())).show(false);
But the result is all the ro...
Garry asked 9/4, 2018 at 14:57
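The behavior the asker hit has a simple cause: `functions.lit(UUID.randomUUID().toString())` evaluates the UUID once on the driver and attaches that single constant to every row, whereas a per-row UUID must be generated inside a function evaluated per row (a UDF, or the `uuid()` SQL expression in newer Spark versions). The difference can be sketched in plain Python:

```python
import uuid

rows = [{"id": i} for i in range(3)]

# What lit(UUID.randomUUID().toString()) effectively does: the UUID is
# computed once, then attached to every row as the same constant literal.
constant = str(uuid.uuid4())
same_for_all = [dict(r, uniqueId=constant) for r in rows]

# What a per-row expression (a UDF, or uuid() in Spark SQL) does: the
# generator runs once per row, so each row gets a distinct value.
distinct_per_row = [dict(r, uniqueId=str(uuid.uuid4())) for r in rows]
```

The first list has one unique `uniqueId` across all rows; the second has one per row.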
2
I am only trying to read a text file into a pyspark RDD, and I am noticing huge differences between sqlContext.read.load and sqlContext.read.text.
s3_single_file_inpath='s3a://bucket-name/file_nam...
Uro asked 5/12, 2017 at 2:11
1
Solved
I have a data frame where I am replacing the default delimiter , with |^|.
It works fine and I get the expected result, except where , is found within the records.
For example i have one s...
Joan asked 29/10, 2017 at 16:15
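The pitfall here is that a raw string replace of `,` also hits commas inside quoted fields. A safe re-delimiting goes through a CSV parser that understands quoting, then joins with the new token; a minimal stdlib sketch (not the asker's Spark code):

```python
import csv
import io

# Re-delimit CSV text safely: csv.reader keeps "hello, world" as one field,
# so only the real field separators are replaced with |^|. A plain
# str.replace(",", "|^|") would also split the quoted field.
def redelimit(text, new_delim="|^|"):
    rows = list(csv.reader(io.StringIO(text)))
    return "\n".join(new_delim.join(row) for row in rows)
```

In Spark the same principle applies: let the CSV reader parse the quoted input, then write out with the new delimiter, rather than string-replacing raw lines.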
2
Solved
How to force inferSchema for CSV to consider integers as dates (with "dateFormat" option)?
I use Spark 2.2.0
I am reading a csv file as follows:
val dataFrame = spark.read.option("inferSchema", "true")
  .option("header", true)
  .option("dateFormat", "yyyyMMdd")
  .csv(pathToCSVFile)
T...
Palua asked 2/10, 2017 at 16:8
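The crux of this question is that an all-digit value like 20161025 matches the integer type before `dateFormat` is ever consulted during schema inference, which is why the option alone does not produce a date column. The parsing itself is straightforward: the Java pattern `yyyyMMdd` corresponds to `%Y%m%d` in Python's `strptime`, sketched here outside Spark:

```python
from datetime import datetime

# Java's "yyyyMMdd" date pattern maps to "%Y%m%d" in Python. Parsing the
# raw string explicitly always works, regardless of what inferSchema
# guessed for the column.
def parse_yyyymmdd(raw):
    return datetime.strptime(raw, "%Y%m%d").date()
```

In Spark the equivalent explicit route is reading the column as a string and converting it with a date-parsing expression, instead of relying on inference.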
2
Running through the spark-csv README, there's sample Java code like this:
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.*;
SQLContext sqlContext = new SQLContext(sc);
St...
Gynecic asked 21/12, 2015 at 3:50
2
Solved
The code below does not add the double quotes, which is the default. I also tried adding # and a single quote using the quote option, with no success. I also used quoteMode with the ALL and NON_NUMERIC options...
Papke asked 26/4, 2017 at 20:31
1
Solved
For a custom Estimator's transformSchema method I need to be able to compare the schema of an input data frame to the schema defined in a case class. Usually this could be performed like Generate a ...
Winterwinterbottom asked 27/11, 2016 at 14:43
1
Solved
Spark Version: spark-2.0.1-bin-hadoop2.7
Scala: 2.11.8
I am loading a raw csv into a DataFrame. In the csv, although the column is supposed to be in date format, it is written as 20161025 instead of...
Roldan asked 4/11, 2016 at 23:24
3
Solved
I am importing a CSV file (using spark-csv) into a DataFrame which has empty String values. When applying the OneHotEncoder, the application crashes with the error requirement failed: Cannot have an emp...
Skeptic asked 12/10, 2015 at 20:36
1
Solved
When a CSV is read as a dataframe in Spark, all the columns are read as strings. Is there any way to get the actual type of each column?
I have the following csv file
Name,Department,years_of_experience,DO...
Faulk asked 30/7, 2015 at 9:8
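What schema inference does can be sketched in a few lines: for each column, try the narrowest type first (int, then float, then date) against every value, and fall back to string. This is an illustrative simplification, not spark-csv's actual inference code.

```python
from datetime import datetime

# Minimal sketch of per-column type inference: return the narrowest type
# that every value in the column parses as, falling back to "string".
def infer_type(values):
    def castable(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if castable(int):
        return "int"
    if castable(float):
        return "float"
    if castable(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"
```

This also shows why inference needs a pass over the data, which is the cost the earlier question about loading a big HDFS file ran into.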
© 2022 - 2024 — McMap. All rights reserved.