Custom delimiter CSV reader in Spark

I would like to read a file with the following structure into Apache Spark.

628344092\t20070220\t200702\t2007\t2007.1370

The delimiter is \t. How can I specify this delimiter when using spark.read.csv()?

The CSV is much too big for pandas, which takes ages to read it. Is there some way that works similarly to

pandas.read_csv(file, sep='\t')

Thanks a lot!

Donatus answered 21/9, 2017 at 17:20 Comment(0)

Use spark.read.option("delimiter", "\t").csv(file), or use sep instead of delimiter.

If the delimiter is literally the two-character sequence \t, not the tab special character, escape the backslash: spark.read.option("delimiter", "\\t").csv(file)
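
As a minimal sketch (assuming an existing SparkSession named spark and a hypothetical tab-separated file at data.tsv):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "delimiter" and "sep" are interchangeable CSV options in Spark
df = spark.read.option("delimiter", "\t").csv("data.tsv")
df.show(5)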

Therese answered 21/9, 2017 at 17:21 Comment(5)
Is there a website with documentation for spark.read and the other readers? Thanks for the answer! :)Donatus
CSV support was merged in from this project: github.com/databricks/spark-csv It has some documentation. I'm personally just checking the code :)Alcmene
What's the difference between sep and delimiter?Signorino
@Signorino None, both mean the same thing :)Alcmene
Has this changed in newer Spark versions, so that the pandas-style call at the top is also possible?Banderole

This works for me and is much clearer (to me). As you mentioned, in pandas you would do:

df_pandas = pandas.read_csv(file_path, sep='\t')

In spark:

df_spark = spark.read.csv(file_path, sep='\t', header=True)

Please note that if the first row of your CSV does not contain the column names, you should set header = False, like this:

df_spark = spark.read.csv(file_path, sep='\t', header=False)

You can change the separator (sep) to fit your data.
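
Putting it together, a minimal sketch (assuming an existing SparkSession named spark and a hypothetical file_path) that also infers column types the way pandas does:

# First row holds column names; inferSchema guesses each column's type
df_spark = spark.read.csv(file_path, sep='\t', header=True, inferSchema=True)

# No header row: Spark assigns default names _c0, _c1, ...
df_headerless = spark.read.csv(file_path, sep='\t', header=False)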

Officialdom answered 21/10, 2021 at 14:27 Comment(0)

If you are using Spark SQL, you can use the DDL below with the OPTIONS clause to specify your delimiter.

CREATE TABLE sample_table
USING CSV
OPTIONS ('delimiter'='\t')
AS SELECT ...
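
The same statement can be issued from PySpark through spark.sql; a minimal sketch, assuming an existing SparkSession named spark and a hypothetical source file (the table is created directly over the file rather than via AS SELECT):

# The r-string keeps \t as the two-character sequence that Spark's
# CSV reader expands into a tab; 'delimiter' is the same option
# that spark.read uses.
spark.sql(r"""
    CREATE TABLE sample_table
    USING CSV
    OPTIONS ('delimiter' = '\t', 'path' = '/path/to/data.tsv')
""")
spark.sql("SELECT * FROM sample_table LIMIT 5").show()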

SparkSQL Documentation

Sibelle answered 16/11, 2022 at 17:41 Comment(0)
