Specifying col type in Sparklyr (spark_read_csv)

Asked 24/3, 2017 at 15:17 Answered 6/10, 2017 at 19:20

I am reading a CSV into Spark using sparklyr:

schema <- structType(structField("TransTime", "array<timestamp>", TRUE),
                 structField("TransDay", "Date", TRUE))

 spark_read_csv(sc, filename, "path", infer_schema = FALSE, schema = schema)

But I get:

Error: could not find function "structType"

How do I specify column types using spark_read_csv?

Thanks in advance.

Flak answered 24/3, 2017 at 15:17 Comment(0)

The structType function is not part of sparklyr; it belongs to Spark's own API (SparkR exposes it, mirroring Scala's StructType). In sparklyr, you specify data types by passing a named list to the "columns" argument. Suppose we have the following CSV (data.csv):

name,birthdate,age,height
jader,1994-10-31,22,1.79
maria,1900-03-12,117,1.32

The function to read the corresponding data is:

mycsv <- spark_read_csv(sc, "mydate",
                          path = "data.csv",
                          memory = TRUE,
                          infer_schema = FALSE, # keep FALSE so the types in `columns` are applied
                          columns = list(
                            name = "character",
                            birthdate = "date", # or "character"; either way it needs Hive date functions
                            age = "integer",
                            height = "double"))
# integer = "INTEGER"
# double = "REAL"
# character = "STRING"
# logical = "INTEGER"
# list = "BLOB"
# date = character = "STRING" # not sure
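One way to check which types were actually applied is to inspect the table's schema after reading. The following is a minimal sketch, assuming a local Spark installation is available to sparklyr; the temporary file path and the variable names are only for illustration, and it uses sdf_schema(), which is not mentioned in the answer above.

```r
library(sparklyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Recreate the sample CSV so the example is self-contained.
csv_path <- file.path(tempdir(), "data.csv")
writeLines(c("name,birthdate,age,height",
             "jader,1994-10-31,22,1.79",
             "maria,1900-03-12,117,1.32"),
           csv_path)

mycsv <- spark_read_csv(sc, "mydate",
                        path = csv_path,
                        infer_schema = FALSE,
                        columns = list(name = "character",
                                       birthdate = "date",
                                       age = "integer",
                                       height = "double"))

# sdf_schema() returns one entry per column, each with a name and a Spark type.
schema <- sdf_schema(mycsv)
types <- vapply(schema, function(col) col$type, character(1))
print(types)
```

If a column comes back with an unexpected type, reading it as "character" and converting it afterwards is a reasonable fallback.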

To manipulate date columns, you must use Hive date functions, not R functions:

mycsv %>% mutate(birthyear = year(birthdate))

Reference: https://spark.rstudio.com/articles/guides-dplyr.html#hive-functions
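To see how such calls reach Spark, it can help to print the generated SQL. The sketch below assumes a local Spark installation; the table name "people" and the use of to_date() and month() are illustrative choices (both are Spark SQL functions that are passed through rather than evaluated in R), not something taken from the answer.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small data frame with dates stored as strings ("people" is a hypothetical table name).
people <- copy_to(sc,
                  data.frame(name = c("jader", "maria"),
                             birthdate = c("1994-10-31", "1900-03-12"),
                             stringsAsFactors = FALSE),
                  "people", overwrite = TRUE)

# to_date(), year() and month() are translated to SQL and run inside Spark.
result <- people %>%
  mutate(birthdate = to_date(birthdate)) %>%
  mutate(birthyear = year(birthdate),
         birthmonth = month(birthdate))

# Print the SQL that will be sent to Spark.
show_query(result)

# Bring the (small) result back into R.
local_result <- collect(result)
```

Because the functions are evaluated by Spark, a misspelled function name only fails once the query runs, not when the pipeline is built.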

Petrie answered 6/10, 2017 at 19:20 Comment(2)

Any ideas about bigint / int64 /long? – Francenefrances 18/12, 2017 at 16:1

@Francenefrances the translation is done here: github.com/rstudio/sparklyr/blob/… As you can see, there is no long type. – Petrie 18/2, 2018 at 14:48

The free online book about sparklyr has an example of how to do this, with the details explained: https://therinspark.com/data.html

However, the named-list example in Jader Martins' answer is simpler.

Bealle answered 24/3, 2017 at 15:21 Comment(2)

404 - Dead link – Francenefrances 18/12, 2017 at 16:2

link fixed now to point to official sparklyr book – Usurp 7/2 at 19:35
