Converting string/chr to date using sparklyr

I've brought a table into Hue which has a column of dates, and I'm trying to play with it using sparklyr in RStudio.

I'd like to convert a character column into a date column like so:

Weather_data = mutate(Weather_data, date2 = as.Date(date, "%m/%d/%Y"))

and this runs fine, but when I check:

head(Weather_data) 

(screenshot of the head(Weather_data) output; date2 is still not shown as a proper date column)

How do I properly convert the chr column to dates?

Thanks!!!!

Redfaced answered 27/9, 2017 at 17:13 Comment(5)
Can you provide the data in dput(Weather_data) format before you applied the conversion? – Arlettaarlette
Thank you for your answer. I'm sorry, I'm not the best with R or with using Hue, so I'm not quite sure what you mean. But when I brought the data into Hue, I tried to set the column type to date; the output was all NAs, so I used string instead. – Redfaced
sdf_mutate(Weather_data, date2 = as.Date(date, "%m/%d/%Y")) – Hoggard
Thank you for your message. This gave me the error: – Redfaced
Error in as.Date.default(list(src = list(con = list(master = "yarn-client", : c("do not know how to convert 'structure(list(src = structure(list(con = structure(list(master = \"yarn-client\", ' to class “Date”", "do not know how to convert ' method = \"shell\", app_name = \"sparklyr\", config....... – Redfaced

The problem is that sparklyr doesn't correctly support the Spark DateType. It is possible to parse dates and correct the format, but not to represent them as proper DateType columns. If that's enough for your use case, follow the instructions below.

In Spark 2.2 or later, use to_date with a Java SimpleDateFormat-compatible string:

df <- copy_to(sc, data.frame(date=c("01/01/2010")))
parsed <- df %>% mutate(date_parsed = to_date(date, "MM/dd/yyyy"))
parsed
# Source:   lazy query [?? x 2]
# Database: spark_connection
        date date_parsed
       <chr>       <chr>
1 01/01/2010  2010-01-01

Interestingly, the internal Spark object still uses DateType columns:

parsed %>% spark_dataframe
<jobj[120]>
  class org.apache.spark.sql.Dataset
  [date: string, date_parsed: date]
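
If you need actual Date objects on the R side, one workaround (a sketch, not part of the original answer; it assumes the parsed column is an ISO yyyy-mm-dd string as shown above) is to collect the result into R and convert locally with base as.Date:

# Workaround sketch: bring the parsed strings into the R session,
# then convert them locally. as.Date() handles "yyyy-mm-dd" by default.
local_df <- parsed %>% collect()
local_df$date_parsed <- as.Date(local_df$date_parsed)
class(local_df$date_parsed)
# [1] "Date"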

For earlier versions, use unix_timestamp and a cast (but watch for possible timezone problems):

df %>%
  mutate(date_parsed = sql(
    "CAST(CAST(unix_timestamp(date, 'MM/dd/yyyy') AS timestamp) AS date)"))
# Source:   lazy query [?? x 2]
# Database: spark_connection
        date date_parsed
       <chr>       <chr>
1 01/01/2010  2010-01-01

Edit:

It looks like this problem has been resolved on the current master (sparklyr_0.7.0-9105):

# Source:   lazy query [?? x 2]
# Database: spark_connection
        date date_parsed
       <chr>      <date>
1 01/01/2010  2009-12-31
Lionfish answered 28/9, 2017 at 21:29 Comment(4)
Beware that the timezone default in the JVM will play dirty tricks here! #49355577 – Religion
@ℕʘʘḆḽḘ In practice the timezone should always be configured explicitly for the Spark session - otherwise things can get ugly pretty fast (with much more serious outcomes than the one linked, which is actually intended and documented :). So you either need extraJavaOptions or the respective Spark SQL config. – Lionfish
Great! Do you know what the correct command with extraJavaOptions would be to set UTC? – Religion
Old way: --conf "spark.driver.extraJavaOptions=-Duser.timezone=UTC" --conf "spark.executor.extraJavaOptions=-Duser.timezone=UTC" with spark-submit, or the equivalent in the Spark configuration. New way (2.2+): --conf spark.sql.session.timeZone=UTC. A few answers (stackoverflow.com/…) if you want to give credit to some user :) The possible gotcha is that things get weird if, for some crazy reason, different nodes have different settings :) – Lionfish
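
For reference, a minimal sparklyr sketch of both approaches from the comment above (assuming a yarn-client master, as in the question's error message; the connection details are placeholders, and the config keys are the standard Spark settings named above):

library(sparklyr)

conf <- spark_config()

# Old way (any Spark version): pin the JVM default timezone
# on both the driver and the executors.
conf$`spark.driver.extraJavaOptions`   <- "-Duser.timezone=UTC"
conf$`spark.executor.extraJavaOptions` <- "-Duser.timezone=UTC"

# New way (Spark 2.2+): set the SQL session timezone.
conf$`spark.sql.session.timeZone` <- "UTC"

sc <- spark_connect(master = "yarn-client", config = conf)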
