In R, I have a Spark connection and a DataFrame named ddf.
library(sparklyr)
library(tidyverse)
sc <- spark_connect(master = "foo", version = "2.0.2")
ddf <- spark_read_parquet(sc, name='test', path="hdfs://localhost:9001/foo_parquet")
Since it doesn't have many rows, I'd like to pull it into memory to apply some machine learning magic. However, it seems that certain rows cannot be collected.
df <- ddf %>% head %>% collect # works fine
df <- ddf %>% collect # doesn't work
The second line of code throws the error "Error in rawToChar(raw) : embedded nul in string". The column it fails on contains string data. Since head %>% collect works, it seems that only some rows fail while the others are collected as expected.
How can I work around this error? Is there a way to clean up the data that causes it? What does the error actually mean?
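For reference, the same message can be produced in base R without Spark, which makes me suspect NUL bytes (0x00) in the string column. This is only a minimal sketch under that assumption: the bytes below are made up for illustration and are not taken from my data, and I have not confirmed that this is what sparklyr hits internally.

```r
# Bytes for "fo", a NUL byte (0x00), then "o"; made-up data for illustration.
raw_bytes <- as.raw(c(0x66, 0x6f, 0x00, 0x6f))

# rawToChar() refuses to build a string that contains a NUL byte,
# so capture the error message instead of stopping.
msg <- tryCatch(
  rawToChar(raw_bytes),
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping the NUL bytes first lets the conversion succeed.
cleaned <- rawToChar(raw_bytes[raw_bytes != as.raw(0)])
print(cleaned)
```

If that assumption holds, the cleanup would presumably have to happen on the Spark side before collect, since the conversion fails while the data is being pulled into R.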