Sparklyr "embedded nul in string" when collecting

In R, I have a Spark connection and a Spark DataFrame named ddf.

library(sparklyr)
library(tidyverse)
sc <- spark_connect(master = "foo", version = "2.0.2")
ddf <- spark_read_parquet(sc, name='test', path="hdfs://localhost:9001/foo_parquet")

Since it's not a whole lot of rows I'd like to pull this into memory to apply some machine learning magic. However, it seems that certain rows cannot be collected.

df <- ddf %>% head %>% collect # works fine
df <- ddf %>% collect # doesn't work

The second line throws the error "Error in rawToChar(raw) : embedded nul in string:". The column it fails on contains string data. Since head %>% collect works, it seems that only some rows fail while the others are collected as expected.
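For reference, the same message can be reproduced in base R without Spark, which suggests (this is an assumption, I have not confirmed it against the data) that at least one string value contains a 0x00 byte. A minimal sketch, where the byte values are made up for illustration:

```r
# Bytes for "a", NUL, "b": an R character string cannot hold a 0x00 byte.
bytes <- as.raw(c(0x61, 0x00, 0x62))

# rawToChar() refuses the conversion; capture the error message instead of stopping.
msg <- tryCatch(rawToChar(bytes), error = function(e) conditionMessage(e))
print(msg)

# Dropping the NUL bytes before converting makes the conversion succeed.
cleaned <- rawToChar(bytes[bytes != as.raw(0)])
print(cleaned)
```

If that is what is happening here, the offending bytes would presumably have to be removed on the Spark side before collect is called.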

What does this error actually mean? How can I work around it, and is there a way to clean up the data that causes it?

Iives answered 20/2, 2017 at 9:38 Comment(1)
What are the data types of the columns in Spark? And can you provide sample data? - Prom
