Get "embedded nul(s) found in input" when reading a csv using read.csv()

I was reading in a csv file.

Code is:

mydata = read.csv("mycsv.csv", header=TRUE, sep=",", quote="\"")

Get the following warning:

Warning message: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input

Now some cells in my CSV have missing values that are represented by "".

How do I write this code so that I do not get the above warning?

Buie asked 22/4, 2014 at 2:42
Comments:
Does this opencsv bug report (sourceforge.net/p/opencsv/bugs/96) look like it might have led to your CSV file having nulls? If that's not it and you're on a Linux system, tr -d '\000' < filein > fileout will remove the nulls, but that might not fully fix your issue. - Commencement
Hmm, I'll check... good find. - Buie
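If you would rather stay inside R than use tr, the same byte-level cleanup can be sketched with base R's readBin() and writeBin(). The helper name strip_nuls() and the output file name in the usage comment are hypothetical. The helper simply drops every 0x00 byte; if the nulls actually come from a UTF-16 encoding, reading with an explicit fileEncoding is likely the cleaner fix.

```r
# Hypothetical helper: copy `infile` to `outfile` with every NUL (0x00) byte removed,
# mirroring `tr -d '\000' < filein > fileout`. Returns (invisibly) how many bytes were dropped.
strip_nuls <- function(infile, outfile) {
  bytes  <- readBin(infile, what = "raw", n = file.size(infile))  # whole file as raw bytes
  is_nul <- bytes == as.raw(0)
  writeBin(bytes[!is_nul], outfile)                               # write everything except NULs
  invisible(sum(is_nul))
}

# Usage sketch (output file name is hypothetical):
# strip_nuls("mycsv.csv", "mycsv_clean.csv")
# mydata <- read.csv("mycsv_clean.csv")
```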

Your CSV might be encoded in UTF-16. This isn't uncommon when working with some Windows-based tools.

You can try loading a UTF-16 CSV like this:

read.csv("mycsv.csv", ..., fileEncoding="UTF-16LE")
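One way to test the UTF-16 hypothesis before guessing an encoding is to look at the first few kilobytes of raw bytes. The sketch below uses a hypothetical helper, guess_utf16(), and an arbitrary 90% threshold (an assumption, not a standard value). It checks for a UTF-16 byte-order mark, then for the pattern where mostly-ASCII text stored as UTF-16 has a NUL in every other byte. A file that matches would also explain the "embedded nul(s)" warning, since each of those zero bytes looks like a nul to read.csv() in its default encoding.

```r
# Hypothetical helper: returns "UTF-16LE", "UTF-16BE", or NA if neither pattern is seen.
guess_utf16 <- function(path, n = 4096L) {
  bytes <- readBin(path, what = "raw", n = n)
  if (length(bytes) < 2L) return(NA_character_)
  # Byte-order marks: FF FE = UTF-16LE, FE FF = UTF-16BE
  if (bytes[1] == as.raw(0xFF) && bytes[2] == as.raw(0xFE)) return("UTF-16LE")
  if (bytes[1] == as.raw(0xFE) && bytes[2] == as.raw(0xFF)) return("UTF-16BE")
  # No BOM: ASCII text stored as UTF-16 has a 0x00 in every second byte
  odd  <- bytes[seq(1L, length(bytes), by = 2L)]
  even <- bytes[seq(2L, length(bytes), by = 2L)]
  if (mean(even == as.raw(0)) > 0.9) return("UTF-16LE")  # 0.9 is an arbitrary threshold
  if (mean(odd == as.raw(0)) > 0.9) return("UTF-16BE")
  NA_character_
}

# Usage sketch:
# enc <- guess_utf16("mycsv.csv")
# if (!is.na(enc)) mydata <- read.csv("mycsv.csv", fileEncoding = enc)
```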
Carolyn answered 22/4, 2014 at 2:45
Comments:
Thanks, but that didn't fix it... I am pretty sure I'm not dealing with a UTF-16LE file. - Buie
@user1172468: Have you tried looking at the file in a hex editor? There might be embedded NULs, I guess. What program generated your CSVs? - Carolyn
I got the following: Warning messages: 1: In read.table(file = file, header = header, sep = sep, quote = quote, : invalid input found on input connection 'mycsv.csv' 2: In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'mycsv.csv' - Buie
I generated it using opencsv in Java. I'm pretty confident that there are no UTF-16 characters in the file, but I could always be wrong. - Buie
I don't know what "UTF-16LE" is, but it helped me! - Dagall

You can try using the skipNul = TRUE option.

mydata = read.csv("mycsv.csv", quote = "\"", skipNul = TRUE)

From ?read.csv

Embedded nuls in the input stream will terminate the field currently being read, with a warning once per call to scan. Setting skipNul = TRUE causes them to be ignored.

It worked for me.
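To see the behaviour the documentation describes, here is a small self-contained sketch: it writes a throwaway CSV containing one NUL byte (the contents are made up for illustration), collects the warnings from a default read.csv() call, then reads the same file again with skipNul = TRUE.

```r
# Made-up example: a tiny CSV whose first data row contains a NUL byte (0x00)
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("id,name\n1,ab"), as.raw(0), charToRaw("cd\n2,ef\n")), path)

# Default read: collect the warnings (they mention "embedded nul") instead of printing them
msgs <- character()
plain <- withCallingHandlers(
  read.csv(path),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)

# skipNul = TRUE drops the NUL byte, so "ab<NUL>cd" is read as "abcd"
clean <- read.csv(path, skipNul = TRUE)
print(clean)
```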

Duthie answered 25/5, 2015 at 18:13
Comments:
@Richard, @Apex, or someone, will you please point me to a resource or 1) define an "embedded nul" and 2) explain in more detail what skipNul = TRUE does? Thanks. - Ceremonious
Null is ASCII value 0 (0x00 in hex), called NUL or null (check any ASCII table). A (manipulated or converted) string can contain these characters. Sometimes they are rendered as \0, as in ABC\0EFG. skipNul = TRUE ignores them. - Cotta
@Cotta thanks for the feedback. I'm inferring R thinks embedded nulls should become NAs; however, because the argument for na.strings = <input> is unclear or insufficient to convert all embedded nulls to NAs, R leaves the remainders as text strings with their source value. Correct? If so, is there any way to determine which data points R ignored as embedded nulls? (The goal being to convert them to NAs.) Thanks, again. - Ceremonious
In my case (the output from a machine), skipNul was fine. - Syrupy
But with the output of another machine, I had to use UTF-16LE encoding. - Syrupy
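On the question of finding which values were affected: per the documentation quoted in the answer, skipNul = TRUE just drops the nul bytes rather than turning fields into NA, so one way to locate the affected rows is to scan the raw file for 0x00 bytes before reading it. The helper nul_lines() below is a hypothetical sketch. It reports physical line numbers with the header as line 1, which map to data rows (line minus 1) only when no quoted field spans several lines.

```r
# Hypothetical helper: 1-based line numbers (header = line 1) containing at least one NUL byte.
nul_lines <- function(path) {
  bytes   <- readBin(path, what = "raw", n = file.size(path))
  nul_pos <- which(bytes == as.raw(0))     # byte offsets of NULs
  nl_pos  <- which(bytes == as.raw(0x0A))  # byte offsets of line feeds
  # A NUL sits on line 1 + (number of line feeds that come before it)
  line_no <- vapply(nul_pos, function(p) sum(nl_pos < p), integer(1)) + 1L
  unique(line_no)
}

# Usage sketch:
# bad <- nul_lines("mycsv.csv")
# mydata <- read.csv("mycsv.csv", skipNul = TRUE)
# mydata[bad - 1L, ]   # rows to inspect (or set to NA) by hand
```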

This has nothing to do with the encoding; the problem is the nul characters in the file. To handle them, pass the skipNul = TRUE parameter.

For example:

neg = scan('F:/Natural_Language_Processing/negative-words.txt', what = 'character', comment.char = '', encoding = "UTF-8", skipNul = TRUE)

Butterfish answered 23/6, 2018 at 10:28

The file might not have CRLF line endings, only LF. Check a hex dump of the file.

If so, try running the file through awk:

awk '{printf "%s\r\n", $0}' file > new_log_file
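Base R can do the hex check this answer suggests without a separate hex editor: readBin() returns the raw bytes, and counting how many line feeds (0x0A) are preceded by a carriage return (0x0D) shows which line-ending convention the file uses. The helper line_endings() is hypothetical.

```r
# Hypothetical helper: count CRLF-terminated vs bare-LF-terminated lines in a file.
line_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  lf    <- which(bytes == as.raw(0x0A))  # positions of '\n'
  prev  <- bytes[lf[lf > 1L] - 1L]       # byte just before each '\n'
  crlf  <- sum(prev == as.raw(0x0D))     # those preceded by '\r'
  c(crlf = crlf, lf_only = length(lf) - crlf)
}

# Usage sketch:
# line_endings("mycsv.csv")   # a non-zero lf_only count means some lines end in bare LF
```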
Manisa answered 3/12, 2014 at 4:14

I had the same error message and figured out that, although my files had a .csv extension and opened with no problems in a spreadsheet program, they had actually been saved as "All Formats" rather than "Text CSV (.csv)".
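One plausible explanation, and it is only an assumption since the answer does not say which format was written, is that the spreadsheet program saved a workbook such as .ods or .xlsx under a .csv name. Both of those formats are ZIP archives, which start with the bytes 50 4B 03 04 ("PK\003\004"), so checking the first four bytes can rule this in or out. The helper is_zip_container() is hypothetical.

```r
# Hypothetical helper: TRUE if the file starts with the ZIP local-file-header
# signature 50 4B 03 04 ("PK\003\004"), as .xlsx and .ods workbooks do.
is_zip_container <- function(path) {
  magic <- readBin(path, what = "raw", n = 4L)
  length(magic) == 4L && all(magic == as.raw(c(0x50, 0x4B, 0x03, 0x04)))
}

# Usage sketch:
# is_zip_container("mycsv.csv")   # TRUE suggests a spreadsheet file, not CSV text
```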

Claudio answered 15/2, 2015 at 18:25

Another quick solution:

Double-check that you are, in fact, reading a .csv file!

I was accidentally reading a .rds file instead of .csv and got this "embedded null" error.
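A quick way to catch this case: files written by saveRDS() are gzip-compressed by default, and gzip data starts with the bytes 1f 8b, which a plain-text CSV will not. The helper is_gzip() below is a hypothetical sketch. A gzip-compressed CSV (.csv.gz) would also match, while an .rds saved with compress = FALSE, "bzip2" or "xz" would not, so treat the result as a hint rather than proof.

```r
# Hypothetical helper: TRUE if the file starts with the gzip magic bytes 1f 8b.
# saveRDS() gzip-compresses by default, so a typical .rds file matches.
is_gzip <- function(path) {
  magic <- readBin(path, what = "raw", n = 2L)
  length(magic) == 2L && all(magic == as.raw(c(0x1F, 0x8B)))
}

# Usage sketch (file name is hypothetical):
# f <- "mydata.rds"
# is_gzip(f)            # TRUE for a default saveRDS() file
# mydata <- readRDS(f)  # .rds files are read with readRDS(), not read.csv()
```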

Orelia answered 2/4, 2019 at 22:42

In those cases, make sure the data you are importing does not contain "#" characters; if it does, try the option comment.char = "". It worked for me.
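A small sketch of what comment.char changes, using a throwaway file with made-up contents. Note that read.csv() already defaults to comment.char = "", so a stray "#" mostly matters when using read.table() (whose default is comment.char = "#") or when comment.char has been overridden.

```r
# Made-up example file: the first data row has a "#" inside a data field
path <- tempfile(fileext = ".csv")
writeLines(c("id,tag", "1,#a", "2,b"), path)

# read.table() defaults to comment.char = "#", so everything from "#" onward is dropped
cut_short <- read.table(path, header = TRUE, sep = ",")
# comment.char = "" switches comment handling off and keeps "#a"
kept <- read.table(path, header = TRUE, sep = ",", comment.char = "")
print(cut_short$tag)
print(kept$tag)
```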

Buller answered 17/12, 2017 at 18:14
