data.table::fread and Unbalanced "

When I try to read a CSV file using data.table::fread(fn, sep='\t', header=T), I get an "Unbalanced " observed on this line" error. The data has 3 integer variables and 1 string variable. The strings in the file are not enclosed in ", and yes, some lines contain " within the string variable, and those " characters are not paired.

Is it possible to make fread ignore the unpaired " in the variable and continue reading the data? Thanks.

Here is the sample data (just one record):

N_ID    VISIT_DATE  REQ_URL REQType
175931  2013-3-8 23:40:30   http://aaa.com/rest/api2.do?api=getSetMobileSession&data={"imei":"60893ZTE-CN13cd","appkey":"android_client","content":"Z0JiRA0qPFtWM3BYVltmcx5MWF9ZS0YLdW1ydXoqPycuJS8idXdlY3R0TGBtU   1
Aurochs answered 18/4, 2013 at 22:18 Comment(3)
Can you please add the first lines of your file to the question? Note that fread is still under development and embedded quotes ("\"" and """") have problems... - Lavonnelaw
Without a way to reproduce your error, there's little we can do to help (unless someone has experienced the exact problem you're facing). - Manilla
I have added the sample record. Please verify. Thanks. - Aurochs

UPDATE: Now implemented in v1.8.11

From NEWS :

fread now accepts quotes (both ' and ") in the middle of fields, whether the field starts with " or not, rather than the 'unbalanced quotes' error, #2694. Thanks to baidao for reporting. It was known and documented at the top of ?fread (text now removed). If a field starts with " it must end with " (necessary if the field separator itself is in the field contents). Embedded quotes can be in column names too. Newlines (\n) still can't be in quoted fields or quoted column names, yet.
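
As a minimal sketch of the behaviour described in that NEWS entry, the snippet below writes a small tab-separated file whose URL field contains an odd number of " characters and reads it back. The file contents are made up for illustration (a shortened version of the record in the question). The quote = "" argument is an assumption about your installed version: it is available in later data.table releases, not in v1.8.11 itself, and it turns quote handling off entirely, which may help if the default settings still complain.

```r
library(data.table)

# Hypothetical sample: the REQ_URL field does not start with a quote
# and contains an odd number of " characters.
lines <- c(
  "N_ID\tVISIT_DATE\tREQ_URL\tREQType",
  "175931\t2013-3-8 23:40:30\thttp://aaa.com/api.do?data={\"imei\":\"608\",\"content\":\"Z0Ji\t1"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(lines, tmp)

# quote = "" disables quote processing, so " is read as an ordinary character.
dt <- fread(tmp, sep = "\t", header = TRUE, quote = "")
print(dt)
```

With quote = "" a field can no longer contain the separator itself, so this only suits files like the one in the question, where tabs never appear inside a field.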


Yes, as @agstudy said, embedded quotes are a known, documented limitation: support for them is not yet implemented because fread is new. Strictly speaking, though, I suppose the quotes in your example aren't "embedded", because the string doesn't start with a quote.

Anyway, I've filed this as a bug report so it doesn't get forgotten. To be done in the next release. Thanks for highlighting.

#2694 : Strings including quotes but not starting with quote in fread
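
Until a fixed version is installed, one possible fallback (not part of the original answer) is base R's read.delim with quote = "", which treats " as an ordinary character. This is only a sketch: the sample lines are a shortened, made-up version of the record in the question, and base R will be much slower than fread on large files.

```r
# Hypothetical sample with an unpaired " inside an unquoted field.
lines <- c(
  "N_ID\tVISIT_DATE\tREQ_URL\tREQType",
  "175931\t2013-3-8 23:40:30\thttp://aaa.com/api.do?data={\"imei\":\"608\",\"content\":\"Z0Ji\t1"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(lines, tmp)

# read.delim defaults to sep = "\t" and header = TRUE;
# quote = "" switches off quote handling.
df <- read.delim(tmp, quote = "", stringsAsFactors = FALSE)
str(df)
```

The resulting data.frame can then be converted with data.table::as.data.table(df) if data.table syntax is needed afterwards.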

Percheron answered 19/4, 2013 at 11:46 Comment(2)
Has this been fixed? I'm having a similar issue processing tweets. I believe the tweet_text fields have \n characters that should be ignored. - Pahl
@Pahl Did you search the README, and did you test? If it is still a problem, please find and +1 (or raise a new) GitHub issue. - Percheron
