Crashing R when calling `write.table` on particular data set
The following consistently crashes my R session.
Tested on two machines, Ubuntu and Mac OS X with similar results on both.

Brief Description:
Calling write.table on a data.frame with a factor column of all NAs.

The original data set is rather large, and I've managed to isolate the offending column and then create a similar vector, named PROBLEM_DATA below, which causes the same crash.

Interestingly, sometimes R crashes outright; other times it simply throws the following error:

Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote),  : 
  'getCharCE' must be called on a CHARSXP

Any thoughts as to the cause of the crash or should it be submitted as a bug?

Offending data and call:

PROBLEM_DATA <- structure(114:116, .Label = c("String1", "String2", "String3", "String4", "String5", "String6", 
                   "String7", "String8", "String9", "String10", "String11", "String12", "String13", "String14", "String15"), class = "factor")

# This will cause a crash
write.table(PROBLEM_DATA, file=path.expand("~/test.csv"))

# This will also crash
write.table(PROBLEM_DATA, file=path.expand("~/test.csv"), fileEncoding="UTF-8")

SESSION INFO OF EACH MACHINE

UBUNTU

R version 2.15.3 (2013-03-01)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C            LC_COLLATE=C        
 [5] LC_MONETARY=C        LC_MESSAGES=C        LC_PAPER=C           LC_NAME=C           
 [9] LC_ADDRESS=C         LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gdata_2.12.0     ggplot2_0.9.3    stringr_0.6.1    RMySQL_0.9-3     DBI_0.2-5       
[6] data.table_1.8.8

loaded via a namespace (and not attached):
 [1] MASS_7.3-23        RColorBrewer_1.0-5 colorspace_1.2-0   dichromat_1.2-4   
 [5] digest_0.5.2       grid_2.15.3        gtable_0.1.1       gtools_2.7.0      
 [9] labeling_0.1       munsell_0.4        plyr_1.7.1         proto_0.3-9.2     
[13] reshape2_1.2.1     scales_0.2.3       tools_2.15.3

Mac OS X

R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
Rune answered 4/4, 2013 at 16:21 Comment(12)
I can only report that it also happens for me with R version 3.0.0:

> head(PROBLEM_DATA)
[1] <NA> <NA> <NA>
15 Levels: String1 String2 String3 String4 String5 String6 String7 ... String15
> write.table(PROBLEM_DATA, file=path.expand("~/test.csv"))

 *** caught segfault ***
address 0x1, cause 'memory not mapped'

Traceback:
 1: write.table(PROBLEM_DATA, file = path.expand("~/test.csv"))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
- Theocritus
Same here on 2.15.1 on Linux - also happens with a smaller vector PD=structure(11:12,.Label=c("Foo","Bar"),class="factor"). I say check the changelog and nightly R and then report as a bug. - Quinquevalent
Same problem (alternating between crashing and complaining about 'getCharCE') on 64-bit R 2.15.2 on Win7. - Expressivity
Same here on Win7 with 64-bit R 2.15.3. It worked the first time; the output file contained a column named "x" with values 78, 79, 80. The next time, I got the error message, and the file contained "x" and the value 1. Can all you commenters post what output, if any, showed up? - Spavined
I'm guessing the fact that you force levels 114:116 to exist but only define labels for levels 1:15 has a lot to do with it. Take a look at as.numeric(PROBLEM_DATA) as well as as.numeric(as.character(PROBLEM_DATA)) (per the R FAQ). You end up with a bunch of levels which have the same (nonexistent) name. - Spavined
I don't get a crash under vanilla R, but I did get one after having loaded the gdata package. Perhaps related to https://mcmap.net/q/371864/-reordering-factor-gives-different-results-depending-on-which-packages-are-loaded/892313 ? In general, crashes are always bugs. The question is whether it is a bug in gdata or in base R. - Coeval
I am also able to reproduce a crash on R-3.0.0, but now it is giving the above error message sometimes... these are big tables with 1.1 to 1.8 million rows. I'm using --vanilla and it still crashes... very frustrating. It was working fine until this morning, and I don't know what has changed at all! - Annunciata
Ah, for me it seems to be related to write.csv() on a data.table rather than on a plain data.frame. - Annunciata
@Sean, very interesting. Did you use rbindlist at some point in creating the larger DT? - Rune
@RicardoSaporta Yes, indeed I did... I can't reproduce it for tiny data.tables though, even with rbindlist(). - Annunciata
@Annunciata Can you post your code to a new question? (You can link back to this one.) - Rune
@RicardoSaporta I posted as requested at #16316051. - Annunciata

This is a nicely reproducible bug and should be reported to R-devel or via bug.report(). FWIW, it reproduces for me on

> sessionInfo()
R version 3.0.0 Patched (2013-04-03 r62485)
Platform: x86_64-unknown-linux-gnu (64-bit)

On Linux, if I configure R with CFLAGS="-g -O0", I can run

R -d gdb
(gdb) break Rf_error
(gdb) run

then paste your lines above and end up at

> write.table(PROBLEM_DATA, file=path.expand("~/test.csv"))

Breakpoint 1, Rf_error (format=0x7ffff7a8f0f0 "'%s' must be called on a CHARSXP") at /home/mtmorgan/src/R-3-0-branch/src/main/errors.c:753
753     RCNTXT *c = R_GlobalContext;
(gdb) up 3
#3  0x00007ffff1b9bfb3 in EncodeElement2 (x=0x31ccf50, indx=113, quote=TRUE, qmethod=TRUE, buff=0x7fffffffbdc0, cdec=46 '.')
    at /home/mtmorgan/src/R-3-0-branch/src/library/utils/src/io.c:938
938     p0 = translateChar(STRING_ELT(x, indx));
(gdb) call Rf_PrintValue(x)
 [1] "String1"  "String2"  "String3"  "String4"  "String5"  "String6" 
 [7] "String7"  "String8"  "String9"  "String10" "String11" "String12"
[13] "String13" "String14" "String15"
(gdb) p indx
$1 = 113

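which shows R trying to write out the 114th element (zero-based index 113) of the factor's levels, a character vector that has only 15 elements. Clearly things have gone wrong because the factor has integer values beyond the length of its levels, so the C code reads past the end of the levels vector.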
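Until the underlying bug is fixed, one way to guard against it is to check the integer codes against the number of levels before writing. The following is only a sketch: `has_valid_codes` and `repair_factor` are hypothetical helper names, not part of base R, and turning out-of-range codes into NA is an assumption about what you want to happen to them.

```r
# Malformed factor from the question: codes 114:116, but only 15 levels.
bad <- structure(114:116, .Label = paste0("String", 1:15), class = "factor")

# Hypothetical helper: TRUE if every non-NA code points at an existing level.
has_valid_codes <- function(f) {
  stopifnot(is.factor(f))
  codes <- as.integer(f)
  all(is.na(codes) | (codes >= 1L & codes <= nlevels(f)))
}

# Hypothetical helper: rebuild the factor, mapping out-of-range codes to NA
# (an assumption; you may prefer to stop with an error instead).
repair_factor <- function(f) {
  codes <- as.integer(f)
  codes[!is.na(codes) & (codes < 1L | codes > nlevels(f))] <- NA_integer_
  factor(levels(f)[codes], levels = levels(f))
}

has_valid_codes(bad)                 # FALSE
has_valid_codes(repair_factor(bad))  # TRUE
```

The repaired factor is a well-formed, all-NA factor with the original 15 levels, so it can be passed to write.table without indexing past the end of the levels vector.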

Romberg answered 4/4, 2013 at 21:8 Comment(0)

Not an answer, but a long comment:

PROBLEM_DATA <- structure(c(1:5,114:116), .Label = c("String1", "String2", "String3",'string4','str5','str6','str7'),class='factor')
Rgames> as.numeric(PROBLEM_DATA)
[1]   1   2   3   4   5 114 115 116
Rgames> as.numeric(as.character(PROBLEM_DATA))
[1] NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion 
Rgames> levels(PROBLEM_DATA)
[1] "String1" "String2" "String3" "string4" "str5"    "str6"    "str7"   
Rgames> write.table(PROBLEM_DATA, file=path.expand("~/ctest.csv"))
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote),  : 
  'getCharCE' must be called on a CHARSXP

ctest.csv contains: (each line is a single cell so far as Excel is concerned)

x
1 "String1"
2 "String2"
3 "String3"
4 "string4"
5 "str5"
6

So you can see something going bad when there's a 'gap' in the levels' underlying numbering. Hope this provides a clue to someone who understands factors a lot more than I do.
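Since the original data set was large, it may help to scan a whole data.frame for factor columns with this kind of mismatch. This is a rough sketch, assuming the problem is always a code outside 1..nlevels; `find_bad_factor_columns` is a made-up name and the small `df` is illustrative data, not the asker's.

```r
# Hypothetical helper: names of factor columns whose integer codes
# fall outside the range 1..nlevels.
find_bad_factor_columns <- function(df) {
  is_bad <- vapply(df, function(col) {
    if (!is.factor(col)) return(FALSE)
    codes <- as.integer(col)
    any(!is.na(codes) & (codes < 1L | codes > nlevels(col)))
  }, logical(1))
  names(df)[is_bad]
}

# Illustrative data: one healthy factor column and one malformed one.
df <- data.frame(id = 1:3, ok = factor(c("a", "b", "a")))
df$broken <- structure(c(1L, 2L, 116L),
                       .Label = c("String1", "String2"),
                       class = "factor")

find_bad_factor_columns(df)  # "broken"
```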

Spavined answered 4/4, 2013 at 18:20 Comment(1)
Interesting point... I'll dig into that a bit more. Perhaps we can throw a catch in there, though. - Rune
