Getting a random internal selfref error in data.table for R

I love data.table: it's fast and intuitive, what could be better? Alas, here's my problem: when referring to a data.table within a foreach() loop (using the doMC implementation), I occasionally get the following error (a full example is in the appendix):

Error in { : 
  Internal error: .internal.selfref prot is not itself an extptr

One of the annoying problems here is that I can't get it to reproduce with any consistency, but it will happen during some long (several hrs) tasks, so I want to make sure it never happens, if possible.

Since I refer to the same data.table, DT, in each loop, I tried running the following at the beginning of each loop:

setattr(DT,".internal.selfref",NULL)   

...to remove the invalid/corrupted self ref attribute. This works and the internal selfref error no longer occurs. It's a workaround, though.
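A minimal sketch of where the workaround sits, using only data.table and a plain lapply() in place of foreach()/doMC so that it runs anywhere. The table, its columns and the per-iteration query are made up for illustration; only the setattr() call is taken from the text above.

```r
library(data.table)

# Hypothetical shared table that every iteration reads from
DT <- data.table(id = 1:6, value = c(0.1, 0.9, 0.4, 0.6, 0.3, 0.7))

results <- lapply(1:3, function(k) {
  # Workaround: drop the self-reference attribute before touching DT
  setattr(DT, ".internal.selfref", NULL)
  # Illustrative per-iteration query (not the real aggregation)
  DT[id > k, list(iter = k, mean.value = mean(value))]
})

# One row per iteration
out <- rbindlist(results)
```

With foreach() the setattr() line would go at the top of the %dopar% body, applied to whichever table the body actually reads.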

Any ideas for addressing the root problem?

Many thanks for any help!

Eric

Appendix: Abbreviated R Session Info to confirm latest versions:

R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
other attached packages:
 [1] data.table_1.8.8  doMC_1.3.0

Example using simulated data -- you may have to run the history() function many times (like, hundreds) to get the error:

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Load packages and Prepare Data
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
require(data.table)
##this is the package we use for multicore
require(doMC)
##register n-2 of your machine's cores
registerDoMC(parallel::detectCores() - 2)

## Build simulated data
value.a <- runif(500,0,1)
value.b <- 1-value.a
value <- c(value.a,value.b)
answer.opt <- c(rep("a",500),rep("b",500))
answer.id <- rep( 6000:6499 , 2)
question.id <- rep( sample(c(1001,1010,1041,1121,1124),500,replace=TRUE) ,2)
date <- rep( (Sys.Date() - sample.int(150, size=500, replace=TRUE)) , 2)
user.id <- rep( sample(250:350, size=500, replace=TRUE) ,2)
condition <- substr(as.character(user.id),1,1)
condition[which(condition=="2")] <- "x"
condition[which(condition=="3")] <- "y"

##Put everything in a data.table
DT.full <- data.table(user.id = user.id,
                      answer.opt = answer.opt,
                      question.id = question.id,
                      date = date,
                      answer.id = answer.id,
                      condition = condition,
                      value = value)

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Daily Aggregation Function
##
##a basic function that aggregates all the values from
##all users for every question on a given day:
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
each.day <- function(val.date){
  DT <- DT.full[ date < val.date ]

  #count the number of updates per user (for weighting)
  setkey(DT, question.id, user.id)
  DT <- DT[ DT[answer.opt=="a",length(value),by="question.id,user.id"] ]
  setnames(DT, "V1", "freq")

  #retain only the most recent value from each user on each question
  setkey(DT, question.id, user.id, answer.id)
  DT <- DT[ DT[ ,answer.id == max(answer.id), by="question.id,user.id", ][[3]] ]

  #now get a weighted mean (with freq) of the value for each question
  records <- lapply(unique(DT$question.id), function(q.id) {
    DT <- DT[ question.id == q.id ]
    probs <- DT[ ,weighted.mean(value,freq), by="answer.opt" ]
    return(data.table(q.id = rep(q.id,nrow(probs)),
                      ans.opt = probs$answer.opt,
                      date = rep(val.date,nrow(probs)),
                      value = probs$V1))
  })
  return(do.call("rbind",records))
}

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## foreach History Function 
##
##to aggregate across many days quickly
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
history <- function(start, end){
  #define a sequence of dates
  date.seq <- seq(as.Date(start),as.Date(end),by="day")

  #now run a foreach to get the history for each date
  hist <- foreach(day = date.seq,  .combine = "rbind") %dopar% {
    #setattr(DT,".internal.selfref",NULL) #resolves occasional internal selfref error
    each.day(val.date = day)
  }
}

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## Examples
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

##aggregate only one day
each.day(val.date = "2012-12-13")

##generate a history
hist.example <- history(start = "2012-11-01", end = Sys.Date())
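
Since the error may only show up after hundreds of runs, a small harness that repeats a call until it fails can save some manual re-running. This is only a sketch in base R; run.until.error and its arguments are hypothetical names, not part of data.table or foreach.

```r
# Hypothetical helper: call f() repeatedly and stop at the first error.
# Returns NULL if no run failed, otherwise the run number and error message.
run.until.error <- function(f, max.runs = 1000L) {
  for (run in seq_len(max.runs)) {
    msg <- tryCatch({
      f()
      NULL
    }, error = function(e) conditionMessage(e))
    if (!is.null(msg)) {
      return(list(run = run, message = msg))
    }
  }
  NULL
}
```

For the example above, this could be called as run.until.error(function() history(start = "2012-11-01", end = Sys.Date())), with max.runs raised as needed.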
Aggregate answered 11/3, 2013 at 15:19 Comment(8)
can you paste your foreach loop implementation here (even though it may not reproduce the problem as you say)? - Rosariorosarium
Re the workaround attempt, good idea, but it's setattr not setattrib. For the proper solution Arun is spot on, it doesn't need to be reliably reproducible, but if you paste the code we can probably stress test it in the right way to make it fail. - Sheets
And I see doMC was updated to 1.3.0 on 22 Feb, and data.table to 1.8.8 on 6 Mar. Please be sure to provide version numbers of everything you're using up front, e.g. sessionInfo(). - Sheets
And apologies about setattrib, that was my typo in an off-list suggestion to you a few weeks back! - Sheets
the loop is pretty involved as it is now, so i'll work on a more compressed simulated version to share with you. it may be a day or so before i can get to it - Aggregate
Sounds good. So the workaround worked then? Or does it need a few days to see if it bites again before being sure? I don't know the doMC package so that cut down example is really needed (by me anyway) to progress a proper fix. - Sheets
it takes about 24 hours to run through the whole process, so I will have an update tomorrow afternoon, along with an example for you - Aggregate
thanks again for your help guys. my code ran without issue using the setattr() workaround, which is great news. note: the example i just posted should be enough to replicate the error with enough repetition. - Aggregate

Thanks for reporting and all the help in finding it! Now fixed in v1.8.11. From NEWS :

In long running computations where data.table is called many times repetitively, the following error could sometimes occur, #2647 :
Internal error: .internal.selfref prot is not itself an extptr
Fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples.
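
For code that has to run on machines with different data.table versions, the workaround could be applied only where the fix is missing. The sketch below assumes the version number quoted above (1.8.11) is the right cut-off; has.fix and reset.selfref are hypothetical names introduced here for illustration.

```r
# TRUE when the installed data.table already contains the fix (assumed: >= 1.8.11)
has.fix <- packageVersion("data.table") >= "1.8.11"

# Hypothetical helper: apply the setattr() workaround only on older versions
reset.selfref <- function(DT) {
  if (!has.fix) {
    data.table::setattr(DT, ".internal.selfref", NULL)
  }
  invisible(DT)
}
```

Calling reset.selfref(DT) at the top of each loop iteration then becomes a no-op once data.table has been upgraded.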

Possibly related is a memory leak in grouping, which is also now fixed.

Long outstanding (usually small) memory leak in grouping fixed, #2648. When the last group is smaller than the largest group, the difference in those sizes was not being released. Also in non-trivial aggregations where each group returns a different number of rows. Most users run a grouping query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T.
Memory leak in data.table grouped assignment by reference
Slow memory leak in data.table when returning named lists in j (trying to reshape a data.table)

Sheets answered 2/1, 2014 at 19:7 Comment(0)

A similar problem has been plaguing me for months. Perhaps we can see a pattern by putting our experiences together.

I've been waiting to post till I could create a reproducible example. Not possible thus far. The bug doesn't happen in the same code location. In the past I've often been able to avoid the error by merely rerunning the exact same code. Other times I've reformulated an expression and rerun with success. In any case I'm pretty sure that these errors are truly internal to data.table.

I've saved the last 4 error messages in an attempt to detect a pattern (pasted below).

---------------------------------------------------
[1] "err msg: location 1"
Error in selfrefok(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: my.fun1 ... $<- -> $<-.data.table -> [<-.data.table -> selfrefok
Execution halted


---------------------------------------------------
[1] "err msg: location 1"
Error in alloc.col(newx) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: my.fun1 -> $<- -> $<-.data.table -> copy -> alloc.col
Execution halted


---------------------------------------------------
[1] "err msg: location 2"
Error in shallow(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: print ... do.call -> lapply -> as.list -> as.list.data.table -> shallow
Execution halted

---------------------------------------------------
[1] "err msg: location 3"
Error in shallow(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: calc.book.summ ... .rbind.data.table -> as.list -> as.list.data.table -> shallow
Execution halted

Another similarity to the above example: I'm passing data.tables around among parallel threads, so they are being serialized/unserialized.
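
To make the serialize/unserialize round trip concrete, here is a small sketch of what a table goes through when it is shipped to a worker and back. It does not reproduce the error and is not confirmed as its cause; it assumes a recent data.table where setDT() is available (it did not exist in 1.8.8), and the table contents are invented.

```r
library(data.table)

DT <- data.table(id = 1:3, value = c(0.2, 0.5, 0.8))

# Round trip through serialization, as happens when passing objects
# between parallel workers. The external pointer stored in
# .internal.selfref does not survive this as a valid self reference.
DT2 <- unserialize(serialize(DT, NULL))

# Re-establish the self reference and over-allocation before
# modifying the copy by reference
setDT(DT2)
DT2[, flag := value > 0.4]
```

The original DT is left untouched; only the round-tripped copy gains the new column.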

I will try the 'setattr' fix mentioned above.

hope this helps and thanks, jason

here is a simplification of one of the code segments that seems to generate this error 1 out of every 50-100k times it is run:

thanks @MatthewDowle btw. data.table has been most useful. Here is one stripped down bit of code:

require(data.table)
require(xts)

book <- data.frame(name='',
                   s=0,
                   Value=0.0,
                   x=0.0,
                   Qty=0)[0, ]

for (thing in list(1,2,3,4,5)) {

  tmp <- xts(1:5, order.by= make.index.unique(rep(Sys.time(), 5)))
  colnames(tmp) <- 'A'
  tmp <- cbind(coredata(tmp[nrow(tmp), 'A']),
               coredata(colSums(tmp[, 'A'])),
               coredata(tmp[nrow(tmp), 'A']))

  book <- rbind(book,
                data.table(name='ALPHA',
                           s=0*NA,
                           Value=tmp[1],
                           x=tmp[2],
                           Qty=tmp[3]))

}

something like this seems to be the cause of this error:

Error in shallow(x) : 
  Internal error: .internal.selfref prot is not itself an extptr
Calls: my.function ... .rbind.data.table -> as.list -> as.list.data.table -> shallow
Execution halted
Prostrate answered 24/3, 2013 at 16:35 Comment(13)
Very interesting. Don't get discouraged by your very first answer being deleted! I think the diamond mod deleted it on a technicality since you wrote it wasn't an answer, perhaps. Anyway I got it undeleted. Thanks for this info! - Sheets
I can start to make some guesses with this info. But you don't have to provide a reproducible example. In cases like this, you can just provide as much code as you can. Then we can stress test it from there. I just need at least a skeleton of code that runs similar to yours. It doesn't need to crash reliably. - Sheets
If you write back please be sure to start comments with @MatthewDowle, otherwise I'm unlikely to see it. - Sheets
I've filed a bug report so as not to forget: #2647 Intermittent internal selfref error when used with doMC::foreach - Sheets
@JasonB: I'm experiencing Error in shallow(x): Internal error: .internal.selfref prot is not itself an extptr Calls: print ... do.call -> lapply -> as.list -> as.list.data.table -> shallow but in a simple lapply (no parallel execution), if it can help to localize the source of the problem ... - Revis
@MatthewDowle I'm also experiencing similar problems. I've got an error in a for loop: Error in shallow(x) : Internal error: .internal.selfref prot is not itself an extptr. The problem is that when i try to debug it, the error disappears and the loop exits correctly. When it appeared it ruined a whole day of calculations. - Sarracenia
@Revis In just lapply (no parallel)? Yes that is useful. Any chance of something reproducible I can run? It doesn't have to be something that reliably crashes (although that would be ideal), but something that you think is close is fine. - Sheets
@Sarracenia Sorry to hear that. Are you using it with doMC::foreach or, like leodido, do you have the problem with something else? Anything reproducible (or close)? I've bumped up the priority of the bug (#2647). - Sheets
@MatthewDowle i'm using plain for loops. Actually finding a similarity matrix from merging different dataset parts. I could provide the code but without knowing the data structure it would be problematic to read it. - Sarracenia
@MatthewDowle As far as i can tell the problem appears with huge datasets (though that's the purpose of data.table). In my example it halted on the 380k'th iteration. R version 2.15.2, data.table 1.8.8 - Sarracenia
@Sarracenia If you can provide the code, then it is better than nothing. It might give me a clue at least. - Sheets
@MattDowle I also experienced this while using foreach, will try to use the setattr(DT,".internal.selfref",NULL) workaround and let you know - Dilate
Fixed it now. See new answer. Thanks all for your help! - Sheets

For the sake of reproducing the error, I have a script for you guys to pore over and figure out where this bug is coming from. The error reads:

Error in { : 
task 96 failed - "Internal error: .internal.selfref prot is not itself an extptr"
Calls: apply ... system.time -> apply -> FUN -> %dopar% -> <Anonymous>
Execution halted

and I'm using doParallel to register my backend for foreach.

Context: I'm testing out classifiers on the MNIST hand-written digit dataset. You can get the data from me via

wget -nc https://www.dropbox.com/s/xr4i8gy11ed8bsh/digit_id_data_and_benchmarks.zip

just be sure to modify the script (above) so that it correctly points to load_data.R and load_data.R correctly points to the MNIST data -- though it may be easier for you to just clone my repo, hop on the random_gov branch, and then run dt_centric_random_gov.R.

Sorry I couldn't make a more minimal reproducible example, but like @JasonB's answer, this error doesn't seem to pop up until you do a ton of calculations.

edit: I re-ran my script using the suggested work-around above and it seemed to go off without a hitch.

Poulson answered 19/12, 2013 at 20:6 Comment(4)
Thanks. It's running. How long does it take before it fails with that error? - Sheets
It took several hours, using 8 cores on a cluster. I'm not sure how long it would take to catch on a single core... - Poulson
Fixed it now. See separate answer from me. Thanks for your help! - Sheets
No, thank you! data.table never ceases to amaze me with how fast it is. I wish there was better literature out there to help me get a bit savvier with it. - Poulson
