R: Error in pi[[j]] : subscript out of bounds -- rbind on a list of dataframes

I am trying to rbind a large list of data frames (outputDfList), which is generated by applying a complicated function to a large table with lapply. You can recreate outputDfList with:

df1=data.frame("randomseq_chr15q22.1_translocationOrInsertion", "chr15", "63126742")
names(df1)=NULL
df2=df1=data.frame("chr18q12.1_chr18q21.33_large_insertion", "chr18 ", "63126741")
names(df2)=NULL
outputDfList=list(df1,df2)

My code is:

do.call(rbind, outputDfList)

The error message I received:

Error in pi[[j]] : subscript out of bounds

I double-checked the number of columns in each data frame, and they are all the same. I also tried using "options(error=recover)" to debug, but I'm not familiar enough with it to pin down the exact issue. Any help is appreciated. Thank you.

Urbanity answered 16/1, 2017 at 17:21 Comment(6)
I’m unable to reproduce the error message. You’ll need to construct a minimal example to reproduce the problem, and post the exact code/data to reproduce it here. reprex may be helpful for that. - Joiner
@KonradRudolph Thanks a lot for the comment. You are right. I added back the long names of my data frames, and I think it should now show the error. - Urbanity
Unfortunately this isn’t sufficient since we still don’t know exactly what your data looks like (if I try reconstructing your data from what you’ve posted, the command works). Could you please dput the relevant data? - Joiner
@KonradRudolph Thank you for being so patient. I could not dput the original data because outputDfList is generated by applying a complicated function to a table with lapply. However, I was able to reproduce the error using the code above. Would you please try the code and let me know whether you see the error? Thanks a lot. - Urbanity
Why are you setting the column names to NULL? rbind tries to match up columns by name, which is difficult if there aren't any. - Johnsonjohnsonese
@RichardTelford You are right. I didn't realize that. I set them to NULL to mimic my original code. The data frames were generated with different default column names, so I had to reset them. Now it is fixed, thank you. - Urbanity

After the update it seems that your problem is that you have invalid column names: Data frame column names must be non-null.

After correcting this, the code then works:

for (i in seq_along(outputDfList)) {
    colnames(outputDfList[[i]]) = paste0('V', seq_len(ncol(outputDfList[[i]])))
}

do.call(rbind, outputDfList)
#                                       V1     V2       V3
# 1 chr18q12.1_chr18q21.33_large_insertion chr18  63126741
# 2 chr18q12.1_chr18q21.33_large_insertion chr18  63126741
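The same fix can also be written without an explicit loop. The following is only a minimal sketch: the names in `colNames` are hypothetical placeholders (any unique, non-NULL names would do), and the example list is rebuilt here with `unname()` to mimic data frames whose names were removed.

```r
# Example list mirroring the question: data frames whose column names were removed.
df1 <- data.frame("randomseq_chr15q22.1_translocationOrInsertion", "chr15", "63126742")
df2 <- data.frame("chr18q12.1_chr18q21.33_large_insertion", "chr18", "63126741")
outputDfList <- lapply(list(df1, df2), unname)

# Hypothetical shared column names (an assumption, not taken from the question).
colNames <- c("id", "chrom", "pos")

# setNames() returns a copy of each data frame carrying the given names.
namedList <- lapply(outputDfList, setNames, colNames)

# With identical, non-NULL names on every element, rbind can match the columns.
combined <- do.call(rbind, namedList)
```

Assigning the names inside the function that produces each data frame, rather than afterwards, would avoid the intermediate unnamed list altogether.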

However, I’m puzzled how this situation occurred in the first place. Furthermore, the error message I’m getting with your code is still distinct from yours:

Error in match.names(clabs, names(xi)) :
names do not match previous names
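
To see which of the two situations applies to a given list, it may help to inspect the names of every element before calling rbind. The sketch below is only an illustration: `checkNames` is a hypothetical helper, not a base R function, and it compares each element against the first one.

```r
# Hypothetical helper: for each data frame in a list, report whether its
# column names are missing (NULL) or differ from those of the first element.
# Note: identical() also flags names that match but appear in a different order.
checkNames <- function(dfList) {
  ref <- names(dfList[[1]])
  data.frame(
    index = seq_along(dfList),
    noNames = vapply(dfList, function(d) is.null(names(d)), logical(1)),
    differsFromFirst = vapply(dfList, function(d) !identical(names(d), ref), logical(1))
  )
}

# Small made-up example: one matching, one mismatching, one unnamed data frame.
exampleList <- list(
  data.frame(x = 1, y = 2),
  data.frame(x = 3, z = 4),
  unname(data.frame(5, 6))
)
report <- checkNames(exampleList)
print(report)
```

Elements flagged in `noNames` are the ones that need names assigned before `do.call(rbind, ...)` is attempted.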

Joiner answered 16/1, 2017 at 19:34 Comment(2)
Thanks for the reply. I am puzzled by it as well... but you are absolutely right that I need column names for my data frames. I added this to the function that generates the list of data frames, and it worked. Thank you! - Urbanity
I've seen both errors now. I was calling do.call(rbind, myList) on a list of data frames when I got the match.names error. The data frames all had different column names, so I used lapply(myList, unname), thinking this would fix the problem, but when I tried do.call() again, I got the subscript out of bounds error described above. As noted in the comments above, unname has the effect of setting the column names to NULL, so rbind() fails. - Piton
