"Incorrect number of dimensions" error, help me understand why

Organization of this question:

I.   Background
II.  The Problem/Question
III. Steps Taken to Make this Question Good
IV.  Update: the output of head(x.path) and dput(x.path)

I. Background

I am customizing/adapting the e-mail classification code from the O'Reilly book "Machine Learning for Hackers" (Chapter 3). That code and its accompanying data can be found here: https://github.com/johnmyleswhite/ML_for_Hackers/tree/master/03-Classification

II. The Problem/Question

One of the main functions in that code is called get.msg(). The original function is

get.msg <- function(path)
{
  con <- file(path, open = "rt", encoding = "latin1")
  text <- readLines(con)
  # The message always begins after the first full line break
  msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
  close(con)
  return(paste(msg, collapse = "\n"))
}

My data is different in a number of ways though, so I have to edit this quite a bit. My data is read in earlier from a relational DB, thus I don't have to read in and clean a text file. Instead, my email body data is the 18th column of a dataframe, which we can call x. Here is my version of get.msg():

get.msg <- function(path) {
  bodyvector <- path[!(is.na(path[,18]) | path[,18]==""), ]
  return(paste(bodyvector))
}

Originally I referred to the column as x$email, and this worked through most of the code. In a later step, however, get.msg() was called on x.path, where x.path pointed to x and was combined with paste() inside another function, as in the authors' example code:

z.spam <- sapply(spam.docs, function(p) count.word(paste(x.path, p, sep = ""), "keyword"))

Here, the count.word() function is a function containing get.msg(). So, the paste() function was causing problems because it caused x.path to be considered an atomic array apparently, and gave the error that $ could not be used with an atomic array. As per an older StackOverflow Q&A, I changed the way I referred to the column to path[,18] (which is evaluated as x.path[,18] and therefore is the same as x[,18]).

Then I did some checking to ensure that x.path[,18] had the same information as x.path$email, which it did. However, when I try to run the code I get an error message on get.msg(x.path), which is:

Error in path[,18] : incorrect number of dimensions.

I tried path[,'email'], then path[18,] and then just path by itself and all three led to the same error. I tried path[[1]][[18]] and that gave me a subscript out of bounds error.

Any thoughts?

III. Steps Taken to Make this Question Good

To avoid annoying anyone and getting any down votes, I confirmed that the topic was relevant to StackOverflow and I feel that it may be relevant to other people dealing with this or similar programming problems in the future. I also spent almost an hour researching this problem online and trying things in R to fix it.

There were plenty of references to this error message, however the causes seemed to be very diverse and completely unrelated (such as networking trouble, etc). Finally, I spent a significant amount of time editing this question to try to make it readable and properly formatted (I hope I did okay, I know it's a lot of information).

IV. Update: the output of head(x.path) and dput(x.path)

Some of you extremely helpful folks have requested to see the output of head(x.path) or dput(x.path). I don't mind except that it's confidential company email data and I'll be out of a job and sued if I publish it. ;-)

I've pasted it here and replaced the real info with fake info. I hope this is okay. I tried to use dput() at first and I can do so if you like but it was truly an overwhelming amount of data. Here's head(x.path):

> head(x.path)
[1] "c(\"Z12e3317e4b1jZbbajZ9Zdd6\", \"Z12e3317e4b1jZbbajZ99124\", \"Z12e331Ze4b1jZbbajZ996dd\", \"Z12e3319e4b1jZbbajZ9acb6\", \"Z12e3319e4b1jZbbajZ9ad3b\", \"Z12e3319e4b1jZbbajZ9adjd\", \"Z12e3319e4b1jZbbajZ9aebZ\", \"Z12e3319e4b1jZbbajZ9aj23\", \"Z12e3319e4b1jZbbajZ9b22b\", \"Z12e3319e4b1jZbbajZ9b42a\", \"Z12e3319e4b1jZbbajZ9b49a\", \"Z12e331ae4b1jZbbajZ9bZ11\", \"Z12e331ae4b1jZbbajZ9bZZ4\", \"Z12e331ae4b1jZbbajZ9c237\", \"Z12e331ae4b1jZbbajZ9c2e4\", \"Z12e331ae4b1jZbbajZ9c3bZ\", \"Z12e331ae4b1jZbbajZ9c3cZ\", \"Z12e331ae4b1jZbbajZ9cZ31\", \n\"Z12e331be4b1jZbbajZ9cddd\", \"Z12e331be4b1jZbbajZ9cja6\", \"Z12e331ce4b1jZbbajZ9da1j\", \"Z12e331de4b1jZbbajZ9e649\", \"Z12e331de4b1jZbbajZ9j669\", \"Z12e331de4b1jZbbajZ9jZZZ\", \"Z12e331ee4b1jZbbajZ9j944\", \"Z12e331ee4b1jZbbajZ9jcZa\", \"Z12e331ee4b1jZbbajZ9jd4c\", \"Z12e331ee4b1jZbbajZa11e2\", \"Z12e331ee4b1jZbbajZa1291\", \"Z12e331ee4b1jZbbajZa1344\", \"Z12e3311e4b1jZbbajZa1j73\", \"Z12e3311e4b1jZbbajZa1131\", \"Z12e3311e4b1jZbbajZa11Z6\", \"Z12e3311e4b1jZbbajZa124c\", \"Z12e3311e4b1jZbbajZa1Zbc\", \"Z12e3311e4b1jZbbajZa19a9\", \n\"Z12e3311e4b1jZbbajZa1ac2\", \"Z12e3311e4b1jZbbajZa1b79\", \"Z12e3311e4b1jZbbajZa1db2\", \"Z12e3311e4b1jZbbajZa1ejb\", \"Z12e3312e4b1jZbbajZa2333\", \"Z12e3312e4b1jZbbajZa23aZ\", \"Z12e3312e4b1jZbbajZa24bb\", \"Z12e3312e4b1jZbbajZa2Z79\", \"Z12e3312e4b1jZbbajZa2Zea\", \"Z12e3312e4b1jZbbajZa2ba9\", \"Z12e3312e4b1jZbbajZa2cZa\", \"Z12e3313e4b1jZbbajZa3bc1\", \"Z12e3313e4b1jZbbajZa3ca9\", \"Z12e3313e4b1jZbbajZa3e71\", \"Z12e3ajbe4b1j66Zbcja4eZc\", \"Z12e3ajbe4b1j66Zbcja4ja4\", \"Z12e3c79e4b1j66ZbcjaZc36\", \"Z12e3e1ce4b1j66Zbcja64bd\", \n\"Z12e4117e4b1j66Zbcja6Zj1\", \"Z12e41bae4b1j66Zbcja734Z\", \"Z12e4226e4b1j66Zbcja7b13\", \"Z12e4226e4b1j66Zbcja7cbZ\", \"Z12e4ajee4b1j66Zbcjaa916\", \"Z12e4e61e4b1j66Zbcjab1c2\", \"Z12e4e61e4b1j66Zbcjab2da\", \"Z12eZ226e4b1j66ZbcjacZea\", \"Z12e6141e4b1j66Zbcjb19Z9\", \"Z12e6141e4b1j66Zbcjb19jd\", \"Z12e61Z9e4b1j66Zbcjb1acb\", \"Z12e61Z9e4b1j66Zbcjb1acj\", 
\"Z12j9713e4b1j66Zbcjc34db\", \"Z12j9713e4b1j66Zbcjc3ZZa\", \"Z12j9713e4b1j66Zbcjc3Za7\", \"Z12j9713e4b1j66Zbcjc3Zd2\", \"Z12j9713e4b1j66Zbcjc36c2\", \"Z12j973ce4b1j66Zbcjc396b\"\n)"
[2] "c(\"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \n\"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\", \"Something\")"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
[3] "c(61Z7, 674Z, Z462, 692, Z26, 1121, 1213, 1317, 21ZZ, 2Z9Z, 2711, 3612, 3717, 4774, 4Z93, Z117, Z113, Z197, Z77Z, 61Z3, Z16Z, 11771, 12923, 13374, 13Z93, 14277, 1446Z, 1Z3ZZ, 1ZZ16, 1Z993, 164Z2, 16664, 1711Z, 171Z6, 1Z6ZZ, 1Z921, 19211, 193ZZ, 19931, 21117, 21164, 21177, 21371, 21Z61, 21673, 22ZZ7, 23137, 2ZZ44, 26166, 26Z1Z, 173Z6, 17661, 21Z74, 23119, 232ZZ, 249Z3, 2ZZ31, 261Z9, 31211, 33414, 336Z6, 37941, 1743, 1Z61, 216Z, 2171, 1ZZ3, 2119, 21Z4, 2129, 2334, 2ZZZ)"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
[4] "c(\"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \n\"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\", \"Booty\")"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
[5] "c(Z6, 93Z, 1314, 3, 4, Z, 6, 7, 9, 11, 11, 13, 14, 2Z, 26, 27, 2Z, 29, 33, 34, ZZ, Z3, 122, 12Z, 133, 139, 142, 147, 1Z2, 1Z3, 16Z, 169, 171, 171, 219, 221, 221, 222, 22Z, 226, 244, 246, 247, 24Z, 249, 2637, 264, 2Z9, 292, 296, 49, Z1, 76, 93, 9Z, 112, 111, 114, 1Z7, 211, 214, 263, 6, 7, 11, 11, 11, 11, 12, 13, 14, 1Z)"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
[6] "c(3Z11, 3Z11, 3Z11, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, 691Z, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, Z664, 66Z1, 66Z1, 66Z1, 66Z1, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4, 4ZZ4)"       

If the output continued, element [18] would contain the message bodies.

Mineralogist answered 29/3, 2013 at 23:52 Comment(8)
It will be much easier if you show us your object (head, str) and the offending line of code. A reproducible example may go even further. - Dipetalous
My thought is that path needs to be a two-dimensional object (e.g. a data.frame or matrix) so you can do path[,18]; your x.path is not. Just run class(x.path) and you should see that. - Fantast
Update: I also tried replacing [,18] with [,'email'] but got the same message. I see two comments popped up while I was editing this, so let me save my comment and then I will follow up on yours (and thanks, btw!). I would give you the output of head() but it's confidential email bodies :/ - Mineralogist
flodel: You're right, class(x.path) shows that it's character due to the paste() command, but I used that because of the authors' example and because I can't figure out how to get away from it while still using the anonymous function like in that third snippet of code in my original post. Is there a way I could do that without paste, though? Sorry for the dumb question. - Mineralogist
Roman: I can, however, describe the output of head(x.path). It's a large dataframe with different kinds of data stored relating to an email client. The only column I care about at the moment is the email body column, and that one is just the text of emails, with \r and \n and other such text representations of formatting. - Mineralogist
OK, so I added the output of head() but replaced some of the text. Since the email data was in the 18th column I guess you can't see it anyway. Someone else requested this as well, but unfortunately it looks like they deleted their answer/comment while I was editing the post :/ - Mineralogist
Ricardo, if you see this then I encourage you to restore your answer. It may not have immediately solved the problem but I think it was very useful and you had some insights about dim()... what happened, buddy? You even removed your up vote? - Mineralogist
So get.msg started out with the argument path as a string that referred to the file that a single message was stored in, yes? But now the argument path is meant to be what? A single row of your email data frame? The whole data frame? Is it one data frame for each email? I'm afraid you lost me somewhere along the way. - Houseman
P
6

Your example is a little complex for me to run, but I have gotten this error a number of times and the problem has always been due ultimately to the default behavior of the extract function (i.e. []) in coercing to the lowest possible number of dimensions. As BondedDust observes, if you extract a single column from a data frame you can no longer select subsets of the frame with the same syntax, because you do not have a data frame any more.

Frequently these problems vanish if you set the parameter drop=FALSE in any extract operation that may reduce the data frame to a single column. I suggest that you look carefully not only at the line where the error is generated but also at any preceding lines in which the "[]" operation is used on the problem data frame. Look at the help for the data frame method of the extract function (help topic "Extract.data.frame", reachable with ?"[.data.frame"). I believe the problem is probably that when you subset the data frame to a single column, it is coerced to a plain vector with no dimensions and can no longer be indexed by column number or row number.
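To make the dropping behavior concrete, here is a minimal sketch. The data frame df and its columns are made up for illustration and are not the asker's data.

```r
# Toy data frame standing in for the real data (hypothetical columns).
df <- data.frame(id = 1:3,
                 email = c("hello", "", NA),
                 stringsAsFactors = FALSE)

# Selecting a single column with [ drops to a plain vector by default.
v <- df[, "email"]
is.data.frame(v)   # FALSE
dim(v)             # NULL

# Two-dimensional indexing on the dropped result fails; capture the message
# instead of stopping the script.
res <- tryCatch(v[, 1], error = function(e) conditionMessage(e))
print(res)         # the message reports an incorrect number of dimensions

# drop = FALSE keeps the data frame structure, so [ , j] keeps working.
d <- df[, "email", drop = FALSE]
is.data.frame(d)   # TRUE
dim(d)             # 3 1
d[, 1]
```

If the single-column result is going to be indexed with a comma again later, drop = FALSE is the safer choice; if a plain vector is what you want, index it with one subscript only.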

Powerful answered 16/4, 2014 at 6:52 Comment(0)
I
2

This might deserve to be a comment but it wouldn't fit and I'm prepared to delete if warranted. You say

"So, the paste function was causing problems because it caused x.path to be considered an atomic array apparently, and gave the error that $ could not be used with an atomic array. As per an older StackOverflow Q&A, I changed the way I referred to the column to path[,18] (which is evaluated as x.path[,18] and therefore is the same as x[,18])."

If x.path is an atomic array then you cannot use x.path[ , 18] but rather need to use x.path[18].

You can inspect x.path with str(x.path), and your output suggests that it is indeed a character vector. In R, only objects with two dimensions (matrices and data.frames) can be indexed as object[ , n].
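A small sketch of what this looks like in practice. The data frame x below is a made-up stand-in, and the idea that x.path came from calling paste() on a whole data frame is an assumption based on the head(x.path) output in the question, where each element is a deparsed "c(...)" string.

```r
# Hypothetical stand-in for the real data frame x.
x <- data.frame(id = c("a1", "b2"),
                email = c("first body", "second body"),
                stringsAsFactors = FALSE)

# paste() treats a data frame as a list and deparses each column
# into a single string, so the result is a plain character vector.
x.path <- paste(x)
class(x.path)    # "character"
dim(x.path)      # NULL
length(x.path)   # 2, one element per column

# One subscript works on a vector.
x.path[2]

# Anything with a comma inside [ ] does not, including row filters
# of the form path[cond, ].
msg <- tryCatch(x.path[, 2], error = function(e) conditionMessage(e))
print(msg)
```

This would also explain why swapping path[,18] for path[18] inside an expression like path[cond, ] still fails: the trailing comma in the outer brackets is itself a two-dimensional index.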

Irrespirable answered 30/3, 2013 at 1:55 Comment(7)
I think you might be onto something but I got this error: Error in path[!(is.na(path[18]) | path[18] == ""), ] : incorrect number of dimensions - Mineralogist
By the way, it is character (thanks to paste) and str(x.path) gives me this: str(x.path) chr [1:145] "c(\"5... and then it goes on a long way - Mineralogist
dim(x.path) says NULL... so it has no dimensions, but I'm not even really sure what that means. If there are no dimensions shouldn't path by itself have worked? But that gave the same error... - Mineralogist
Vectors in R have no dimensions. It is just an ordinary character vector. I do not know what you mean by "shouldn't path work". - Irrespirable
Thanks. It sounds like we are on the same page; that's what I thought it meant. When I said "shouldn't path work" I meant: since there is only one column/vector, should just using "path" by itself instead of path[,18] in get.msg make it work? But it gives the same error. I also just tried changing the paste statement to specify the column of interest at that point but it didn't work. - Mineralogist
I just got it to work but the solution is almost embarrassing. I would up vote your answer if I had enough rep, but what do I do with my solution? Answer my own question? Or post it as a comment here? Anyway, thanks a ton for your help. If it wasn't for talking to you I probably wouldn't have figured it out. - Mineralogist
You can post an answer to your own question. I'm not worried about upvotes. All I did was to read through your problem description until I found an apparent misconception about R syntax/semantics. - Irrespirable
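
The asker's eventual fix is not shown in this thread. As one possible approach, and only a sketch under assumptions, get.msg() could take the data frame itself plus the body column, and use [[ so the column always comes back as a vector. The argument names data and col and the toy data frame are hypothetical.

```r
# Hypothetical rewrite: 'data' is the data frame, 'col' is the body column
# (a name or a position such as 18).
get.msg <- function(data, col = "email") {
  stopifnot(is.data.frame(data))
  body <- data[[col]]                       # [[ returns the column as a vector
  body <- body[!is.na(body) & body != ""]   # drop missing and empty bodies
  as.character(body)
}

# Toy data standing in for the confidential emails.
x <- data.frame(id = 1:4,
                email = c("buy now", "", NA, "meeting at 3"),
                stringsAsFactors = FALSE)
bodies <- get.msg(x)
print(bodies)
```

With this shape there is no need to paste() the data frame into a path at all; a caller such as the sapply() step would pass the data frame (or the extracted bodies) straight through.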
