Issue when importing dataset: `Error in scan(...): line 1 did not have 145 elements`

I'm trying to import my dataset in R using read.table():

Dataset.df <- read.table("C:\\dataset.txt", header=TRUE)

But I get the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
   line 1 did not have 145 elements

What does this mean and how can I fix it?

Padriac answered 10/8, 2013 at 10:30 Comment(0)

This error is pretty self-explanatory. Some data seem to be missing in the first line of your data file (or the second line, as the case may be, since you're using header = TRUE).

Here's a mini example:

## Create a small dataset to play with
cat("V1 V2\nFirst 1 2\nSecond 2\nThird 3 8\n", file="test.txt")

R automatically detects that it should expect rownames plus two columns (3 elements), but it doesn't find 3 elements on line 2, so you get an error:

read.table("test.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 2 did not have 3 elements

Look at the data file and see if there is indeed a problem:

cat(readLines("test.txt"), sep = "\n")
# V1 V2
# First 1 2
# Second 2
# Third 3 8

Manual correction might be needed, or we can assume that the first value in the "Second" row belongs in the first column and the remaining values should be NA. If that is the case, fill = TRUE is enough to solve your problem.

read.table("test.txt", header = TRUE, fill = TRUE)
#        V1 V2
# First   1  2
# Second  2 NA
# Third   3  8

R is also smart enough to figure out how many elements it needs even if rownames are missing:

cat("V1 V2\n1\n2 5\n3 8\n", file="test2.txt")
cat(readLines("test2.txt"), sep = "\n")
# V1 V2
# 1
# 2 5
# 3 8
read.table("test2.txt", header = TRUE)
# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
#   line 1 did not have 2 elements
read.table("test2.txt", header = TRUE, fill = TRUE)
#   V1 V2
# 1  1 NA
# 2  2  5
# 3  3  8
Bubble answered 10/8, 2013 at 10:43 Comment(3)
I bet 99 times out of 100 for the people googling, it's not gonna be actual missing data. They'll have the wrong delimiter or special character issues as in the other answers to this question. – Miser
I think fill = TRUE should be the default. – Labelle
Agreed. fill = TRUE was the solution for me. – Mayer

When running into this error and reviewing my dataset, which appeared to have no missing data, I discovered that a few of my entries contained the special character "#", which derailed importing the data. Once I removed the "#" from the offending cells, the data imported without issue.
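
For illustration, here is a small made-up example (the file name and values are invented) of how a stray "#" can produce the same error: by default, read.table() treats everything after a "#" as a comment, so the rest of that line is dropped and the element count comes up short. Setting comment.char = "" turns this behaviour off:

## A tiny file in which one value contains "#"
cat("V1 V2 V3\nA 1 2\nB#x 3 4\nC 5 6\n", file = "hash.txt")

read.table("hash.txt", header = TRUE)
# Error in scan(...) : line 2 did not have 3 elements

read.table("hash.txt", header = TRUE, comment.char = "")
#    V1 V2 V3
# 1   A  1  2
# 2 B#x  3  4
# 3   C  5  6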

Halberd answered 14/6, 2014 at 15:41 Comment(2)
You could have also set read.table(..., comment.char = "") to turn off the interpretation of comments in the file. – Sienese
Apostrophes can also cause it ( ' ). Fix this by setting the option quote = "\"". – Teniers
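
To illustrate the apostrophe tip with a small made-up file (names invented): restricting the quote characters to the double quote, or disabling quoting entirely with quote = "", keeps a stray apostrophe from being interpreted as the start of a quoted string that swallows fields from the following lines:

## A value containing an apostrophe
cat("name score\nO'Hara 1\nSmith 2\nJones 3\n", file = "quotes.txt")

read.table("quotes.txt", header = TRUE, quote = "\"")   # or quote = ""
#     name score
# 1 O'Hara     1
# 2  Smith     2
# 3  Jones     3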

I encountered this issue while importing some of the files from the Add Health data into R (see: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/21600?archive=ICPSR&q=21600 ). For example, the following command to read the DS12 data file in tab-separated .tsv format generates the following error:

ds12 <- read.table("21600-0012-Data.tsv", sep="\t", comment.char="", 
quote = "\"", header=TRUE)

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, 
na.strings,  : line 2390 did not have 1851 elements

It appears there is a slight formatting issue with some of the files that causes R to reject them. At least part of the problem seems to be the occasional use of double quotes instead of an apostrophe, which leaves an uneven number of double-quote characters in a line.

After fiddling, I've identified three possible solutions:

  1. Open the file in a text editor and search/replace all instances of the quote character (") with nothing; in other words, delete all double quotes. For this tab-delimited data, that only meant that some verbatim excerpts of comments from subjects were no longer in quotes, which was a non-issue for my data analysis. (An in-R version of this step is sketched after this list.)

  2. With data stored on ICPSR (see link above) or other archives, another solution is to download the data in a different format. A good option in this case is to download the Stata version of DS12 and then open it using the read.dta command from the foreign package, as follows:

    library(foreign)
    ds12 <- read.dta("21600-0012-Data.dta")
    
  3. A related solution/hack is to open the .tsv file in Excel and re-save it as a tab separated text file. This seems to clean up whatever formatting issue makes R unhappy.
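
As a rough in-R variant of solution 1 (a sketch only, reusing the file name from the failing command above), the double quotes can also be stripped programmatically before parsing:

## Read the raw lines, delete every double-quote character, then parse the cleaned text
raw <- readLines("21600-0012-Data.tsv")
raw <- gsub('"', "", raw, fixed = TRUE)
ds12 <- read.table(text = raw, sep = "\t", comment.char = "",
                   quote = "", header = TRUE)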

None of these are ideal in that they don't quite solve the problem in R with the original .tsv file but data wrangling often requires the use of multiple programs and formats.

Supersedure answered 30/4, 2015 at 14:42 Comment(0)

If you are using Linux and the data file comes from Windows, the problem is probably the carriage-return character (^M). Find it and delete it, and you're done.
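
One way to do the find-and-delete from within R (a sketch; "dataset.txt" stands in for your own file) is to strip the carriage returns before parsing:

## Remove Windows carriage returns ("\r", shown as ^M in many editors), then re-read
txt <- readLines("dataset.txt")
txt <- gsub("\r", "", txt, fixed = TRUE)
Dataset.df <- read.table(text = txt, header = TRUE)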

Excision answered 1/6, 2017 at 20:28 Comment(0)

For others who can't find a solution and know the data isn't missing elements:

I have this issue when I use Excel 2013 to save files as .csv and then try to load them in R using read.table(). The workaround I have found is to paste the data straight from Excel into a .txt document, then open with:

read.table(file.choose(), sep="\t").

I hope this helps.

Hydrosphere answered 9/1, 2016 at 19:9 Comment(0)

One of my variables was categorical, with one level being a multi-word string ("no event"). When I used read.table(), it treated the space after the first word as the end of the data point, so the second word was pushed into the next variable. I used sep = "\t" to solve the problem. I was using RStudio on Mac OS X. A previous workaround had been to convert the .txt files to .csv in Excel and then open them with the read.csv function.
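
A small invented example of that behaviour: with the default whitespace separator, the two words of "no event" are read as two separate fields, while sep = "\t" keeps them together:

## Tab-separated file in which one value contains a space
cat("id\tstatus\n1\tno event\n2\tevent\n", file = "events.txt")

read.table("events.txt", header = TRUE)             # default separator splits "no event"
# Error in scan(...) : line 2 did not have 3 elements

read.table("events.txt", header = TRUE, sep = "\t")
#   id   status
# 1  1 no event
# 2  2    event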

Penelopa answered 25/11, 2015 at 14:27 Comment(1)
Voted down. Extensions are irrelevant to R. You can use read.csv() whether the file is called something.txt or something.xls or something.csv, as long as the file is ASCII. Also note read.csv() is the same as read.table(sep=',', header=TRUE). It is only a shorthand for the latter. – Campaign

A hash symbol (#) can also create this error. If you remove the "#" from the start of the column name or value, it should fix the problem.

Basically, read.table() treats "#" as a comment character by default, so when a column name or value begins with "#", everything from that point to the end of the line is ignored and the row comes up short.

Javed answered 8/11, 2018 at 12:45 Comment(0)

I encountered this error when I used row.names="id" (per the tutorial) with a column named "id".

Amply answered 14/7, 2017 at 1:50 Comment(0)

Besides all the guidance mentioned above, you can also check all the data.

If there are blanks between words within a value, replace them with "_".

At least, that is how I solved my own problem.

Adherent answered 6/8, 2018 at 8:23 Comment(0)

This simple method solved the problem for me: Copy the content of your dataset, open an empty Excel sheet, choose "Paste Special" -> "Values", and save. Import the new file instead.

(I tried all the existing solutions, and none worked for me. My old dataset appeared to have no missing values, spaces, special characters, or embedded formulas.)

Marque answered 8/10, 2019 at 17:27 Comment(0)

I faced the same issue while trying to read data from a file in R. After some digging, I found that the sep value was causing the problem. Once I read the file with the correct separator, it worked as expected.

read.table("file_location/file_name",
    sep="." # exact separator as given in file: also "," or "\t" etc.
    col_names=c("name_1", "name_2",..))
Kuhn answered 2/9, 2020 at 19:28 Comment(0)

Passing the parameter comment.char = "" to read.table() helped in some cases.

When it didn't, I resorted to readr's read_delim():

read_delim(file = "filename.tsv", delim = "\t")

which did the trick for me.

Here's the documentation for the function: http://rfunction.com/archives/1441

Dredi answered 14/11, 2021 at 14:5 Comment(0)
