R read.csv "More columns than column names" error

Asked 10/9, 2014 at 16:59 Answered 23/12, 2022 at 21:30

I have a problem when importing .csv file into R. With my code:

t <- read.csv("C:\\N0_07312014.CSV", na.string=c("","null","NaN","X"),
          header=T, stringsAsFactors=FALSE,check.names=F)

R reports an error and does not do what I want:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names

I guess the problem is because my data is not well formatted. I only need data from [,1:32]. All others should be deleted.

Data can be downloaded from: https://drive.google.com/file/d/0B86_a8ltyoL3VXJYM3NVdmNPMUU/edit?usp=sharing

Thanks so much！

Vino answered 10/9, 2014 at 16:59 Comment(2)

You cannot change the structure of an HTML file to be a CSV-formated file by just changing its file extension. – Lustrous 10/9, 2014 at 17:23

It's a CSV file. A wonky one, with 10 headers interspersed throughout, but it's a CSV file. – Dreadfully 10/9, 2014 at 17:30

That's one wonky CSV file. Multiple headers tossed about (try pasting it to CSV Fingerprint) to see what I mean.

Since I don't know the data, it's impossible to be sure the following produces accurate results for you, but it involves using readLines and other R functions to pre-process the text:

# use readLines to get the data
dat <- readLines("N0_07312014.CSV")

# i had to do this to fix grep errors
Sys.setlocale('LC_ALL','C')

# filter out the repeating, and wonky headers
dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)

# turn that vector into a text connection for read.csv
dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
                  header=FALSE, stringsAsFactors=FALSE)

str(dat_3)
## 'data.frame':    308 obs. of  37 variables:
##  $ V1 : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ V2 : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ V3 : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ V4 : chr  "" "" "" "" ...
## .. more
##  $ V36: chr  "" "" "" "" ...
##  $ V37: chr  "0" "0" "0" "0" ...

# grab the headers
headers <- strsplit(dat[1], ",")[[1]]

# how many of them are there?
length(headers)
## [1] 32

# limit it to the 32 columns you want (Which matches)
dat_4 <- dat_3[,1:32]

# and add the headers
colnames(dat_4) <- headers

str(dat_4)
## 'data.frame':    308 obs. of  32 variables:
##  $ Node Name         : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ RTC_date          : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ RTC_time          : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ N1 Bat (VDC)      : chr  "" "" "" "" ...
##  $ N1 Shinyei (ug/m3): chr  "" "" "0.23" "null" ...
##  $ N1 CC (ppb)       : chr  "" "" "null" "null" ...
##  $ N1 Aeroq (ppm)    : chr  "" "" "null" "null" ...
## ... continues

Dreadfully answered 10/9, 2014 at 17:20 Comment(1)

readlines + grep to filter is a good idea. I'll give you a +1. – Scevour 10/9, 2014 at 18:2

Open the .csv as a text file (for example, use TextEdit on a Mac) and check to see if columns are being separated with commas.

csv is "comma separated vectors". For some reason when Excel saves my csv's it uses semicolons instead.

When opening your csv use:

read.csv("file_name.csv",sep=";")

Semi colon is just an example but as someone else previously suggested don't assume that because your csv looks good in Excel that it's so.

Excel answered 21/8, 2016 at 18:27 Comment(1)

You just saved me! – Nicholle 28/1, 2022 at 12:19

That's one wonky CSV file. Multiple headers tossed about (try pasting it to CSV Fingerprint) to see what I mean.

Since I don't know the data, it's impossible to be sure the following produces accurate results for you, but it involves using readLines and other R functions to pre-process the text:

# use readLines to get the data
dat <- readLines("N0_07312014.CSV")

# i had to do this to fix grep errors
Sys.setlocale('LC_ALL','C')

# filter out the repeating, and wonky headers
dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)

# turn that vector into a text connection for read.csv
dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
                  header=FALSE, stringsAsFactors=FALSE)

str(dat_3)
## 'data.frame':    308 obs. of  37 variables:
##  $ V1 : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ V2 : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ V3 : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ V4 : chr  "" "" "" "" ...
## .. more
##  $ V36: chr  "" "" "" "" ...
##  $ V37: chr  "0" "0" "0" "0" ...

# grab the headers
headers <- strsplit(dat[1], ",")[[1]]

# how many of them are there?
length(headers)
## [1] 32

# limit it to the 32 columns you want (Which matches)
dat_4 <- dat_3[,1:32]

# and add the headers
colnames(dat_4) <- headers

str(dat_4)
## 'data.frame':    308 obs. of  32 variables:
##  $ Node Name         : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ RTC_date          : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ RTC_time          : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ N1 Bat (VDC)      : chr  "" "" "" "" ...
##  $ N1 Shinyei (ug/m3): chr  "" "" "0.23" "null" ...
##  $ N1 CC (ppb)       : chr  "" "" "null" "null" ...
##  $ N1 Aeroq (ppm)    : chr  "" "" "null" "null" ...
## ... continues

Dreadfully answered 10/9, 2014 at 17:20 Comment(1)

readlines + grep to filter is a good idea. I'll give you a +1. – Scevour 10/9, 2014 at 18:2

If you only need the first 32 columns, and you know how many columns there are, you can set the other columns classes to NULL.

read.csv("C:\\N0_07312014.CSV", na.string=c("","null","NaN","X"),
      header=T, stringsAsFactors=FALSE,
      colClasses=c(rep("character",32),rep("NULL",10)))

If you do not want to code up each colClass and you like the guesses read.csv then just save that csv and open it again.

Alternatively, you can skip the header and name the columns yourself and remove the misbehaved rows.

A<-data.frame(read.csv("N0_07312014.CSV",
                        header=F,stringsAsFactors=FALSE,
                        colClasses=c(rep("character",32),rep("NULL",5)),
                        na.string=c("","null","NaN","X")))
Yournames<-as.character(A[1,])
names(A)<-Yournames
yourdata<-unique(A)[-1,]

The code above assumes you do not want any duplicate rows. You can alternatively remove rows that have the first entry equal to the first column name, but I'll leave that to you.

Scevour answered 10/9, 2014 at 17:18 Comment(6)

Should be colClasses=, and the CSV file is really wonky. That particular method does not produce a viable data frame for that data. – Dreadfully 10/9, 2014 at 17:21

Can you skip the header row? – Scevour 10/9, 2014 at 17:24

The problem is there is not just a single header row, there are 10 interspersed throughout the whole file – Dreadfully 10/9, 2014 at 17:25

Then remove them. Presumably you do not want any duplicates. You can use unique to remove duplicates. – Scevour 10/9, 2014 at 17:45

It's not my data and that's what my code does. With your last revision, the poster can use the data once they remove the all NA columns. – Dreadfully 10/9, 2014 at 17:48

Remember, the OP wanted to get past the error. header=F does that. – Scevour 10/9, 2014 at 18:3

try read.table() instead of read.csv()

Carmacarmack answered 14/2, 2017 at 13:59 Comment(2)

This is way better then other but how to tackle with null values in table, then? – Displayed 5/7, 2018 at 16:23

i am not sure whether you mean this : df [is.na(df)] <- 0 ? – Carmacarmack 2/9, 2018 at 3:2

For me, the solution was using csv2 instead of csv.

Mousebird answered 19/10, 2021 at 12:12 Comment(0)

I was also facing the same issue. Now solved.

Just use header = FALSE

read.csv("data.csv", header = FALSE) -> mydata

Segregationist answered 14/11, 2020 at 10:13 Comment(1)

That is only a solution as long as there is no header. If the first line of the csv contains column names, than those are added to the variables and not only have to be filtered out, but also makes it neccessary to correct the data type of the variables. – Whelan 30/9, 2022 at 5:50

I had the same problem. I opened my data in textfile and double expressions are separated by semicolons, you should replace them with a period

Orms answered 9/6, 2021 at 11:14 Comment(0)

I was having this error that was caused by multiple rows of meta data at the top of the file. I was able to use read.csv by doing skip= and skipping those rows.

data <- read.csv('/blah.csv',skip=3)

Bicentenary answered 1/10, 2021 at 13:44 Comment(0)

-2

read.csv("file_name.csv", header=F)

Setting the HEADER to be FALSE will do the job perfectly for you...

Borst answered 23/12, 2022 at 21:30 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags