The problem is simple. Consider the following example:
m <- head(iris)
write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv')
The result is that m1 differs from the original object m: it has a new first column named "X". To make them equal, I have to use additional arguments, as in these two examples:
write.csv(m, file = 'm.csv', row.names = FALSE)
# and then
m1 <- read.csv('m.csv')
or
write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv', row.names = 1)
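For anyone who wants to check both workarounds side by side, here is a minimal sketch. It writes to a temporary file (path, m_default, m_a and m_b are names I made up for this check, not part of the example above) and compares only the column names and one numeric column.

```r
m <- head(iris)
path <- tempfile(fileext = ".csv")  # temporary file, so nothing is left in the working directory

# Default behaviour: the row names are written as an extra first column
# whose header is blank, and read.csv names that column "X"
write.csv(m, file = path)
header <- readLines(path, n = 1)    # header line of the file; starts with an empty quoted name
m_default <- read.csv(path)
names(m_default)[1]                 # "X"

# Workaround 1: do not write the row names at all
write.csv(m, file = path, row.names = FALSE)
m_a <- read.csv(path)

# Workaround 2: write the row names, then tell read.csv that column 1 holds them
write.csv(m, file = path)
m_b <- read.csv(path, row.names = 1)

# Both workarounds give back the original column names
identical(names(m_a), names(m))
identical(names(m_b), names(m))
```

I deliberately did not use identical(m, m_a) here: on R 4.0.0 or later read.csv returns Species as character rather than factor by default, so a full comparison can still fail for a reason unrelated to the row names.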
The question is: what is the reason for this difference? In particular, if write.csv and read.csv are supposedly intended to follow the Excel convention, why don't they import the same object that was exported in the first place? To me this behavior is very counterintuitive and highly undesirable.
(The result is exactly the same if I use the csv2 variants of these functions.)
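The same check for the csv2 variants, again only as a rough sketch with a temporary file (path2, m2_default and m2_fixed are names made up for this check):

```r
m <- head(iris)
path2 <- tempfile(fileext = ".csv")  # temporary file for the csv2 round trip

# Default round trip: the extra "X" column appears here too
write.csv2(m, file = path2)
m2_default <- read.csv2(path2)
names(m2_default)[1]                 # "X"

# Dropping the row names on export restores the original column names
write.csv2(m, file = path2, row.names = FALSE)
m2_fixed <- read.csv2(path2)
identical(names(m2_fixed), names(m))
```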
Thanks in advance!
These are the data.frames m and m1, if you prefer not to use R to see the example:
> m
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> m1
  X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1          5.1         3.5          1.4         0.2  setosa
2 2          4.9         3.0          1.4         0.2  setosa
3 3          4.7         3.2          1.3         0.2  setosa
4 4          4.6         3.1          1.5         0.2  setosa
5 5          5.0         3.6          1.4         0.2  setosa
6 6          5.4         3.9          1.7         0.4  setosa
read.csv and write.csv are supposed to use some Excel convention? – Kazan
write.csv and read.csv are a "fast" way to forget about the specifics and "just do what you need", this is very annoying. In my case I always forget about this detail. You can read about this Excel convention with ?write.table. – Outhaul
read.csv uses the most common format) so we don't have to remember which function uses what, or have to go through the doc each time we use them. It was a bad design in the first place. – Retraction
?write.table provides an example of writing a CSV to input into Excel (I assume this is the "convention" you mention), it specifically says you need the equivalent of read.csv('m.csv', row.names=1) to read it back into R. Even if lots of people find this counter-intuitive, it's not going to change now (these defaults are probably 10+ years old). Hence, why these defaults were chosen is a moot point, and your question doesn't really have an answer. – Kazan
svn log src/library/utils/R/write.table.R "r32344 | ripley | 2004-12-27 08:25:32 -0500 (Mon, 27 Dec 2004) | 4 lines; add write.csv[2]" (and in r34879, "allow write.csv(row.names=FALSE)") – Shela
read.csv was written by the Lilliputians and write.table was written by the Blefuscudians. ;) – Checkmate