I have a very large matrix, and I know that some of its column names are duplicated. I want to find those duplicated column names and remove one of the columns from each set of duplicates.
I tried duplicate(), but it removes the duplicate entries.
Could someone help me implement this in R?
The point is that columns with duplicate names might not have duplicate entries.
How to remove duplicated column names in R?
Let's say temp is your matrix:
temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")
## A A B
## [1,] 1 6 11
## [2,] 2 7 12
## [3,] 3 8 13
## [4,] 4 9 14
## [5,] 5 10 15
You could do
temp <- temp[, !duplicated(colnames(temp))]
## A B
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15
Or, if you want to keep the last duplicated column, you can do
temp <- temp[, !duplicated(colnames(temp), fromLast = TRUE)]
## A B
## [1,] 6 11
## [2,] 7 12
## [3,] 8 13
## [4,] 9 14
## [5,] 10 15
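The question also asks how to find the duplicated names, not only how to drop them. A minimal sketch, reusing the temp matrix from this answer (the names dup_names, keep and temp_unique are only illustrative). It also passes drop = FALSE, since subsetting a matrix down to a single column would otherwise return a plain vector:

```r
# Rebuild the example matrix with a repeated column name
temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")

# Names that occur more than once
dup_names <- unique(colnames(temp)[duplicated(colnames(temp))])
dup_names

# Positions of every column carrying a duplicated name (first occurrence included)
which(colnames(temp) %in% dup_names)

# Keep the first column for each name; drop = FALSE keeps the result a matrix
# even if only one column survives
keep <- !duplicated(colnames(temp))
temp_unique <- temp[, keep, drop = FALSE]
```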
Or, assuming data.frames, you could use subset:
subset(iris, select = which(!duplicated(names(iris))))
Note that dplyr::select is not applicable here because it requires the column names in the input data to be unique already.
iris <- iris %>% subset(., select = which(!duplicated(names(.))))
a pipe-friendly version – Selfhood
No need for which here. Without dplyr, a correct version is subset(iris, select = !duplicated(names(iris))) – Jacy
temp = matrix(seq_len(15), 5, 3)
colnames(temp) = c("A", "A", "B")
temp = as.data.frame.matrix(temp)
temp = temp[!duplicated(colnames(temp))]
temp = as.matrix(temp)
Why convert it to a data frame and then back to a matrix? How is it different from my answer? That you don't need to write an extra comma? – Belligerency
That is important because I couldn't get your solution to work, since mine was a data.table data.frame. Once I converted it to a matrix, it worked like a charm. The comma omission is incidental and does not affect anything. – Abruzzi
To remove a specific duplicate column by name, you can do the following:
test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)) & names(test) == "Species")
test = test[,-idx]
To remove all duplicated columns, it is a bit simpler:
test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)))
test = test[,-idx]
or:
test = cbind(iris, iris) # example with multiple duplicate columns
test = test[,!duplicated(names(test))]
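One caveat with the negative-index forms: if no names are duplicated, idx is empty, and test[, -idx] then selects zero columns rather than all of them. The logical form does not have this problem. A small sketch of the difference, using plain iris (which has no duplicated names) as an assumed input, with safe as an illustrative name:

```r
test <- iris                           # no duplicated column names here
idx <- which(duplicated(names(test)))
length(idx)                            # 0

# Negative indexing with an empty index vector drops every column
ncol(test[, -idx])                     # 0

# The logical mask keeps all columns when there is nothing to remove
safe <- test[, !duplicated(names(test)), drop = FALSE]
ncol(safe)                             # 5
```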
Store the positions of all your duplicated columns in one vector, say duplicates, and use -duplicates with single-bracket subsetting to remove those columns.
# Define vector of duplicate cols (don't change)
duplicates <- c(4, 6, 11, 13, 15, 17, 18, 20, 22,
24, 25, 28, 32, 34, 36, 38, 40,
44, 46, 48, 51, 54, 65, 158)
# Remove duplicates from food and assign it to food2
food2 <- food[,-duplicates]
Not great to hard-code the duplicated column numbers. It's better and more flexible to do which(duplicated(colnames(food))) instead. – Hays
column names but same values. Duplicate just names are different. How would we approach that? – Aye
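For the case raised in this comment, where columns share values rather than names, one possible approach is to compare column contents instead of names. A sketch, assuming "duplicate" means every entry matches exactly (the objects m and df are made up for illustration):

```r
# Matrix with two identical columns under different names
m <- matrix(c(1:5, 1:5, 11:15), 5, 3)
colnames(m) <- c("A", "B", "C")

# duplicated() on a matrix compares whole columns when MARGIN = 2
m_unique <- m[, !duplicated(m, MARGIN = 2), drop = FALSE]
colnames(m_unique)   # "A" "C"

# For a data.frame, compare the columns as list elements
df <- as.data.frame(m)
df_unique <- df[!duplicated(as.list(df))]
names(df_unique)     # "A" "C"
```

As with names, the first of each set of identical columns is kept; fromLast = TRUE would keep the last instead.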