How to remove duplicated column names in R?

C

5

33

I have a very big matrix, and I know that some of its column names are duplicated. I want to find those duplicated column names and remove one of the columns from each set of duplicates. I tried duplicated(), but it removes the duplicate entries. Could someone help me implement this in R? The point is that columns with duplicate names might not have duplicate entries.

Cheekpiece answered 10/6, 2014 at 13:57 Comment(0)

B

58

Let's say temp is your matrix

temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")

##      A  A  B
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15

You could do

temp <- temp[, !duplicated(colnames(temp))]

##      A  B
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15

Or, if you want to keep the last duplicated column instead (starting again from the original temp), you can do

temp <- temp[, !duplicated(colnames(temp), fromLast = TRUE)] 

##       A  B
## [1,]  6 11
## [2,]  7 12
## [3,]  8 13
## [4,]  9 14
## [5,] 10 15
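
Since the question also asks how to find the duplicated names, here is a minimal sketch that lists them before anything is dropped. The names dup_names, dup_pos and deduped are illustrative only. Adding drop = FALSE is an optional safeguard so the result stays a matrix even when a single column survives.

```r
temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")

# Names that occur more than once
dup_names <- unique(colnames(temp)[duplicated(colnames(temp))])

# Positions of every column that carries a duplicated name
dup_pos <- which(colnames(temp) %in% dup_names)

# Keep the first column for each name; drop = FALSE preserves the matrix shape
deduped <- temp[, !duplicated(colnames(temp)), drop = FALSE]
```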

Belligerency answered 10/6, 2014 at 14:2 Comment(1)

Hi @david-arenburg. Thanks for such a useful solution. What if a data frame has two columns with different names but the same values, i.e. duplicates where only the names differ? How would we approach that? – Aye 20/12, 2022 at 3:40
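
One possible approach to the question in the comment above is to compare column contents instead of column names. This is only a sketch under the assumption that "same values" means the columns match exactly. The objects df, m, df_unique and m_unique are made up for illustration.

```r
# Hypothetical data: x and y hold the same values under different names
df <- data.frame(x = 1:3, y = 1:3, z = 4:6)

# Data frame: treat each column as a list element and drop repeated ones
df_unique <- df[!duplicated(as.list(df))]

# Matrix: MARGIN = 2 makes duplicated() compare columns rather than rows
m <- as.matrix(df)
m_unique <- m[, !duplicated(m, MARGIN = 2), drop = FALSE]
```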

R

18

Or, assuming a data.frame, you could use subset:

subset(iris, select = which(!duplicated(names(iris))))

Note that dplyr::select is not applicable here because it requires column-uniqueness in the input data already.

Robi answered 4/1, 2017 at 9:31 Comment(2)

iris <- iris %>% subset(., select = which(!duplicated(names(.)))) a pipe-friendly version – Selfhood 23/4, 2020 at 22:20

No need for which here. Without dplyr, a correct version is subset(iris, select = !duplicated(names(iris))) – Jacy 28/7, 2023 at 12:55

O

3

temp = matrix(seq_len(15), 5, 3)
colnames(temp) = c("A", "A", "B")

temp = as.data.frame.matrix(temp)
temp = temp[!duplicated(colnames(temp))]
temp = as.matrix(temp)

Orfurd answered 9/9, 2020 at 9:17 Comment(2)

Why convert it to a dataframe and then back to matrix? How is it different from my answer? That you don't need to write an extra comma? – Belligerency 23/9, 2020 at 6:26

That is important because I couldn't get your solution to work because mine was a data.table data.frame. Once I converted it to a matrix, worked like a charm. The comma omission is incidental and does not affect anything. – Abruzzi 5/1, 2021 at 12:2

T

1

To remove a specific duplicate column by name, you can do the following:

test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)) & names(test) == "Species")
test = test[,-idx]

To remove all duplicated columns, it is a bit simpler:

test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)))
test = test[,-idx]

or:

test = cbind(iris, iris) # example with multiple duplicate columns
test = test[,!duplicated(names(test))]
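
One difference between the two forms is worth a small sketch: when there are no duplicated names, which() returns an empty vector, and negative indexing with an empty vector selects no columns at all, whereas the logical form keeps everything. The data frame df below is a made-up example.

```r
# Hypothetical data frame with no duplicated names
df <- data.frame(a = 1:3, b = 4:6)

idx <- which(duplicated(names(df)))   # empty: nothing is duplicated

# Negative indexing with an empty vector drops every column
by_index <- df[, -idx, drop = FALSE]

# The logical mask keeps all columns, as intended
by_mask <- df[, !duplicated(names(df)), drop = FALSE]
```

So the logical version is the safer choice when the code may run on data that has no duplicates.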

Treasure answered 5/2, 2019 at 2:53 Comment(0)

E

0

Store the indices of all your duplicated columns in one vector, say duplicates, and use -duplicates with single-bracket subsetting to remove those columns.

# Define vector of duplicate cols (don't change)
duplicates <- c(4, 6, 11, 13, 15, 17, 18, 20, 22,
                24, 25, 28, 32, 34, 36, 38, 40,
                44, 46, 48, 51, 54, 65, 158)

# Remove duplicates from food and assign it to food2
food2 <- food[, -duplicates]

Equipollent answered 27/1, 2018 at 8:39 Comment(1)

Not great to hard-code the duplicated column numbers. It's better and more flexible to do which(duplicated(colnames(food))) instead. – Hays 20/6, 2019 at 14:21
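
The suggestion in the comment above could look roughly like this. The food data frame here is a small stand-in invented for illustration, since the original data is not shown.

```r
# Hypothetical stand-in for food; check.names = FALSE keeps the repeated name
food <- data.frame(id = 1:2, kcal = c(100, 200), id = 1:2,
                   check.names = FALSE)

# Compute the duplicated column positions instead of hard-coding them
duplicates <- which(duplicated(colnames(food)))

# Guard against the empty case, where -duplicates would select no columns
food2 <- if (length(duplicates) > 0) {
  food[, -duplicates, drop = FALSE]
} else {
  food
}
```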
