Pasting elements of two vectors alphabetically

Say I have two vectors:

a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

What I want to do is paste together the 1st pair, the 2nd pair, and so on. However, I want the two elements of each pair pasted in alphabetical order. In the example above, the first two pairs are already in alphabetical order, but the 3rd pair, 'harry' and 'chris', is not; for this pair I want to return "chris harry".

I have worked out how to do this in a two-step process, but was wondering whether there is a quick (one-line) way to do it using just paste.

My solution:

x <- apply(mapply(c, a, b, USE.NAMES = FALSE), 2, sort)
paste(x[1,], x[2,])

which gives the pairs in alphabetical order:

[1] "george harry" "harry steve"  "chris harry"  "chris harry"  "harry steve"  "george steve" "chris steve"  "george harry"

But is there a one-line way?
Squally asked 31/8, 2014 at 2:19

Here's one approach:

apply(cbind(a, b), 1, function(x) paste(sort(x), collapse=" "))

## [1] "george harry" "harry steve"  "chris harry"  "chris harry"  
## [5] "harry steve" "george steve" "chris steve"  "george harry"

Building on your initial attempt, you could also do either of the following, but both require more typing (I'm not sure about speed):

unlist(Map(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b), use.names = FALSE)
mapply(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b, USE.NAMES = FALSE)
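As a quick check, assuming the a and b vectors from the question, the three variants above can be compared directly; they should return identical unnamed character vectors. The Map() version needs its names dropped because Map() names its result after the first vector.

```r
# Vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

# apply(): cbind() builds an 8 x 2 character matrix; each row is sorted and pasted
r_apply <- apply(cbind(a, b), 1, function(x) paste(sort(x), collapse = " "))

# Map(): returns a list named after `a`, so drop the names when unlisting
r_map <- unlist(Map(function(x, y) paste(sort(c(x, y)), collapse = " "), a, b),
                use.names = FALSE)

# mapply(): simplifies to an unnamed character vector directly
r_mapply <- mapply(function(x, y) paste(sort(c(x, y)), collapse = " "), a, b,
                   USE.NAMES = FALSE)

# All three should agree element by element
stopifnot(identical(r_apply, r_map), identical(r_map, r_mapply))
print(r_apply)
```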
Gamecock answered 31/8, 2014 at 2:23
That's great, thanks. I couldn't figure out how to incorporate the sort into a one-liner. - Squally

Slightly redundant, because it compares each pair twice (once in pmin and once in pmax), but vectorised:

paste(pmin(a,b), pmax(a,b))
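pmin() and pmax() return the parallel (element-wise) minimum and maximum of their arguments, and for character vectors they use the same locale-dependent string ordering as <. A minimal sketch on the question's vectors, showing the two intermediate vectors before they are pasted:

```r
# Vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

# Compare the i-th elements of a and b, one result per position
first  <- pmin(a, b)   # alphabetically earlier name of each pair
second <- pmax(a, b)   # alphabetically later name of each pair

res <- paste(first, second)
print(res)
```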

Edit: an alternative with ifelse:

ifelse(a < b, paste(a, b), paste(b, a))
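A small sketch with hypothetical data (the last pair is a tie) showing how the ifelse() version handles equal names: when a == b, a < b is FALSE, so paste(b, a) is used, which is the same string. It also checks that the result matches the pmin()/pmax() version.

```r
# Hypothetical short example; the last pair is a tie ("chris", "chris")
a <- c("george", "harry", "chris")
b <- c("harry", "chris", "chris")

# Keep paste(a, b) where a < b is TRUE and paste(b, a) elsewhere (including ties)
res <- ifelse(a < b, paste(a, b), paste(b, a))
print(res)

# Same result as the pmin()/pmax() version
stopifnot(identical(res, paste(pmin(a, b), pmax(a, b))))
```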
Lafreniere answered 31/8, 2014 at 18:54
There might be subtle issues depending on locale settings, since string comparison follows the locale's collation order. - Lafreniere
I did a speed comparison: this method is extremely quick! - Squally
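String comparison in R (<, pmin(), pmax() and sort()) follows the collation order of the current locale, which is where the locale issues mentioned above can come from. A minimal sketch, assuming you want plain byte-wise (ASCII) ordering, is to switch LC_COLLATE to "C" around the call and restore it afterwards; the mixed-case names x and y are hypothetical. In the C locale uppercase letters sort before lowercase ones, so "Banana" comes first, whereas a locale such as en_US would typically give "apple Banana".

```r
# Hypothetical mixed-case names where collation rules can disagree
x <- c("apple", "Banana")
y <- c("Banana", "apple")

# Save the current collation setting so it can be restored afterwards
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))  # byte-wise (ASCII) ordering

res_c <- paste(pmin(x, y), pmax(x, y))

# Restore the original collation setting
invisible(Sys.setlocale("LC_COLLATE", old_collate))
print(res_c)
```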

Here's a similar method to Tyler's, but with Map. Technically it's a one-liner...

unlist(Map(function(x,y) {
    paste(sort(c(x,y)), collapse = " ")
    }, a, b, USE.NAMES = FALSE))
# [1] "george harry" "harry steve"  "chris harry"  "chris harry" 
# [5] "harry steve"  "george steve" "chris steve"  "george harry"
Eyespot answered 31/8, 2014 at 2:34

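A one-liner from your own code (note that this version pastes each pair as-is, without sorting it, as "harry chris" in the output shows):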

apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE),1,paste)),1,function(x) paste(x[1],x[2]))
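[1] "george harry" "harry steve"  "harry chris"  "chris harry"  "steve harry"  "steve george" "steve chris"  "harry george"

The following variant does sort each pair, but returns a two-column matrix rather than pasted strings:
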
apply(apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort),1,paste)

     [,1]     [,2]   
[1,] "george" "harry"
[2,] "harry"  "steve"
[3,] "chris"  "harry"
[4,] "chris"  "harry"
[5,] "harry"  "steve"
[6,] "george" "steve"
[7,] "chris"  "steve"
[8,] "george" "harry"
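This second variant returns the sorted pairs as an 8 x 2 matrix rather than strings. A minimal sketch, assuming the vectors from the question, that pastes the two columns together to get the requested output:

```r
# Vectors from the question
a <- c("george", "harry", "harry", "chris", "steve", "steve", "steve", "harry")
b <- c("harry", "steve", "chris", "harry", "harry", "george", "chris", "george")

# The matrix-returning variant: one sorted pair per row
m <- apply(apply(mapply(c, a, b, USE.NAMES = FALSE), 2, sort), 1, paste)

# Paste the two columns row by row to get one string per pair
res <- paste(m[, 1], m[, 2])
print(res)
```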
Liu answered 31/8, 2014 at 16:45
Nice. Do you know whether this method would be faster or slower than Tyler's? I ask because the vectors I'm pairing contain more than 200,000 elements, and speed matters. - Squally
You can use system.time() to test which one is fastest. It would be interesting to know the results, since many solutions have been posted. - Liu
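Following that suggestion, here is a minimal sketch of such a timing comparison that does not depend on any particular data set. It uses hypothetical data (random pairs drawn from a small pool of names, with set.seed() for reproducibility) and compares the pmin()/pmax() approach with the row-wise apply()/sort() approach. n is kept small so it runs quickly; raise it to about 200,000 to match the size of the vectors in the question. Timings vary by machine, so none are shown here.

```r
set.seed(1)

# Hypothetical data: n random pairs drawn from a small pool of names
pool <- c("george", "harry", "chris", "steve", "anna", "zoe")
n <- 2e4  # increase to ~200000 to match the question's scale
a <- sample(pool, n, replace = TRUE)
b <- sample(pool, n, replace = TRUE)

# Vectorised approach
t_vec <- system.time(r_vec <- paste(pmin(a, b), pmax(a, b)))

# Row-wise approach: sort each pair separately
t_row <- system.time(
  r_row <- apply(cbind(a, b), 1, function(x) paste(sort(x), collapse = " "))
)

# Both approaches should produce the same strings
stopifnot(identical(r_vec, r_row))
cat("pmin/pmax elapsed:", t_vec[["elapsed"]], "s\n")
cat("apply/sort elapsed:", t_row[["elapsed"]], "s\n")
```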

Here is a speed comparison of the answers above.

I took the data from my own dataset of all English soccer games played in the top four divisions of English league football, available here: https://github.com/jalapic/engsoccerdata

The dataset is 'engsoccerdata'; I pasted together its 3rd and 4th columns (home and visiting team), after converting each column to a character vector. Each vector has 188,060 elements, one for each of the 188,060 games played in the top four tiers of English soccer from 1888 to 2014.

Here is the comparison:

# engsoccerdata: the data set from https://github.com/jalapic/engsoccerdata
df <- engsoccerdata

a <- as.character(df[, 3])  # home team
b <- as.character(df[, 4])  # visiting team

#tyler1
system.time(apply(cbind(a, b), 1, function(x) paste(sort(x), collapse=" ")))

#tyler2
system.time(unlist(Map(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b),,FALSE))

#tyler3
system.time(mapply(function(x, y) paste(sort(c(x, y)), collapse=" "), a, b, USE.NAMES = FALSE))

#baptiste1
system.time(paste(pmin(a,b), pmax(a,b)))

#baptiste2
system.time(ifelse(a < b, paste(a, b), paste(b, a)))

#RichardS
system.time(unlist(Map(function(x,y) {
  paste(sort(c(x,y)), collapse = " ")
}, a, b, USE.NAMES = FALSE)))

#rnso1
system.time(apply(data.frame(apply(mapply(c, a, b, USE.NAMES = FALSE),1,paste)),1,function(x) paste(x[1],x[2])))

#rnso2
system.time(apply(apply(mapply(c, a, b, USE.NAMES = FALSE),2,sort),1,paste))

system.time() results (in seconds):

#              user  system elapsed 
#tyler1       42.92    0.02   43.73 
#tyler2       14.68    0.03   15.04
#tyler3       14.78    0.00   14.88 
#baptiste1     0.79    0.00    0.84 
#baptiste2     1.25    0.00    1.28 
#RichardS     15.40    0.01   15.64
#rnso1         6.22    0.10    6.41
#rnso2        13.07    0.00   13.15 

Very interesting: baptiste's methods were lightning quick!

Squally answered 2/9, 2014 at 1:16
