How to find Consecutive Numbers Among multiple Arrays?

Asked 24/6, 2016 at 14:36 Answered 27/6, 2016 at 13:51

Solved arrays r algorithm permutation number-theory

I'll start with an example. Suppose I have 3 arrays a, b, c such as

a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)

I need to extract the consecutive triplets among them, i.e.,

c(1,2,3),c(4,5,6)

This is just an example. My real data set is larger, with more than 10 arrays, so I need to be able to find consecutive series of length ten.

Could anyone provide a general algorithm to find the consecutive series of length 'n' among 'n' arrays?

I am doing this in R, so code in R is preferable, but an algorithm in any language is more than welcome.

Duppy answered 24/6, 2016 at 14:36 Comment(2)

Does each element in one triplet have to come from different arrays? Will {2,3,4} be considered a valid triplet? – Tati 24/6, 2016 at 14:38

Yes! Each element must come from a different array, so {2,3,4}, {6,7,8} or {7,8,9} are not valid. – Duppy 24/6, 2016 at 14:42

Reorganize the data first into a list containing each value and the number of the array it came from. Sort the list by value; you'd have something like:

1-2
2-3
3-1 (i.e. "there's a three in array 1")
4-3
5-1
6-2
7-2
8-2
9-3

Then loop over the list, check whether there are n consecutive numbers, and then check whether they all have different array numbers.
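A minimal R sketch of the sort-and-scan idea described above. The function name find_runs is hypothetical, and the sketch assumes that no value occurs in more than one vector:

```r
# Hypothetical helper: find runs of n consecutive values,
# one value taken from each of the n input vectors.
# Assumption: no value appears in more than one vector.
find_runs <- function(vectors) {
  n <- length(vectors)

  # Flatten to (value, source index) pairs and sort by value
  value <- unlist(vectors, use.names = FALSE)
  src   <- rep(seq_along(vectors), lengths(vectors))
  ord   <- order(value)
  value <- value[ord]
  src   <- src[ord]

  runs <- list()
  if (length(value) < n) return(runs)

  # Slide a window of n sorted entries over the list
  for (i in seq_len(length(value) - n + 1)) {
    win <- i:(i + n - 1)
    consecutive <- all(diff(value[win]) == 1)   # values step by exactly 1
    distinct    <- !anyDuplicated(src[win])     # every source used once
    if (consecutive && distinct) runs[[length(runs) + 1]] <- value[win]
  }
  runs
}

res <- find_runs(list(c(3, 5), c(6, 1, 8, 7), c(4, 2, 9)))
```

For the example data, res holds the two triplets c(1, 2, 3) and c(4, 5, 6). Sorting dominates the cost, so this scales to many vectors far better than enumerating combinations.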

Apostle answered 24/6, 2016 at 15:21 Comment(1)

Great idea (upvoted). There could be one difficulty if the same number appears in more than one vector, but it's easy to adapt the solution to account for that :) – Tinsel 24/6, 2016 at 15:24
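One possible adaptation for repeated values, as a rough sketch: for each candidate start value, try to assign every value of the run to a different vector by simple backtracking. The names has_assignment and find_run_starts are hypothetical, and this is only one of several ways to handle duplicates:

```r
# Hypothetical helper: can each value in `vals` be taken from a different
# vector? `used` holds the indices of vectors already assigned.
has_assignment <- function(vals, vectors, used = integer(0)) {
  if (length(vals) == 0) return(TRUE)
  for (j in setdiff(seq_along(vectors), used)) {
    if (vals[1] %in% vectors[[j]] &&
        has_assignment(vals[-1], vectors, c(used, j))) {
      return(TRUE)
    }
  }
  FALSE
}

# Return the start values s for which s, s+1, ..., s+n-1 can be drawn
# from n different vectors
find_run_starts <- function(vectors) {
  n <- length(vectors)
  starts <- sort(unique(unlist(vectors)))
  Filter(function(s) has_assignment(s + seq_len(n) - 1, vectors), starts)
}

starts_example <- find_run_starts(list(c(3, 5), c(6, 1, 8, 7), c(4, 2, 9)))
starts_dupes   <- find_run_starts(list(c(1, 2), c(2, 3), c(3, 1)))
```

The backtracking is exponential in the worst case, so this is meant for a modest number of vectors.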

Here's one approach. It assumes the combined observations form an unbroken sequence, i.e. there are no gaps between consecutive sorted values. Here is the data.

N <- 3
a <- c(3,5)
b <- c(6,1,8,7)
c <- c(4,2,9)

Then I combine them and order by the observations

dd <- lattice::make.groups(a,b,c)
dd <- dd[order(dd$data),]

Now I look for windows of N adjacent rows in this table where all three groups are represented

idx <- apply(embed(as.numeric(dd$which),N), 1, function(x) {
    length(unique(x))==N
})

Then we can see the triplets with

lapply(which(idx), function(i) {
    dd[i:(i+N-1),]
})

# [[1]]
#    data which
# b2    1     b
# c2    2     c
# a1    3     a
# 
# [[2]]
#    data which
# c1    4     c
# a2    5     a
# b1    6     b

Thick answered 24/6, 2016 at 15:16 Comment(1)

That worked perfectly for the given example. But could you help me form the groups when I have 'N' arrays, all stored as elements of a list? – Duppy 25/6, 2016 at 7:20
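A possible way to do this when the vectors are already in a list, sketched with base R's stack() swapped in for lattice::make.groups. The variable names are illustrative, the list is assumed to be named, and an explicit check for consecutive values is added so that gaps in the data do not produce false matches:

```r
# Assumption: the vectors are held in a named list
vecs <- list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9))
N <- length(vecs)

# stack() turns a named list into a data frame with columns `values` and `ind`
dd <- stack(vecs)
dd <- dd[order(dd$values), ]

# embed() lays out each sliding window of N entries as one matrix row
# (most recent entry first)
grp  <- embed(as.integer(dd$ind), N)
vals <- embed(dd$values, N)

# Keep windows where every group appears once and the values step by 1
all_groups  <- apply(grp, 1, function(g) length(unique(g)) == N)
consecutive <- apply(vals, 1, function(v) all(diff(rev(v)) == 1))
idx <- all_groups & consecutive

runs <- lapply(which(idx), function(i) dd[i:(i + N - 1), ])
```

For the example data, runs contains two data frames whose values columns are 1, 2, 3 and 4, 5, 6.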

Here is a brute force method with expand.grid and three vectors as in the example

# get all combinations
df <- expand.grid(a,b,c)

Use combn to calculate the absolute difference for each pair of columns.

# get all pairwise differences (index the columns of df by name)
myDiffs <- combn(names(df), 2, FUN=function(x) abs(df[[x[1]]] - df[[x[2]]]))

# subset data using `rowSums` and `which`
df[which(rowSums(myDiffs == 1) == ncol(myDiffs)-1), ]
#    Var1 Var2 Var3
# 2     5    6    4
# 11    3    1    2

Delivery answered 24/6, 2016 at 15:16 Comment(2)

Any ideas on how to pass a list of 'N' vectors to expand.grid()? – Duppy 25/6, 2016 at 9:43

I just tried and expand.grid will accept a list of vectors. You can collect the vectors in a list using mget and ls. Play around with my answer to this post to build such a list. – Delivery 25/6, 2016 at 13:4
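As a rough sketch of the list-based variant: expand.grid() is given the list directly, and the pairwise-difference test is replaced by a check on the sorted values of each row. The replacement is deliberate, because the condition ncol(myDiffs)-1 equals the required count of unit differences (N-1) only when N is 3. Variable names here are illustrative:

```r
# Assumption: the input vectors are stored in a list
vecs <- list(c(3, 5), c(6, 1, 8, 7), c(4, 2, 9))

# expand.grid() accepts a list of vectors: one row per combination
grid <- expand.grid(vecs)

# A row is a consecutive run when its sorted values increase by exactly 1
is_run <- apply(as.matrix(grid), 1, function(r) all(diff(sort(r)) == 1))
hits <- grid[is_run, ]
```

The number of rows in grid is the product of the vector lengths, so this brute force approach becomes expensive quickly as the number of vectors grows.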

I have hacked together a little recursive function that will find all the consecutive triplets amongst as many vectors as you pass it (need to pass at least three). It is probably a little crude, but seems to work.

The function uses the ellipsis, ..., for passing arguments. Hence it will take however many arguments (i.e. numeric vectors) you provide and put them in the list items. Then the smallest value amongst each passed vector is located, along with its index.

Then the indices of the vectors corresponding to the smallest triplet are created and iterated through using a for() loop, where the output values are appended to the output vector out. The input vectors in items are pruned and passed to the function again in a recursive fashion. Only when all vectors are NA, i.e. there are no more values in the vectors, does the function return the final result.

library(magrittr)

# define function to find the triplets
tripl <- function(...){
  items <- list(...)

  # find the smallest number in each passed vector, along with its index
  # output is a matrix of n-by-2, where n is the number of passed arguments
  triplet.id <- lapply(items, function(x){
    if(is.na(x) %>% prod) id <- c(NA, NA)
    else id <- c(which(x == min(x)), x[which(x == min(x))])
  }) %>% unlist %>% matrix(., ncol=2, byrow=T)


  # find the smallest triplet from the passed vectors
  index <- order(triplet.id[,2])[1:3]
  # create empty vector for output
  out <- vector()

  # go through the smallest triplet's indices
  for(i in index){
    # .. append the corresponding item from the input vector to the out vector
    # .. and remove the value from the input vector
    if(length(items[[i]]) == 1) {
      out <- append(out, items[[i]])
      # .. if the input vector has no value left fill with NA
      items[[i]] <- NA
    }
    else {
      out <- append(out, items[[i]][triplet.id[i,1]])
      items[[i]] <- items[[i]][-triplet.id[i,1]]
    }
  }

  # recurse until all vectors are empty (NA)
  if(!prod(unlist(is.na(items)))) out <- append(list(out), 
                                                do.call("tripl", c(items), quote = F))
  else(out <- list(out))

  # return result
  return(out)
}

The function can be called by passing the input vectors as arguments.

# input vectors
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)

# find all the triplets using our function
y <- tripl(a,b,c)

The result is a list, which contains all the necessary information, albeit unordered.

print(y)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
# 
# [[3]]
# [1]  7  9 NA
#
# [[4]]
# [1]  8 NA NA

Ordering everything can be done using sapply():

# put everything in order
sapply(y, function(x){x[order(x)]}) %>% t
#       [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    9   NA
# [4,]    8   NA   NA
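Since the result also contains incomplete and non-consecutive groups, a small filter can keep only the true triplets. This is a sketch with a hypothetical helper is_consecutive; the list y is copied by hand from the printed output above rather than recomputed:

```r
# Result list as printed above (copied by hand, not recomputed here)
y <- list(c(1, 2, 3), c(4, 5, 6), c(7, 9, NA), c(8, NA, NA))

# TRUE when x has no NA and its sorted values increase by exactly 1
is_consecutive <- function(x) !anyNA(x) && all(diff(sort(x)) == 1)

# Keep only the complete, consecutive groups
complete <- Filter(is_consecutive, y)
```

Here complete holds only c(1, 2, 3) and c(4, 5, 6).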

Note that the function takes only one value per vector when forming each triplet. It will therefore not find the consecutive triplet c(6,7,8) among e.g. c(6,7,11), c(8,9,13) and c(10,12,14). In this instance it returns c(6,8,10) (see below).

a<-c(6,7,11)
b<-c(8,9,13)
c<-c(10,12,14)

y <- tripl(a,b,c)
sapply(y, function(x){x[order(x)]}) %>% t
#     [,1] [,2] [,3]
# [1,]    6    8   10
# [2,]    7    9   12
# [3,]   11   13   14

Microphyte answered 27/6, 2016 at 13:51 Comment(0)
