Randomly Interleave 2 Arrays In Python
Asked Answered
C

16

15

Suppose I have two arrays:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

I want to interleave these two arrays to a variable 'c' (note 'a' and 'b' aren't necessarily of equal length) but I don't want them interleaved in a deterministic way. In short, it isn't enough to just zip these two arrays. I don't want:

c = [1, 5, 2, 6, 3, 7, 4, 8, 9]

Instead, I want something random like:

c = [5, 6, 1, 7, 2, 3, 8, 4, 9]

Also notice that the order of 'a' and 'b' are preserved in the resulting array, 'c'.

The current solution I have requires a for loop and some random number generation. I don't like it and I'm hoping someone can point me to a better solution.

# resulting array
c = []

# this tells us the ratio of elements to place in c. if there are more elements 
# in 'a' this ratio will be larger and as we iterate over elements, we will place
# more elements from 'a' into 'c'.
ratio = float(len(a)) / float(len(a) + len(b))

while a and b:
    which_list = random.random()
    if which_list < ratio:
        c.append(a.pop(0))
    else:
        c.append(b.pop(0))

# tack on any extra elements to the end
if a:
    c += a
elif b:
    c += b
Corticosteroid answered 17/5, 2012 at 23:42 Comment(7)
But you want to randomize only interleaving or the whole array? I mean, do you need to preserve the original arrays order?Leandra
What don't you like about it? You will need to generate random numbers and while you can replace many loops with list comprehensions what's the point?Brinna
Yeah this seems perfectly fine by me. I'm sure you could write something more compact but 'simple is better than complex'.Umbelliferous
What do you mean by a randomly interleaved list? For example, [1, 2, 3, 4, 5, 6, 7, 8, 9] is a valid result of random interleaving. Do you want to randomize the period between insertions?Hoogh
@Leandra I only want to randomize the order in which elements are taken from a and b but the resulting array, c, should preserve the order of a and b.Corticosteroid
@JamesThiele I'm still a bit of a n00b at Python so I'm always looking for ways to make my code more pythonic.Corticosteroid
Aside from fun Python, you really need to thing about what should be next. If you picked 1,2,3 already, should the next choice be 50% 4/50% 5 (even by list) or 16% 4, 84% 5 (weight the 1 unpicked item list a versus the 5 unpicked items in list b).Aparri
E
16

edit: I think this recent one is best:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8, 9]
c = [x.pop(0) for x in random.sample([a]*len(a) + [b]*len(b), len(a)+len(b))]

Or more efficiently:

c = map(next, random.sample([iter(a)]*len(a) + [iter(b)]*len(b), len(a)+len(b)))

Note that the first method above modifies the original lists (as your code did) while the second method does not. On Python 3.x you would need to do list(map(...)) since map returns an iterator.

original answer below:

Here is an option that saves a few lines:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

c = []
tmp = [a]*len(a) + [b]*len(b)
while a and b:
    c.append(random.choice(tmp).pop(0))

c += a + b

Here is another option, but it will only work if you know that all of your elements are not falsy (no 0, '', None, False, or empty sequences):

a = [1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

ratio = float(len(a)) / float(len(a) + len(b))
c = [(not a and b.pop(0)) or (not b and a.pop(0)) or
     (random.random() < ratio and b.pop(0)) or a.pop(0)
     for _ in range(len(a) + len(b))]
Erysipelas answered 17/5, 2012 at 23:57 Comment(10)
that's way too much. You don't need to create so many extra lists. You can choose from 2 iterablesLeek
I'm not creating extra lists, tmp contains only references to a and b.Erysipelas
I can see 3 lists that are created but doesn't need to be there. Your second solution seems better but the readability is not so good (and you alter the original lists).Leek
The OP alters the original lists as well.Erysipelas
Merging your original and edit, plus what Joel got below, how about: [choice((a, b)).pop(0) for i in range(len(a + b)) if (a and b) ] + a + bGonta
@TryPyPy: That yields different results - yours is always 50/50 probability between a and b.Phototube
Nice solution. srgerg's solution is very good too, and in my opinion better. In fact, it is a bit more powerful (it does not destroy the original lists), is more elegant (it does not call len(a) twice), and is more general (it naturally accommodates any number of lists).Bride
[x.pop(0) for x in random.sample([a]*len(a) + [b]*len(b), len(a)+len(b))] is more simply written as the more efficient [x.pop(0) for x in random.shuffle([a]*len(a) + [b]*len(b))].Bride
@EOL - random.shuffle shuffles the list in place and returns None, so that code wouldn't work.Erysipelas
@F.J: Right. I would put the shuffle() out of the list comprehension, then, so that the code (1) runs faster and (2) is more explicit (the sample() in the current solution really does a shuffle). I am happy to see that your solution has converged to srgerg's original one (use of iterators and of next()). :)Bride
U
9

Edited to remove superfluous clutter: Here's a solution that works on any number of input lists, doesn't trash the input lists and doesn't copy them either:

import random

def interleave(*args):
    iters = [i for i, b in ((iter(a), a) for a in args) for _ in xrange(len(b))]
    random.shuffle(iters)
    return map(next, iters)

Stackoverflow user EOL has kindly supplied this enhanced version of my solution:

def interleave(*args):
    iters = sum(([iter(arg)]*len(arg) for arg in args), [])
    random.shuffle(iters)
    return map(next, iters)

Running this with

a = [1,2,3,4]
b = [5,6,7,8,9]
print interleave(a, b)

yields the following as one of many possible results:

[5, 6, 7, 1, 8, 2, 3, 9, 4]

Edit: At EOL's request I updated the timing code. Unfortunately, since the accepted solution modifies its inputs, I need to make a fresh copy on each iteration. I've done this for both F.J's and my own solution to make the results comparable. Here's the timing for F.Js solution:

$ python -m timeit -v -s "from srgerg import accepted" -s "a = list(xrange(40000))" -s "b = list(xrange(60000))" "accepted(list(a), list(b))"
10 loops -> 10.5 secs
raw times: 10.3 10.1 9.94
10 loops, best of 3: 994 msec per loop

Here's the timing for my version of the function

$ python -m timeit -v -s "from srgerg import original" -s "a = list(xrange(40000))" -s "b = list(xrange(60000))" "original(list(a), list(b))"
10 loops -> 0.616 secs
raw times: 0.647 0.614 0.641
10 loops, best of 3: 61.4 msec per loop

and here's the timing for EOL's enhanced version:

$ python -m timeit -v -s "from srgerg import eol_enhanced" -s "a = list(xrange(40000))" -s "b = list(xrange(60000))" "eol_enhanced(list(a), list(b))"
10 loops -> 0.572 secs
raw times: 0.576 0.572 0.588
10 loops, best of 3: 57.2 msec per loop

If I remove the list copying from the loop for EOL's enhanced version, I get this:

$ python -m timeit -v -s "from srgerg import eol_enhanced" -s "a = list(xrange(40000))" -s "b = list(xrange(60000))" "eol_enhanced(a, b)"
10 loops -> 0.573 secs
raw times: 0.572 0.575 0.565
10 loops, best of 3: 56.5 msec per loop

Another edit: F.J has an updated solution and asked me to add the timings:

$ python -m timeit -v -s "from srgerg import fj_updated" -s "a = list(xrange(40000))" -s "b = list(xrange(60000))" "fj_updated(list(a), list(b))"
10 loops -> 0.647 secs
raw times: 0.652 0.653 0.649
10 loops, best of 3: 64.9 msec per loop
Ulibarri answered 17/5, 2012 at 23:42 Comment(8)
I added a more straightforward version of your nice interleave() function.Bride
+1: I believe that this should be the accepted answer: it is faster than my solution, and also more general and more elegant (especially my simplified version of interleave()), and it only takes only about twice the memory. :)Bride
Your timings should really be done differently: you should initialize the timers with the setting of a and b, and only time the function calls. In fact, maybe does the list creation take up a good chunk of the total time. I also believe that the timing differences would be much more striking were you to use longer example lists (say 100000 numbers).Bride
@EOL I agree the timing code is very rough. I really only wanted to satisfy myself that my function had a competitive runtime with other solutions. I'll update it shortly.Ulibarri
@EOL Also, the fact that some solutions modify their inputs means I have to make a fresh copy of the inputs on each iteration...Ulibarri
Very nice timing results. Note that list(xrange(40000)) should really be written range(40000), as is usual (there is in fact no point in using xrange if one builds a list right afterwards anyway).Bride
@Ulibarri - Out of curiosity, could you add timing for my updated code? map(next, random.sample([iter(a)]*len(a) + [iter(b)]*len(b), len(a)+len(b))). This should be much more efficient than the pop(0) method and it doesn't modify inputs (although I don't necessary agree that modifying inputs is wrong, since it is what the code in the question does).Erysipelas
@F.J I've added the timings for your updated code. By the way, I didn't mean to say that modifying inputs is wrong, I only meant that in some applications it would be desirable not to modify the inputs.Ulibarri
G
7

Here is a solution that works with an arbitrary number of iterables:

import random

def interleave(*args):
  iters = map(iter, args)
  while iters:
    it = random.choice(iters)
    try:
      yield next(it)
    except StopIteration:
      iters.remove(it)

print list(interleave(xrange(1, 5), xrange(5, 10), xrange(10, 15)))
Germanous answered 18/5, 2012 at 7:46 Comment(3)
+1 Nice answer: This has the desirable properties of taking any number of arguments, not modifying its inputs, not taking copies of its inputs and not requiring that its inputs support the len() method. The only downside is that it is about 1/3rd slower than my solution - but it could be faster without the try-except.Ulibarri
Stackoverflow user Mark Byers has made some worthwhile comments on the randomness of this approach in answer to another question. I did some testing of my own with interesting results.Ulibarri
Really nice solution. I've submitted a modified version that speeds up the removal of empty iterators when there are a large number of iterators: https://mcmap.net/q/757832/-randomly-interleave-2-arrays-in-pythonShrike
B
7

PS: Please consider reading @srgerg's answer: it is in my opinion the best solution (albeit F.J's comes relatively close). Compared to the solution below, it is more general, even a little more straightforward, and it only takes about twice as much memory.

Here is something which is both simple and efficient:

[(a if random.randrange(0, len(a)+len(b)) < len(a) else b).pop(0) for _ in range(len(a)+len(b))]

This solution avoids testing explicitly for the specific case of whether a or b are empty.

This solution uses a few key points:

  • Using randrange() allows one to simply deal with integers (no need to calculate a ratio).
  • It automatically adapts to lists that are empty (that's the < len(a) test), without any need for additional tests like a or b, [… a and b]+a+b

This solution nicely handles lists of different sizes: the elements of the shorter list are spread quite evenly in the result. This approach also features "invariance": the probability distribution of the possible result lists only depends on the current contents of the a and b lists.

It could be made even more efficient by using the faster .pop() instead of .pop(0) (as lists are made to be fast to pop() but not to pop(0)):

a.reverse(); b.reverse()
[(a if random.randrange(0, len(a)+len(b)) < len(a) else b).pop() for _ in range(len(a)+len(b))]
Bride answered 18/5, 2012 at 8:28 Comment(4)
Perhaps I'm misunderstanding something, but you say the solution "avoid tests on the lengths of a or b" but calls to len(a) and len(b) are clearly in the solution?Ulibarri
@srgerg: Good catch: I meant "avoids additional tests" (post edited). In fact, many solutions add non-zero list length tests like while a and b or not a or, etc. There is really no need for a specific handling of exceptional cases (namely zero-length lists).Bride
Nice. I like the use of randrange.Hoogh
+1 for a solution that distributes the shorter list evenly among the longer. Stackoverflow user Mark Byers has written some comments on this as have I at this other question.Ulibarri
H
6

Edited at TryPyPy's suggestion:

from random import choice

l = [a, b]
c = [choice(l).pop(0) for i in range(len(a) + len(b)) if (a and b)] + a + b
Hoogh answered 18/5, 2012 at 0:41 Comment(5)
The choice + pop is really nice and easy to understand in reading. How about: [choice(l).pop(0) for i in range(len(a+b)) if (a and b) ] + a + bGonta
Interesting and quite simple solution. However, a "feature" of this solution is that if one of the lists is much longer than the other one, then the short list will likely be very quickly exhausted. It might be desirable to have the elements of the shorter be spread more evenly in the resulting list.Bride
@EOL: You're right. The above approach would only work well for lists of about the same size.Hoogh
To remedy this issue, one could adjust the ratio of references to a and b in l. If a is twice as long as b for example, then l = [a, b, b]. I wonder what would be an efficient way to implement this...Hoogh
@JoelCornett: my answer post gives an efficient way to implement this. :)Bride
G
2

How about concatenating, then shuffling an array of flags, then using it to pick an array to take each item from?

import random

a = [1, 2, 3, 4]
b = [5, 6, 7, 8, 9]

c = list('a' * len(a) + 'b' * len(b)) # Flags for taking items from each array
random.shuffle(c) # Randomize from where we take items

aa, bb = a[:], b[:] # Copy the arrays for popping 
d = [aa.pop(0) if source == 'a' else bb.pop(0) for source in c]
# Take items in order, randomly from each array

A more efficient way by FogleBird:

c = [a[:]] * len(a) + [b[:]] * len(b)
random.shuffle(c) # Randomize from where we take items

d = [x.pop(0) for x in c] # Take items in order, randomly from each array
Gonta answered 18/5, 2012 at 0:3 Comment(3)
Why use flags when you can just use references to the lists?Phototube
I could say it's easier for me to grasp shuffling flags and taking items than most other answers, but truth is I started from shuffling by misreading the question. Can you propose a way to keep it as obvious as shuffling flags but using direct references to the lists?Gonta
Beautiful and I'd never think of that :)Gonta
D
1

This solution gives you a generator, and works by randomly swapping the parts of lists (a) and (b) that haven't yet been emitted.

import random

a = [1,2,3,4]
b = [5,6,7,8,9]

def interleave(a,b):
   while a or b:
      (a,b)=(a,b) if len(a) and (random.random()<0.5 or not len(b)) else (b,a)
      yield a.pop(0)

print list(interleave(a,b))
Deaconry answered 18/5, 2012 at 0:11 Comment(0)
M
1

Here is something that uses undocumented Python (specifically, the __length_hint__ method of list iterator objects, which tells you how many items are left in the iterator) to cram it into a list comprehension. More for fun, I guess, than actual practicality.

itera, iterb = iter(a), iter(b)
morea, moreb = itera.__length_hint__, iterb.__length_hint__
c = [next(itera) if not moreb() or morea() and random.random() < ratio
     else next(iterb) for c in xrange(len(a) + len(b))]
Muttonhead answered 18/5, 2012 at 0:18 Comment(0)
U
1

I would solve this problem like this:

import random

LResult = []

LLists = [[1, 2, 3, 4], [5, 6, 7, 8, 9]]

while LLists[0] or LLists[1]:
    LResult.append(LLists[random.choice([int(len(LLists[0])==0), int(len(LLists[1])!=0)])].pop(0))

LLists is a multidimentional list that store two lists (a and b from your example). The statement would be equivalent to: LLists = [a[:], b[:]] however I coded in the lists explicitly for the sake of simplicity and clarity.

LResult is c from your example and ultimately stores the resulting array.

The while loop will loop until both LLists sub 0 and LLists sub 1 are completely empty. Within the loop, LResult is appended a value from either LLists sub 0 or LLists sub 1. The decision as to which sub list's value is chosen is determined by the random.choice() statement which takes two (in this case) arguments, then returns one of them randomly.

The options provided to random.choice() are determined by the length of each sub list within LLists. If LLists sub 0's length is greater than zero, then choice number 1 is returned as a zero by the statement int(len(LLists[0])==0). For the second option to random.choice(), if LLists sub 1's length is greater than zero, then the statement, int(len(LLists[1])!=0), will return a 1. In both cases, if one of the sub list's length is zero, then that appropriate statement will return the opposite number. That is, if LLists[0]'s length is zero, and LLists[1]'s length is greater than zero, then the resulting statement will be random.choice(1, 1). In this case random.choice() will return a choice between 1 and 1 (which is 1 of course).

Once the decision is made as to which sub list to pull the value from, the first item is that sub list is poped, .pop(0), into LResult.

Upstage answered 26/5, 2012 at 17:45 Comment(0)
U
0

You could do something like this:

(L, l) = (a, b) if len(a) > len(b) else( b, a)
positions = random.sample(range(len(L)), len(l))
for i in range(len(positions)):
    L.insert(positions[i], l[i])

but in my humble opinion, what you have is perfectly fine.. It works, it's simple

Umbelliferous answered 18/5, 2012 at 0:6 Comment(0)
A
0

How about this idea:

import random as rn

a = [1, 2, 3, 4]
b = [5, 6, 7, 8, 9]
n = 100 #Here i am picking an arbitrary number, it should probably be a function of 
        # lengths of a and b


a_ind = sorted(rn.sample(range(n),len(a))) #sorting the indexes insures that order of 
b_ind = sorted(rn.sample(range(n),len(b))) # a and b is preserved

big_list = zip(a,a_ind) + zip(b,b_ind)

big_list.sort(key  = lambda k: k[1])

result = list(zip(*big_list)[0])

Result:

>>> result
[1, 5, 2, 6, 3, 7, 8, 9, 4]
Appendage answered 18/5, 2012 at 0:35 Comment(0)
D
0

Probably very inefficient, but another approach that works:

import random

def interleave(*args):
    indices=[(i,j) for i in range(len(args)) for j in range(len(args[i]))]
    random.shuffle(indices)
    indices.sort(key=lambda x:x[1])
    return [args[i][j] for i,j in indices]
Dispatcher answered 23/10, 2013 at 13:43 Comment(0)
S
0

I've adapted @NPE's solution so it removes empty iterators in constant rather than linear time*. It takes any number of input lists, and returns an iterator that interleaves them randomly, preserving the order given by the input lists.

def interleave(*args):
  iters = [iter(x) for x in args]
  while iters:
    i = random.randrange(len(iters))
    try:
      yield next(iters[i])
    except StopIteration:
      # swap empty iterator to end and remove
      iters[i],iters[-1] = iters[-1],iters[i]
      iters.pop()

print list(interleave(xrange(1, 5), xrange(5, 10), xrange(10, 15)))

*Total run time is O(N) rather than O(N+M^2) where N is the total number of items and M is the number of lists.

Shrike answered 30/11, 2016 at 18:52 Comment(0)
B
0

If the ratio between list 1 and list 2 remain constant, you could create a function like this:

def selectFromTwoList(ratioFromList1):
    final_list = []
    for i in range(len(list1)):
        rand = random.randint(1, 100)
        if rand <= ratioFromList1*100:
            final_list.append(list1.pop(0))
        else:
            final_list.append(list2.pop(0))
        return final_list
Bridge answered 22/8, 2018 at 19:43 Comment(0)
H
0

I wanted a crude simulation of riffle shuffling cards and thus wrote an implementation with that lens. Checking if a faster implementation already existed I found this thread and found mine to be much faster than any solutions here (~25 seconds vs ~2.5 minutes for 10 million shuffles) so I thought I would share.

def mix_riffle(pile_1: List[int], pile_2: List[int]):
    return_val = []
    
    # Premake max choices we will need.
    pile_choices = random.choices([pile_1, pile_2], k = len(pile_1) + len(pile_2))
    
    finished = False
    while not finished:
        # Force certain pile if other is empty.
        if len(pile_2) == 0:
            return_val = pile_1 + return_val
            pile_1.clear()
            finished = True
        elif len(pile_1) == 0:
            return_val = pile_2 + return_val
            pile_2.clear()
            finished = True
        else:
            # Pick a pile to pull from.
            pull_pile = pile_choices.pop()

            # "Bottom" of choice pile becomes the "Top"
            # of the new pile
            return_val.insert(0, pull_pile.pop())
    return return_val

It consumes the input lists and still uses a loop but if pure speed is your goal consider it.

Horoscope answered 7/4 at 16:45 Comment(0)
P
-1

The word 'interleaved' in the description may be confusing. If you simply add the input lists then shuffle the resultant you get to the same result. Interleaving is only neccessary if the result of interleaving is to be preserved.

Some code:

>>> import random
>>> 
>>> a, b = [1,2,3,4], [5,6,7,8]
>>> c = sum([a,b], [])
>>> random.shuffle(c)
>>> c
[6, 5, 8, 2, 7, 4, 1, 3] 
Priebe answered 21/5, 2012 at 15:17 Comment(1)
Order is important (I had exactly the same answer with this exact response).Gonta

© 2022 - 2024 — McMap. All rights reserved.