How to lightly shuffle a list in python

G

8

5

I have this issue where I would like to shuffle a list, but only do so slightly. Say, I want only a small number of elements to be moved. Is there a simple way to get this done?

Right now the best I can think of is building my own method be hand, but is there some way to use the random library to do this for me?

Gettysburg answered 17/6, 2020 at 18:51 Comment(3)

I doubt it, this seems like a pretty unusual thing to need to do. – Gadoid 17/6, 2020 at 18:54

I don't think the random module has a concept of slightly random. – Prudence 17/6, 2020 at 18:54

Dear OP: Why do you want to shuffle a list only slightly? – Centenarian 27/5, 2022 at 18:48

M

3

to show what some of these solutions are doing I find it helps to run a monte-carlo algorithm many times and look at the distribution

first a tidied up version of @meta4's solution as it was the most fleshed out:

from random import randrange

def partial_shuffle(l, factor=5):
    n = len(l)
    for _ in range(factor):
        a, b = randrange(n), randrange(n)
        l[b], l[a] = l[a], l[b]

which we can run many times by doing:

import numpy as np

n = 8
orig = list(range(n))
occur = np.zeros((n, n), int)

for _ in range(100000):
    x = orig[:]
    partial_shuffle(x,1)
    occur[orig,x] += 1

if we print out the occurrences table as percentages we get:

[[33.5  9.6  9.5  9.4  9.4  9.6  9.5  9.5]
 [ 9.6 33.2  9.7  9.5  9.6  9.6  9.4  9.4]
 [ 9.5  9.6 33.2  9.5  9.6  9.5  9.6  9.5]
 [ 9.5  9.3  9.6 33.4  9.5  9.5  9.5  9.6]
 [ 9.4  9.6  9.4  9.6 33.3  9.5  9.7  9.5]
 [ 9.6  9.5  9.6  9.6  9.4 33.3  9.5  9.6]
 [ 9.4  9.7  9.5  9.5  9.5  9.6 33.2  9.7]
 [ 9.5  9.5  9.6  9.5  9.7  9.5  9.6 33.2]]

each row represents the probability of the item moving to the column. in this case (when n=8) the algorithm will tend to leave elements where they were ~33% of the time, and then pick the remainder uniformly

I can then run (a tidied up) version of pjs's code with:

from random import gauss

orderliness = 2

occur = np.zeros((n, n), int)

for _ in range(100000):
    x = sorted(orig, key=lambda i: gauss(i * orderliness, 1))
    occur[orig,x] += 1

which gives very different output:

[[91.9  7.9  0.1  0.   0.   0.   0.   0. ]
 [ 7.9 84.1  7.8  0.1  0.   0.   0.   0. ]
 [ 0.1  7.8 84.1  7.9  0.1  0.   0.   0. ]
 [ 0.   0.1  7.9 84.1  7.7  0.1  0.   0. ]
 [ 0.   0.   0.1  7.7 84.2  7.8  0.1  0. ]
 [ 0.   0.   0.   0.1  7.9 84.2  7.7  0.1]
 [ 0.   0.   0.   0.   0.1  7.7 84.2  7.9]
 [ 0.   0.   0.   0.   0.   0.1  7.9 91.9]]

i.e. items tend to remain close to where they started

this sort of table is great at detecting bias in the distribution, which there doesn't seem to be evidence of above. but, for example, with Artyom's solution (shuffle(x, lambda: random() / 5)) gives the following:

[[  0.   37.4   0.    0.    0.   16.7  23.8  22.1]
 [  0.    0.  100.    0.    0.    0.    0.    0. ]
 [  0.    0.    0.  100.    0.    0.    0.    0. ]
 [  0.    0.    0.    0.  100.    0.    0.    0. ]
 [  1.7   0.    0.    0.    0.   83.3  11.9   3. ]
 [  9.    7.4   0.    0.    0.    0.   64.2  19.4]
 [ 26.7  17.9   0.    0.    0.    0.    0.   55.5]
 [ 62.6  37.4   0.    0.    0.    0.    0.    0. ]]

which probably isn't what the OP wanted. the high probability off diagonal represents rotating the array by one element

Marutani answered 18/6, 2020 at 22:51 Comment(3)

Interesting, but the behavior in my solution depends heavily on orderliness coefficient. In fact, if you make it negative, it starts having a propensity to reverse the data, while if it's zero the results will be indistinguishable from a random shuffle. In other words, you can tune the algorithm to whatever degree of disorder you want relative to the starting positions. – Lothario 18/6, 2020 at 23:3

That property of having randomness while maintaining an ordering propensity makes it fun to use for benchmarking sorting algorithms where you want to study the impact of order, reverse order, or randomness on the effectiveness of the sort. – Lothario 18/6, 2020 at 23:9

@Lothario have changed the code to make your orderliness parameter more explicit. note that other distributions might be interesting as well. e.g. Cauchy would sometimes put elements a long way away, which might be useful/interesting – Marutani 19/6, 2020 at 9:57

L

4

One interpretation is to strongly or weakly retain the initial ordering. The weakest retention would be a completely random shuffle, the strongest would be to not deviate from the initial ordering.

This can be accomplished by creating a tuple consisting of the original index scaled by a constant, plus some randomness, followed by the value. Sort the tuples, then iterate through to recover the original values in their new order. If the scale factor for the index is near zero, the new order will be random. If it's near 1, things will tend to strongly but not perfectly retain their original ordering. If it's larger, the result becomes unlikely to be shuffled.

import random

orderliness = 0.75

def tuplify(x, y):
    return (orderliness * y + random.gauss(0,1), x)

values = [i+1 for i in range(20)]
print(values)
pairs = list(map(tuplify, values, range(len(values))))
pairs.sort()
partially_ordered_values = [p[1] for p in pairs]
print(partially_ordered_values)

This produces, for example:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]  # initial ordering
[2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 11, 14, 17, 16, 15, 18, 19, 20]  # weakly shuffled

Tendency to shuffle would be determined by the relative magnitudes of orderliness and the standard deviation in random.gauss().

Lothario answered 17/6, 2020 at 21:6 Comment(0)

M

3

to show what some of these solutions are doing I find it helps to run a monte-carlo algorithm many times and look at the distribution

first a tidied up version of @meta4's solution as it was the most fleshed out:

from random import randrange

def partial_shuffle(l, factor=5):
    n = len(l)
    for _ in range(factor):
        a, b = randrange(n), randrange(n)
        l[b], l[a] = l[a], l[b]

which we can run many times by doing:

import numpy as np

n = 8
orig = list(range(n))
occur = np.zeros((n, n), int)

for _ in range(100000):
    x = orig[:]
    partial_shuffle(x,1)
    occur[orig,x] += 1

if we print out the occurrences table as percentages we get:

[[33.5  9.6  9.5  9.4  9.4  9.6  9.5  9.5]
 [ 9.6 33.2  9.7  9.5  9.6  9.6  9.4  9.4]
 [ 9.5  9.6 33.2  9.5  9.6  9.5  9.6  9.5]
 [ 9.5  9.3  9.6 33.4  9.5  9.5  9.5  9.6]
 [ 9.4  9.6  9.4  9.6 33.3  9.5  9.7  9.5]
 [ 9.6  9.5  9.6  9.6  9.4 33.3  9.5  9.6]
 [ 9.4  9.7  9.5  9.5  9.5  9.6 33.2  9.7]
 [ 9.5  9.5  9.6  9.5  9.7  9.5  9.6 33.2]]

each row represents the probability of the item moving to the column. in this case (when n=8) the algorithm will tend to leave elements where they were ~33% of the time, and then pick the remainder uniformly

I can then run (a tidied up) version of pjs's code with:

from random import gauss

orderliness = 2

occur = np.zeros((n, n), int)

for _ in range(100000):
    x = sorted(orig, key=lambda i: gauss(i * orderliness, 1))
    occur[orig,x] += 1

which gives very different output:

[[91.9  7.9  0.1  0.   0.   0.   0.   0. ]
 [ 7.9 84.1  7.8  0.1  0.   0.   0.   0. ]
 [ 0.1  7.8 84.1  7.9  0.1  0.   0.   0. ]
 [ 0.   0.1  7.9 84.1  7.7  0.1  0.   0. ]
 [ 0.   0.   0.1  7.7 84.2  7.8  0.1  0. ]
 [ 0.   0.   0.   0.1  7.9 84.2  7.7  0.1]
 [ 0.   0.   0.   0.   0.1  7.7 84.2  7.9]
 [ 0.   0.   0.   0.   0.   0.1  7.9 91.9]]

i.e. items tend to remain close to where they started

this sort of table is great at detecting bias in the distribution, which there doesn't seem to be evidence of above. but, for example, with Artyom's solution (shuffle(x, lambda: random() / 5)) gives the following:

[[  0.   37.4   0.    0.    0.   16.7  23.8  22.1]
 [  0.    0.  100.    0.    0.    0.    0.    0. ]
 [  0.    0.    0.  100.    0.    0.    0.    0. ]
 [  0.    0.    0.    0.  100.    0.    0.    0. ]
 [  1.7   0.    0.    0.    0.   83.3  11.9   3. ]
 [  9.    7.4   0.    0.    0.    0.   64.2  19.4]
 [ 26.7  17.9   0.    0.    0.    0.    0.   55.5]
 [ 62.6  37.4   0.    0.    0.    0.    0.    0. ]]

which probably isn't what the OP wanted. the high probability off diagonal represents rotating the array by one element

Marutani answered 18/6, 2020 at 22:51 Comment(3)

Interesting, but the behavior in my solution depends heavily on orderliness coefficient. In fact, if you make it negative, it starts having a propensity to reverse the data, while if it's zero the results will be indistinguishable from a random shuffle. In other words, you can tune the algorithm to whatever degree of disorder you want relative to the starting positions. – Lothario 18/6, 2020 at 23:3

That property of having randomness while maintaining an ordering propensity makes it fun to use for benchmarking sorting algorithms where you want to study the impact of order, reverse order, or randomness on the effectiveness of the sort. – Lothario 18/6, 2020 at 23:9

@Lothario have changed the code to make your orderliness parameter more explicit. note that other distributions might be interesting as well. e.g. Cauchy would sometimes put elements a long way away, which might be useful/interesting – Marutani 19/6, 2020 at 9:57

C

1

from random import randint

def partial_shuffle(l, factor=5):
    for _ in range(factor):
        a, b = randint(0, len(l)), randint(0, len(l)) # pick two random indexes
        l[b], l[a] = l[a], l[b] # swap the values at those indexes
    return l

This is the partial Fisher-Yates Shuffle @rossum recomended.

''.join(partial_shuffle(list('abcdefghijklmnopqrstuvwxyz'), 2))

This example yields "abcdefnhijklmgopqrsyuvwxtz", from one run, but will yield something else for a different run.

Caddy answered 17/6, 2020 at 19:43 Comment(0)

M

0

Use the shuffle method of Python's random module. It takes a list and a random in arguments. Where the random is a function which should return float number from 0.0 to 1.0. It helps shuffle to shuffle the given list in a custom way. You can overwrite that function.

import random

def rand():
    return random.random() / 5

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
random.shuffle(arr, random=rand)
# OUTPUT: [9, 3, 4, 5, 6, 7, 8, 1, 2]

Mandalay answered 17/6, 2020 at 19:37 Comment(5)

Could you explain how the second parameter is used? – Gettysburg 17/6, 2020 at 19:55

shuffle function iterates on list and calls random function, if function will return number less than 0.5 then current number from list wouldn't change position – Mandalay 17/6, 2020 at 20:1

for that reason I wrote function which usually returns numbers less than 0.5 – Mandalay 17/6, 2020 at 20:8

Actually the random argument is more like a seed for the shuffling. random.shuffle(arr, lambda: 0.2) yields [5, 3, 4, 1, 6, 7, 8, 9, 2] always. – Kauffman 17/6, 2020 at 20:43

the source of random.shuffle suggests it's not designed to be used like this. biasing it to smaller values will cause it to approach a list that has been rotated by one element – Marutani 18/6, 2020 at 22:11

K

0

One could also interpret slightly shuffled in the sense that there is a probability for shuffling elements at every step of the Fisher-Yates algorithm @rossum and @meta4 mentioned (instead of having a fixed number of elements shuffled).

def conditional_fy(l, p):
    """Shuffle elements of a list with a given probability

    Args:
        l: list
        p: shuffle probability
            (0: elements are never shuffled,
             1: elements are always shuffled)

    """
    assert 0 <= p <= 1

    for i in range(len(l) - 1, 0, -1):
        shuffle = random.random()
        if shuffle < p:
            j = random.randint(0, i - 1)
            l[i], l[j] = l[j], l[i]

Kauffman answered 17/6, 2020 at 20:19 Comment(0)

S

0

Just to provide an intuition on a possible use case of "slightly shuffling": Imagine decrypting a cipher text (permutation-only) with MCMC. One can consider an initial guess of mapping informed by prior knowledge. One of the many ways to construct such an informed prior is, e.g. to ensure the ranks of frequency statistics of each letter in the de-ciphered text match the ranks of some frequency statistics of each letter in a large corpus. In this case, one might only want to "slightly shuffle" this initial guess across an ensemble of MCMC runs.

A straight forward implementation, since there is already stochasticity in MCMC, is to only allow for shuffling within certain "window" for each letter. Of course, if done in a sequential manner, there might still be a larger "move" of letters, but in a larger vocabulary, this should provide a slight enough shuffling.

Shawn answered 10/5 at 16:38 Comment(0)

A

0

With the same interpretation as pjs' answer (initial ordering more or less preserved), you can move a window of a given width and shuffle the underlying elements. To avoid any bias, move this window randomly.

from random import shuffle

def shuffle_window(array, width):
    indexes = list(range(len(array) - width + 1))
    shuffle(indexes)
    for i in indexes:
        window = array[i : i + width]
        shuffle(window)
        array[i : i + width] = window

Examples:

for width in range(6):
    array = list(range(25))
    shuffle_window(array, width)
    print(array)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
[2, 0, 4, 1, 3, 7, 5, 6, 8, 11, 9, 12, 10, 15, 13, 14, 17, 16, 19, 18, 20, 23, 21, 24, 22]
[2, 4, 0, 3, 7, 1, 8, 6, 12, 5, 11, 10, 14, 9, 16, 15, 17, 13, 19, 18, 20, 24, 21, 22, 23]
[0, 6, 1, 2, 5, 3, 4, 12, 8, 7, 11, 9, 14, 13, 10, 16, 19, 17, 24, 23, 18, 20, 15, 21, 22]
[3, 2, 14, 6, 1, 9, 8, 12, 13, 0, 4, 15, 5, 17, 11, 16, 7, 23, 21, 20, 10, 18, 19, 22, 24]

Ashcan answered 6/10 at 8:24 Comment(0)

C

-1

Use a Fisher-Yates shuffle, but do not run it for the entire list. Just run one step for each entry you want moved: 5 steps to move 5 entries, 10 steps to move 10 entries.

Collaboration answered 17/6, 2020 at 19:9 Comment(5)

That doesn't seem to me like something that would work, care to elaborate? – Gettysburg 17/6, 2020 at 19:12

The normal F-Y shuffle loops over the entire list from one end to the other, shuffling all the entries. I am suggesting stopping the loop after a few iterations so only a few list entries are moved. Rather than looping over 0 .. n-1, loop over 0 .. 5 to move just 5 entries. – Collaboration 17/6, 2020 at 19:25

Yeah, that idea isn't fleshed out at all, and as far as I can tell won't work. The whole idea in F-Y depends essentially on going through the entire list. – Gettysburg 17/6, 2020 at 19:34

Not all items would be equally likely to be displaced. – Lothario 17/6, 2020 at 20:48

@Lothario Then pick the positions to shuffle randomly rather than serially. – Collaboration 17/6, 2020 at 21:31

Recommended topics

Hot tags