I hope I can contribute something new to this problem. I noticed that all of the answers neglect the fact that there are two points where you can perform preprocessing, without slowing down your overall laundry performance.
Also, we don't need to assume a large number of socks, even for large families. Socks are taken out of the drawer and are worn, and then they are tossed in a place (maybe a bin) where they stay before being laundered. While I wouldn't call said bin a LIFO-Stack, I'd say it is safe to assume that
- people toss both of their socks roughly in the same area of the
bin,
- the bin is not randomized at any point, and therefore
- any subset taken from the top of this bin generally contains both
socks of a pair.
Since all washing machines I know about are limited in size (regardless of how many socks you have to wash), and the actual randomizing occurs in the washing machine, no matter how many socks we have, we always have small subsets which contain almost no singletons.
Our two preprocessing stages are "putting the socks on the clothesline" and "Taking the socks from the clothesline", which we have to do, in order to get socks which are not only clean but also dry. As with washing machines, clotheslines are finite, and I assume that we have the whole part of the line where we put our socks in sight.
Here's the algorithm for put_socks_on_line():
while (socks left in basket) {
take_sock();
if (cluster of similar socks is present) {
Add sock to cluster (if possible, next to the matching pair)
} else {
Hang it somewhere on the line, this is now a new cluster of similar-looking socks.
Leave enough space around this sock to add other socks later on
}
}
Don't waste your time moving socks around or looking for the best match, this all should be done in O(n), which we would also need for just putting them on the line unsorted.
The socks aren't paired yet, we only have several similarity clusters on the line. It's helpful that we have a limited set of socks here, as this helps us to create "good" clusters (for example, if there are only black socks in the set of socks, clustering by colours would not be the way to go)
Here's the algorithm for take_socks_from_line():
while(socks left on line) {
take_next_sock();
if (matching pair visible on line or in basket) {
Take it as well, pair 'em and put 'em away
} else {
put the sock in the basket
}
I should point out that in order to improve the speed of the remaining steps, it is wise not to randomly pick the next sock, but to sequentially take sock after sock from each cluster.
Both preprocessing steps don't take more time than just putting the socks on the line or in the basket, which we have to do no matter what, so this should greatly enhance the laundry performance.
After this, it's easy to do the hash partitioning algorithm. Usually, about 75% of the socks are already paired, leaving me with a very small subset of socks, and this subset is already (somewhat) clustered (I don't introduce much entropy into my basket after the preprocessing steps). Another thing is that the remaining clusters tend to be small enough to be handled at once, so it is possible to take a whole cluster out of the basket.
Here's the algorithm for sort_remaining_clusters():
while(clusters present in basket) {
Take out the cluster and spread it
Process it immediately
Leave remaining socks where they are
}
After that, there are only a few socks left. This is where I introduce previously unpaired socks into the system and process the remaining socks without any special algorithm - the remaining socks are very few and can be processed visually very fast.
For all remaining socks, I assume that their counterparts are still unwashed and put them away for the next iteration. If you register a growth of unpaired socks over time (a "sock leak"), you should check your bin - it might get randomized (do you have cats which sleep in there?)
I know that these algorithms take a lot of assumptions: a bin which acts as some sort of LIFO stack, a limited, normal washing machine, and a limited, normal clothesline - but this still works with very large numbers of socks.
About parallelism:
As long as you toss both socks into the same bin, you can easily parallelize all of those steps.
waitpid
so that, as the parent, you're not even sorting any socks yourself? – Narrowsparallel.task(child).waitpid.all
the performance gain the accepted answer promises is outweighed by the exponential decline of the number of matching specimen in the pile over time. – Kadnerhttps://devblogs.microsoft.com/
), – Unstressed