How to ensure two different vectors are shuffled in the same order in C++? [duplicate]
Asked Answered
H

8

21

I have two vectors:

vector1 = [1 2 3 4 5 6 7 8 9]

vector2 = [1 2 3 4 5 6 7 8 9]

I want to ensure, that when I shuffle both using random_shuffle they should be shuffled in the same corresponding order. For example:

Output after shuffling should be like:

vector1 = [1 9 3 4 2 7 8 5 6]

vector2 = [1 9 3 4 2 7 8 5 6]

But I am getting output like:

vector1 = [5 1 7 4 2 3 9 8 6]

vector2 = [3 4 1 9 8 2 5 7 6]

Heres my code:

int main () 
{
  std::srand ( unsigned ( std::time(0) ) );
  std::vector<int> vector1, vector2;

  // set some values:
  for (int i=1; i<10; ++i)
  {
    vector1.push_back(i);
    vector2.push_back(i);
  }

  // using built-in random generator:
  std::random_shuffle ( vector1.begin(), vector1.end() );
  std::random_shuffle ( vector2.begin(), vector2.end() );

  // print out content:
  std::cout << "vector1 contains:";
  for ( std::vector<int>::iterator it1 = vector1.begin(); it1 != vector1.end(); ++it1 )
    std::cout << ' ' << *it1;

  std::cout << '\n';
  std::cout << '\n';

  std::cout << "vector2 contains:";
  for ( std::vector<int>::iterator it2 = vector2.begin(); it2 != vector2.end(); ++it2 )
    std::cout << ' ' << *it2;

  std::cout << '\n';
  std::cout << '\n';

  return 0;
}

EDIT This is an example case that I tried to implement. In practise, I have one vector of images and one vector of corresponding labels. I need them to be shuffled in the same manner. Could anybody please help...... thanks a lot!!

Hayne answered 6/6, 2013 at 16:53 Comment(4)
Why not just vector2 = vector1;?Silversmith
Pretty sure he wants a seeded shuffle - meaning he can pass in his own shuffler to get it if he wants.Macdonell
THANKS YOU ALL!!! GOT TO LEARN QUITE A FEW THINGS!!! :)Hayne
BTW std::random_shuffle was deprecated in C++14 and removed in C++17. The best answer below now is this one: https://mcmap.net/q/593981/-how-to-ensure-two-different-vectors-are-shuffled-in-the-same-order-in-c-duplicateGrayish
S
31

Instead of shuffling the vectors themselves, shuffle a vector of indexes into the other vectors. Since you'll be using the same indexes for both, they're guaranteed to be in the same order.

std::vector<int> indexes;
indexes.reserve(vector1.size());
for (int i = 0; i < vector1.size(); ++i)
    indexes.push_back(i);
std::random_shuffle(indexes.begin(), indexes.end());

std::cout << "vector1 contains:";
for ( std::vector<int>::iterator it1 = indexes.begin(); it1 != indexes.end(); ++it1 )
    std::cout << ' ' << vector1[*it1];
Scalariform answered 6/6, 2013 at 17:7 Comment(8)
+1. This is more robust and testable than trying to control the shuffles by resetting the seed.Mitinger
And in C++ 11, you can use std::iota(indexes.begin(), indexes.end(), 0); to create the initial index vector.Ethelda
It's a good idea but needs a good implementation of how to shuffle, at best inplace, both vectors.Jamilajamill
@Mark even though i'm a beginner...but i wana learn smart programming! too good!!! crisp & clear & concise! thanks!!!Hayne
@JanHerrmann, once you have the index (reorder) vector you can use an answer to one of these questions: #838884 https://mcmap.net/q/530012/-in-place-array-reorderingScalariform
@JanHerrmann or you could code the algorithm yourself without using std::random_shuffle and shuffle both vectors simultaneously using the same random numbers in the Fisher-Yates shuffle algorithm but that would be a different answer.Scalariform
As I read answers about in place array reordering I see that they are either O(N^2) or are not inplace.Jamilajamill
This common sense solution but requires that the indexes buffer of size N is created. Other solutions such as using boost zip_iterator or custom swap function are most elegant solution when you have the following requirements: 1) Memory use is concern 2) You are using a true random_device which does not reproduce sequences of numbersColima
U
17

Make sure you use the same seed for both calls to random_shuffle():

auto seed = unsigned ( std::time(0) );

// ...

std::srand ( seed );
std::random_shuffle ( vector1.begin(), vector1.end() );

std::srand ( seed );
std::random_shuffle ( vector2.begin(), vector2.end() );

Notice, however, that the Standard does not specify that random_shuffle() should use the rand() function to generate a random permutation - this is implementation-defined. Therefore, srand() will not affect the result of random_shuffle() on implementations that do not use rand().

Paragraph 25.3.12/4 of the C++11 Standard on random_shuffle() specifies:

Remarks: To the extent that the implementation of these functions makes use of random numbers, the implementation shall use the following sources of randomness:

The underlying source of random numbers for the first form of the function is implementation-defined. An implementation may use the rand function from the standard C library. [...]

Therefore, if you want to make sure you are writing portable code, use the version of random_shuffle() that accepts a random number generator as a third argument, so that you have control over the seeding.

Underpinnings answered 6/6, 2013 at 16:56 Comment(0)
A
11

As others have shown, re-seeding with the same seed should allow you to replicate the same shuffle multiple times. However, if you can use C++11 I'd recommend implementing this without using srand() and random_shuffle(); instead you should use the <random> library with std::shuffle.

First, if possible rand should be avoided. Aside from the fact that it may not be a very good pRNG, it also has problems with thread safety due to shared state. The <random> library fixes both these problems by giving the programmer explicit control over pRNG state and by providing several options with guaranteed performance, size, and quality characteristics.

Secondly, random_shuffle isn't actually specified to use rand so it's theoretically legal for reseeding using srand not to have the effect you want. To get guaranteed results with random_shuffle you have to write your own generator. Moving to shuffle fixes that, as you can directly use standard engines.

#include <algorithm> // shuffle, copy
#include <iostream>  // cout
#include <iterator>  // begin, end, ostream_iterator
#include <numeric>   // iota
#include <random>    // default_random_engine, random_device
#include <vector>    // vector

int main() {
  std::vector<int> v1(10);
  std::iota(begin(v1), end(v1), 1);
  auto v2 = v1;

  std::random_device r;
  std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};

  // create two random engines with the same state
  std::mt19937 eng1(seed);
  auto eng2 = eng1;

  std::shuffle(begin(v1), end(v1), eng1);
  std::shuffle(begin(v2), end(v2), eng2);

  std::copy(begin(v1), end(v1), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "\n\n";
  std::copy(begin(v2), end(v2), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "\n\n";
}
Ashwell answered 6/6, 2013 at 18:16 Comment(2)
This should have been the accepted answer.Grayish
considering that random_shuffle was deprecated in C++14 and completely removed in C++17 , now this solutions is the best choice as it uses std::shuffle .Penult
J
4

You could create an random access iterator which if its dereferenced returns a std::tuple to references of elements of the corresponding vectors. So you could shuffle them inplace. Or you look at the boost version. So it should look something like this:

std::random_shuffle(
  boost::make_zip_iterator(
    boost::make_tuple(vector1.begin(), vector2.begin())
  ),
  boost::make_zip_iterator(
    boost::make_tuple(vector1.end(), vector2.end()
  ),

);

This shuffles your data inplace, works with more than two vectors and is self documenting if you know what make_zip_iterator does. Of course it should be faster than shuffle two times or use a third vector.

Jamilajamill answered 6/6, 2013 at 17:16 Comment(3)
This is a good idea for readability (and ranges would make it even more readable) but it's not definite that it would be faster than alternatives. In particular I'm thinking that the different memory access patterns could have an unexpected effect on performance.Ashwell
@Ashwell I think you have in any case an unexpected effect on performance. As I think the purpose of random shuffle negates all efforts to make it cache friendly. But it uses one less vector, which might be more cache friendly and it is O(n). If your are able to reorder your two vectors faster than O(n) The shouffling of a third vector may be faster. But now you have at least 2 simply implementable versions and could be able to messure. I would be intersted in the results.Jamilajamill
+1 As this is a library-only based implementation of the custom swap function suggested in comments and one proposed solution. Would work with a tru std::random_device without creating intermediate index tables. Memory access/performance may not be perfect but for random shuffle this is always an issue!Colima
W
2

Seed the pseudo-random number generator with a reproducible value before each time you shuffle.

std::srand ( 42 );
std::random_shuffle ( vector1.begin(), vector1.end() );
std::srand ( 42 );
std::random_shuffle ( vector2.begin(), vector2.end() );
Waiter answered 6/6, 2013 at 16:55 Comment(0)
M
2

If both have to have the same order, why are they separate vectors? The logical solution would be something like:

struct ImageData
{
    Image myImage;
    std::string myLabel;
    //  ...
};

You then have a single vector of ImageData which you shuffle.

Metacenter answered 6/6, 2013 at 17:21 Comment(0)
N
0

Unfortunately, if we use srand we change internal seed-value. I mean, the next random numbers will be predetermined. And, first decision:

std::srand ( 42 );
std::random_shuffle ( vector1.begin(), vector1.end() );
std::srand ( 42 );
std::random_shuffle ( vector2.begin(), vector2.end() );
std::srand ( unsigned ( std::time(0) ) );
// Post-code.

To save rand for post-code.

Second decision - it's Mark Ransom solution - it doesn't call std::srand at all (and, I just notice, it has a higher performance).

Nunci answered 6/6, 2013 at 17:22 Comment(0)
G
-1

Why don't you write your own shuffle:

for( size_t i = 0 ; i < numitems; ++i )
{
    size_t next = random() % numitems ;
    swap( v1[i], v1[next] );
    swap( v2[i], v2[next] );
}
Garrek answered 6/6, 2013 at 17:8 Comment(5)
Perhaps because it is so easy to get wrong (like you did).Metacenter
@JamesKanze Could you please elaborate?Yellowish
The algorithm you post doesn't give a truly random shuffle. In particular, it has pow( maxRandom, numitems ) different possible results, but there are fact( numitems ) different possible orderings. Since pow( maxxRandom, numitems ) is not a multiple of fact( numitems ), the chance of each ordering cannot be equal.Metacenter
Also it's just generally bad practice to re-implement existing functionality. (Btw, your code could be fixed by doing two things, first generate next in the range (i, numitems) instead of [0,numitems); Secondly (and less importantly) using a proper uniform distribution rather than random() % n.)Ashwell
+1 For 'conceptual' solution that would work with std::random_device which may not be deterministic and cannot be expected to repeat the same pattern twice. Just needs implementing properly as mentioned by previous commenters!Colima

© 2022 - 2024 — McMap. All rights reserved.