Matlab: Dividing chunks of data randomly into equal sized sets

I have a large dataset that I need to divide randomly into 5 nearly equal-sized sets for cross-validation. I have happily used _crossvalind_ to divide data into sets before, but this time I need to assign whole chunks of data to the groups rather than individual elements.

Let's say my data looks like this:

data = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18];

Then I want to divide them randomly into 5 groups in chunks of 2, e.g. like this:

g1 = [3 4], [11 12]  
g2 = [9 10]  
g3 = [1 2], [15 16]  
g4 = [7 8], [17 18]  
g5 = [5 6], [13 14]

I think I can do this with some for-loops, but I'm guessing there must be a much more efficient way to do it in MATLAB :-)

Any suggestions?

Growl answered 2/4, 2011 at 18:54 Comment(0)

I'm interpreting your needs as a random ordering of sets, where the ordering of elements within each set is unchanged from the parent set. You can use randperm to randomly order the sets and use linear indexing for the elements.

dataElements = numel(data);                  % number of elements
totalGroups = 5;
groupSize = dataElements / totalGroups;      % assumes dataElements is evenly divisible by totalGroups
randOrder = randperm(totalGroups);           % random ordering of the numbers 1 to totalGroups
g = reshape(data, groupSize, totalGroups).'; % one group per row
g = g(randOrder, :);                         % shuffle the rows

The different rows of g give you the different groupings.

Mcgrath answered 2/4, 2011 at 19:46 Comment(6)
@R. M.: Your last line is a bit on the complicated side. Why not replace it with: g = reshape(data,groupSize,totalGroups)'; g = g(randOrder,:); - Countersignature
@Jonas: Thanks, you're right; that was complicated! I've replaced the line. - Mcgrath
@R. M.: Now it looks like what I'd have answered. +1 :) - Countersignature
I think I might have explained my problem wrongly, or I just can't see yet that this is exactly what I need :) I will look at my problem again and figure out whether this is what I needed. Thanks :) - Growl
I have now edited my description of the problem. As far as I understand, randperm() wouldn't work for a problem like that, or am I wrong? - Growl
@danielhc, you can use the same algorithm to do that. Instead of 5 groups with 2 sub-groups of 2 elements each, first divide the data into 10 groups of 2 elements and take every 2 rows as one group. If you have to leave one group with a single sub-group (as in g2), just pad your data vector with 2 NaNs or 0s at the end, do as I just mentioned, and then remove the NaNs. - Mcgrath
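
For the chunked case discussed in the comments above, one possible sketch is shown here. It is a variant, not the exact padding approach from the comment: instead of padding with NaNs, it gives each chunk a group label and shuffles the labels with randperm. The names chunkSize, numGroups, chunks, labels and groups are illustrative only, and the sketch assumes numel(data) is a multiple of chunkSize.

```matlab
data = 1:18;          % example data from the question
chunkSize = 2;        % elements per chunk (assumed to divide numel(data))
numGroups = 5;        % number of cross-validation sets

% Put one chunk in each row: [1 2; 3 4; 5 6; ...].
chunks = reshape(data, chunkSize, []).';
numChunks = size(chunks, 1);

% Cycle the labels 1..numGroups over the chunks so that group sizes
% differ by at most one chunk, then shuffle which chunk gets which label.
labels = mod(0:numChunks-1, numGroups) + 1;
labels = labels(randperm(numChunks));

% Collect the chunks belonging to each group in a cell array.
groups = cell(numGroups, 1);
for k = 1:numGroups
    groups{k} = chunks(labels == k, :);
end
```

Each groups{k} is a matrix with one chunk per row. With 9 chunks and 5 groups, four groups receive two chunks and one group receives a single chunk, as in the example in the question.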

You can shuffle the array (randperm) and then divide it into consecutive equal parts.

data = [10 20 30 40 50 60 70 80 90 100 110 120 130 140 150];
permuted = data(randperm(length(data)));
% padding may be required if the length of data is not divisible by the size of chunks
k = 5;
g = reshape(permuted, k, length(data)/k);
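
The padding mentioned in the comment above could look roughly like the following sketch. It assumes NaN is an acceptable padding value for the data and uses the question's 18-element vector with the illustrative names n, numCols and padded. Note that this shuffles individual elements, so neighbouring values are not kept together as chunks.

```matlab
data = 1:18;                      % length is not divisible by k
k = 5;                            % elements per group
n = numel(data);

permuted = data(randperm(n));     % shuffle the individual elements

% Pad with NaN up to the next multiple of k so that reshape succeeds.
numCols = ceil(n / k);
padded = [permuted, nan(1, numCols * k - n)];

g = reshape(padded, k, numCols);  % each column is one group; NaN marks padding
```

The padding can be dropped per group afterwards, for example with g(~isnan(g(:, j)), j) for column j.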
Xylotomy answered 2/4, 2011 at 19:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.