matlab: dividing vector into overlapping chunks of fixed size
Asked Answered
G

5

8

I've a vector that I would like to split into overlapping subvectors of size cs in shifts of sh. Imagine the input vector is:

v=[1 2 3 4 5 6 7 8 9 10 11 12 13]; % A=[1:13]

given a chunksize of 4 (cs=4) and shift of 2 (sh=2), the result should look like:

[1 2 3 4]
[3 4 5 6]
[5 6 7 8]
[7 8 9 10]
[9 10 11 12]

note that the input vector is not necessarily divisible by the chunksize and therefore some subvectors are discarded. Is there any fast way to compute that, without the need of using e.g. a for loop? In a related post I found how to do that but when considering non-overlapping subvectors.

Gentile answered 16/12, 2013 at 11:22 Comment(0)
C
6

You can use the function bsxfun in the following manner:

v=[1 2 3 4 5 6 7 8 9 10 11 12 13]; % A=[1:13]
cs=4;
sh=2;

A = v(bsxfun(@plus,(1:cs),(0:sh:length(v)-cs)'));

Here is how it works. bsxfun applies some basic functions on 2 arrays and performs some repmat-like if the sizes of inputs do not fit. In this case, I generate the indexes of the first chunk, and add the offset of each chunck. As one input is a row-vector and the other is a column-vector, the result is a matrix. Finally, when indexing a vector with a matrix, the result is a matrix, that is precisely what you expect.

And it is a one-liner, (almost) always fun :).

Cocky answered 16/12, 2013 at 13:16 Comment(4)
Try it with cs = 5;sh = 2;, will give you three and NOT two (two would be correct as sh=2) overlapping entriesCheriecherilyn
@bjoern As I mentioned in the comment to your question, your solution indeed gives a different outcome. However, that does not neccesarily mean it is correct.Laos
Okay, now I see what you are refering to. I was 100% sure that sh described the number of overlapping entries, but now I see that you both refer to it as another measure. Sorry my fault, really must have missed that Oo And I'm only NOW seeing that the description isn't unique as the posted example is ambiguousCheriecherilyn
one-liner bsxfun (almost) always deserves +1!Bipetalous
W
3

Do you have the signal processing toolbox? Then the command is buffer. First look at the bare output:

buffer(v, 4, 2)

ans =
     0     1     3     5     7     9    11
     0     2     4     6     8    10    12
     1     3     5     7     9    11    13
     2     4     6     8    10    12     0

That's clearly the right idea, with only a little tuning necessary to give you exactly the output you want:

[y z] = buffer(v, 4, 2, 'nodelay');
y.'

ans =
     1     2     3     4
     3     4     5     6
     5     6     7     8
     7     8     9    10
     9    10    11    12

That said, consider leaving the vectors columnwise, as that better matches most use cases. For example, the mean of each window is just mean of the matrix, as columnwise is the default.

Weapon answered 16/12, 2013 at 15:9 Comment(1)
I like the use of these little gems that no one knew it was in a toolbox already. The problem is that it can leave you with partial data in the last frame but it depends what you want to achieve.Soelch
L
2

I suppose the simplest way is actually with a loop. A vectorizes solution can be faster, but if the result is properly preallocated the loop should perform decently as well.

v = 1:13
cs = 4;
sh = 2;

myMat = NaN(floor((numel(v) - cs) / sh) + 1,cs);
count = 0;

for t = cs:sh:numel(v)
   count = count+1;
   myMat(count,:) = v(t-cs+1:t);
end
Laos answered 16/12, 2013 at 12:15 Comment(9)
I don't see why this should be necessary, especially as Tin asked for a solution without for-loop.Cheriecherilyn
@bjoern: I agree with Dennis, his solution is just simpler. Arrayfun is basically also a for loop, and in this case even significantly slower.Peplos
@bjoern Note that arrayfun is pretty much just a wrapper for a loop (and thus will not really outperform the simple loop).Laos
Yeah it might as well be a for-loop but written in C and not in MATLAB, and if performance doesn't matter it might also be a matter of code complexity and code-nesting where the question-author likes to have a shorter code. Whatsoever, I simply posted what the author asked for ;-)Cheriecherilyn
Oh okay, he asked for a fast way and didn't specifically state anything about a short-code. So okay then..Cheriecherilyn
Dennis, I think you've still got an error in your code -> v(t-3:t); what is this supposed to be? I mean there's something about cs missing in it, right now you are always fetching 3 entries which is wrongCheriecherilyn
That's what I first changed as well to correct it but something is still wrong with it, cs = 5; sh = 2; gives me three overlapping entries in the matrix and not 2Cheriecherilyn
@DennisJaheruddin: I believe the way you calculate the number of chunks is not correct, see my answer. You calculate the number of rows to preallocate for myMat based on sh only, while it should also include cs. For cs = 10 and sh = 2, it should only produce 2 rows, while in your case it preallocates 5 rows.Uranie
@BasSwinckels Thanks, that part felt a bit dodgy.Laos
N
1

You can accomplish this with ndgrid:

>> v=1:13; cs=4; sh=2;
>> [Y,X]=ndgrid(1:(cs-sh):(numel(v)-cs+1),0:cs-1)
>> chunks = X+Y
chunks =
     1     2     3     4
     3     4     5     6
     5     6     7     8
     7     8     9    10
     9    10    11    12

The nice thing about the second syntax of the colon operator (j:i:k) is that you don't have to calculate k exactly (e.g. 1:2:6 gives [1 3 5]) if you plan to discard the extra entries, as in this problem. It automatically goes to j+m*i, where m = fix((k-j)/i);

Different test:

>> v=1:14; cs=5; sh=2; % or v=1:15 or v=1:16
>> [Y,X]=ndgrid(1:(cs-sh):(numel(v)-cs+1),0:cs-1); chunks = X+Y
chunks =
     1     2     3     4     5
     4     5     6     7     8
     7     8     9    10    11
    10    11    12    13    14

And a new row will form with v=1:17. Does this handle all cases as needed?

Nidianidicolous answered 16/12, 2013 at 21:30 Comment(0)
C
0

What about this? First I generate the starting-indices based on cs and sh for slicing the single vectors out of the full-length vector, then I delete all indices for which idx+cs would exceed the vector length and then I'm slicing out the single sub-vectors via arrayfun and afterwards converting them into a matrix:

v=[1 2 3 4 5 6 7 8 9 10 11 12 13]; % A=[1:13]
cs=4;
sh=2;

idx = 1:(cs-sh):length(v);
idx = idx(idx+cs-1 <= length(v))
A = arrayfun(@(i) v(i:(i+cs-1)), idx, 'UniformOutput', false);
cell2mat(A')

E.g. for cs=5; sh=3; this would give:

idx =

     1     3     5     7


ans =

     1     2     3     4     5
     3     4     5     6     7
     5     6     7     8     9
     7     8     9    10    11

Depending on where the values cs; sh come from, you'd probably want to introduce a simple error-check so that cs > 0; as well as sh < cs. sh < 0 would be possible theoretically if you'd want to leave some values out in between.

EDIT: Fixed a very small bug, should be running for different combinations of sh and cs now.

Cheriecherilyn answered 16/12, 2013 at 11:41 Comment(3)
Why should I? It works with different numbers as well. idx is only supposed to give me the starting indices of the subvectors and therefore, I def. need cs-sh as the step - EDIT: I tried using different vectors and different numbers of cs and sh and it works quite nicely.Cheriecherilyn
For cs=5; sh=3 I would assume the starting indices to be 1 4 7 rather than 1 3 5 7. If this is the case one would use idx=1:sh:length(v).Laos
Sorry but I got to tell you, that's wrong. Just rethink about it. 1 3 5 7 are the correct indices - just look at my output matrix which obviously IS correct (length of 5 (i.e. 5 columns) as you can clearly see, which is the chunksize cs=5) and three overlapping entries (the last three entries of each row are the three first entries of the next row)Cheriecherilyn

© 2022 - 2024 — McMap. All rights reserved.