Reshape vector to matrix with column-wise zero padding in matlab
Asked Answered
J

3

4

for an input matrix

in = [1 1;
      1 2;
      1 3;
      1 4;
      2 5;
      2 6;
      2 7;
      3 8;
      3 9;
      3 10;
      3 11];

i want to get the output matrix

out = [1 5 8;
       2 6 9;
       3 7 10;
       4 0 11];

meaning i want to reshape the second input column into an output matrix, where all values corresponding to one value in the first input column are written into one column of the output matrix.

As there can be different numbers of entries for each value in the first input column (here 4 values for "1" and "3", but only 3 for "2"), the normal reshape function is not applicable. I need to pad all columns to the maximum number of rows.

Do you have an idea how to do this matlab-ish?

The second input column can only contain positive numbers, so the padding values can be 0, -x, NaN, ...

The best i could come up with is this (loop-based):

maxNumElem = 0;
for i=in(1,1):in(end,1)
    maxNumElem = max(maxNumElem,numel(find(in(:,1)==i)));
end

out = zeros(maxNumElem,in(end,1)-in(1,1));
for i=in(1,1):in(end,1)
    tmp = in(in(:,1)==i,2);
    out(1:length(tmp),i) = tmp;
end
Jackshaft answered 14/1, 2015 at 15:30 Comment(4)
Do you know the max number of values for each "index"? Is it constant? I.e. can we assume a fixed number of rows? Will there always be one "index" with 4 values?Wee
The max number of values for each "index" is not constant. Assumptions about it can't be made.Jackshaft
Don't vectorize it unless you must. Your code is significantly clearer and easy to read!Midis
Check out the just added vectorized approach with bsxfun based solution and see how it performs for your input?Sartor
A
4

Either of the following approaches assumes that column 1 of in is sorted, as in the example. If that's not the case, apply this initially to sort in according to that criterion:

in = sortrows(in,1);

Approach 1 (using accumarray)

  1. Compute the required number of rows, using mode;
  2. Use accumarray to gather the values corresponding to each column, filled with zeros at the end. The result is a cell;
  3. Concatenate horizontally the contents of all cells.

Code:

[~, n] = mode(in(:,1));                                                 %//step 1
out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
out = [out{:}];                                                         %//step 3

Alternatively, step 1 could be done with histc

n = max(histc(in(:,1), unique(in(:,1))));                               %//step 1

or with accumarray:

n = max(accumarray(in(:,1), in(:,2), [], @(x) numel(x)));               %//step 1

Approach 2 (using sparse)

Generate a row-index vector using this answer by @Dan, and then build your matrix with sparse:

a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
out = full(sparse([a{:}], in(:,1), in(:,2)));
Albertype answered 14/1, 2015 at 16:56 Comment(6)
I've always wondered if accumarray could be used to partition a set like this! Now that I see your @(x){x} it's so obvious!+1 Do you know how the ordering of the values in the cells is determined?Thies
@Thies It is not guaranteed, unless the first parameter of accumarray is sorted. See here and here some interesting discussions about thisAlbertype
Oh! The second one is actually quite interesting to know before going accumarray all over the place.Thies
@Thies I've included a note in the answer to explicitly state that in needs to be sorted (this is necessary in the second approach as well). Thanks for the the heads-up BTW!Albertype
The first one looks really nice, interesting what you can do with accumarray =) It appears to be a bit slower than the solution i posted though. The second one on the other hand is a bit faster, but not outstandingly. Still, thanks for you help! =)Jackshaft
@Jackshaft Welcome! Yes, for loops in Matlab are not as slow as they used to beAlbertype
S
2

Introduction to proposed solution and Code

Proposed here is a bsxfun based masking approach that uses the binary operators available as builtins for use with bsxfun and as such I would consider this very appropriate for problems like this. Of course, you must also be aware that bsxfun is a memory hungry tool. So, it could pose a threat if you are dealing with maybe billions of elements depending also on the memory available for MATLAB's usage.

Getting into the details of the proposed approach, we get the counts of each ID from column-1 of the input with histc. Then, the magic happens with bsxfun + @le to create a mask of positions in the output array (initialized by zeros) that are to be filled by the column-2 elements from input. That's all you need to tackle the problem with this approach.

Solution Code

counts = histc(in(:,1),1:max(in(:,1)))'; %//' counts of each ID from column1

max_counts = max(counts); %// Maximum counts for each ID
mask = bsxfun(@le,[1:max_counts]',counts); %//'# mask of locations where
                                           %//   column2 elements are to be placed
out = zeros(max_counts,numel(counts)); %// Initialize the output array
out(mask) = in(:,2); %// place the column2 elements in the output array

Benchmarking (for performance)

The benchmarking presented here compares the proposed solution in this post against the various methods presented in Luis's solution. This skips the original loopy approach presented in the problem as it appeared to be very slow for the input generated in the benchmarking code.

Benchmarking Code

num_ids = 5000;
counts_each_id = randi([10 100],num_ids,1);
num_runs = 20;  %// number of iterations each approach is run for

%// Generate random input array
in = [];
for k = 1:num_ids
    in = [in ; [repmat(k,counts_each_id(k),1) rand(counts_each_id(k),1)]];
end

%// Warm up tic/toc.
for k = 1:50000
    tic(); elapsed = toc();
end

disp('------------- With HISTC + BSXFUN Masking approach')
tic
for iter = 1:num_runs
    counts = histc(in(:,1),1:max(in(:,1)))';
    max_counts = max(counts);
    out = zeros(max_counts,numel(counts));
    out(bsxfun(@le,[1:max_counts]',counts)) = in(:,2);
end
toc
clear counts max_counts out

disp('------------- With MODE + ACCUMARRAY approach')
tic
for iter = 1:num_runs
    [~, n] = mode(in(:,1));                                                 %//step 1
    out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
    out = [out{:}];
end
toc
clear n out

disp('------------- With HISTC + ACCUMARRAY approach')
tic
for iter = 1:num_runs
    n = max(histc(in(:,1), unique(in(:,1))));
    out = accumarray(in(:,1), in(:,2), [], @(x){[x; zeros(n-numel(x),1)]}); %//step 2
    out = [out{:}];
end
toc
clear n out

disp('------------- With ARRAYFUN + Sparse approach')
tic
for iter = 1:num_runs
    a = arrayfun(@(x)(1:x), diff(find([1,diff(in(:,1).'),1])), 'uni', 0); %//'
    out = full(sparse([a{:}], in(:,1), in(:,2)));
end
toc
clear a out

Results

------------- With HISTC + BSXFUN Masking approach
Elapsed time is 0.598359 seconds.
------------- With MODE + ACCUMARRAY approach
Elapsed time is 2.452778 seconds.
------------- With HISTC + ACCUMARRAY approach
Elapsed time is 2.579482 seconds.
------------- With ARRAYFUN + Sparse approach
Elapsed time is 1.455362 seconds.
Sartor answered 15/1, 2015 at 12:2 Comment(2)
Nice idea =) I don't get this huge speed improvement though; for me it's only slightly faster.Jackshaft
@LuisMendo Well thank you! But I guess it depends on a lot on the actual inputs being used at OP's end and as such the runtimes vary too..Sartor
S
0

slightly better, but still uses a loop :(

out=zeros(4,3);%set to zero matrix
for i = 1:max(in(:,1)); %find max in column 1, and loop for that number
ind = find(in(:,1)==i); % 
out(1: size(in(ind,2),1),i)= in(ind,2);
end

don't know if you can avoid the loop...

Serpasil answered 14/1, 2015 at 15:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.