Why, if MATLAB is column-major, do some functions output row vectors?

MATLAB is well-known for being column-major. Consequently, manipulating entries of an array that are in the same column is faster than manipulating entries that are on the same row.

In that case, why do so many built-in functions, such as linspace and logspace, output row vectors rather than column vectors? This seems to me like a de-optimization...
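To make the question concrete, here is a quick check of the shapes in question. It is a small sketch with arbitrary variable names that should run in MATLAB (and, as far as I know, in GNU Octave). Transposing with `.'` is one common way to get a column vector when that is what you need:

```matlab
% linspace and logspace return row vectors by default
x = linspace(0, 1, 5);     % 1-by-5 row: 0, 0.25, 0.5, 0.75, 1
y = logspace(0, 2, 3);     % 1-by-3 row: 1, 10, 100

% If a column is wanted, transpose with .' (non-conjugate transpose)
xc = x.';                  % 5-by-1 column
```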

What, if any, is the rationale behind this design decision?

Salomon asked 15/12, 2014 at 19:24 Comment(8)
That's a very good question! My hunch is that it's to support legacy behaviour: perhaps older versions of MATLAB returned row vectors, and that shape has simply been kept for backward compatibility. But that's really just a guess. I'm curious to know the answer to this question myself. - Bobbiebobbin
Because those outputs are 1D (the first dimension is singleton)? It doesn't really matter, but I'd guess it's because row vectors are easier to inspect on the command line. - Unclose
@Unclose I personally prefer columns in my output. I can't bear those "Columns 1 to ..." headings in the Command Window... - Salomon
It is more compact that way and easier to read for small vectors, true. But for a long row vector, at least it will wrap lines. - Unclose
@Bobbiebobbin But the first version of MATLAB was written in FORTRAN, which is itself column-major... - Salomon
I never talked about the first version; I talked about older versions. Either way, I think I'll agree with chappjc that it's simply for easier readouts. - Bobbiebobbin
@Bobbiebobbin But then, why would other functions, such as diag (when applied to a matrix), return a column vector? Why this inconsistency? - Salomon
You're asking the wrong dude. Sorry! Maybe contact MathWorks? - Bobbiebobbin

It is a good question. Here are some ideas...

My first thought was that, in terms of performance and contiguous memory, it makes no difference whether the output is a row or a column: in both cases the elements are contiguous in memory. For a matrix (an array with more than one row and more than one column), it is true that indexing a whole column (e.g. v(:,2)) is more efficient than indexing a row (e.g. v(2,:)) or a slice along another dimension, because the elements of a row are not contiguous: in column-major storage, consecutive elements of a row are separated by a stride equal to the number of rows. However, a 1-by-N row vector has only one row, so its elements are contiguous and it makes no difference.
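A small sketch of this indexing argument, using sub2ind to compute linear indices, which follow MATLAB's column-major storage order. The matrix A and its 4-by-3 size are arbitrary choices for illustration:

```matlab
% Element (i,j) of an m-by-n array sits at linear index i + (j-1)*m,
% which is what sub2ind computes.
A = reshape(1:12, 4, 3);   % 4-by-3, so A(k) == k for every linear index k

colIdx = sub2ind(size(A), 1:4, [2 2 2 2]);   % column 2 -> 5 6 7 8 (stride 1)
rowIdx = sub2ind(size(A), [2 2 2], 1:3);     % row 2    -> 2 6 10 (stride 4)

% In a 1-by-N row vector or an N-by-1 column vector, consecutive elements
% always have consecutive linear indices, i.e. they are adjacent in memory.
r = 1:5;
rIdx = sub2ind(size(r), ones(1, 5), 1:5);    % 1 2 3 4 5
c = r.';
cIdx = sub2ind(size(c), 1:5, ones(1, 5));    % 1 2 3 4 5
```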

Second, row vectors are simply easier to display in the Command Window, especially since long rows wrap across several lines. With a column vector, by contrast, you are forced to scroll even for much shorter arrays.

More thoughts...

Perhaps the row vector output of linspace and logspace is just for consistency with the colon operator (essentially a tool for creating linearly spaced elements), which makes a row:

>> 0:2:16
ans =
     0     2     4     6     8    10    12    14    16
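As a quick check of that consistency (a sketch; the endpoints and point count are arbitrary), the step form and the count form below build the same 1-by-9 row:

```matlab
% The colon operator and linspace both return 1-by-N row vectors
a = 0:2:16;                % step form: start:step:stop
b = linspace(0, 16, 9);    % count form: 9 evenly spaced points from 0 to 16
same = isequal(a, b);      % true: same values and same 1-by-9 shape
```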

The choice was made at the beginning of time and that was that (maybe?).

The convention for loop variables could also be important. A row vector is needed to get one iteration per element:

>> for k=1:5, k, end
k =
     1
k =
     2
k =
     3
k =
     4
k =
     5

A column vector, by contrast, gives a single iteration in which the loop variable is the whole (non-scalar) column:

>> for k=(1:5)', k, end
k =
     1
     2
     3
     4
     5

And maybe the outputs of linspace and logspace are commonly looped over. Maybe? :)

But why loop over a row vector anyway? Well, as I say in my comments, it's not that a row vector is used for loops; it's that the loop iterates over the columns of the loop expression. That is, with for v=M, where M is a 2-by-3 matrix, there are 3 iterations, and in each one v is a 2-element column vector. This is actually a good design if you consider that it involves slicing the loop expression into columns (i.e. chunks of contiguous memory!).
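The column-wise iteration can be checked directly. This sketch (the 2-by-3 matrix M and the counter names are arbitrary) counts iterations and compares each loop variable with the corresponding column of M:

```matlab
% A for loop iterates over the columns of its loop expression
M = reshape(1:6, 2, 3);    % [1 3 5; 2 4 6]
nIter = 0;
for v = M
    nIter = nIter + 1;
    % each v is one 2-by-1 column of M
    assert(isequal(v, M(:, nIter)))
end
% nIter is now 3: one iteration per column

% A column vector has only one column, hence only one iteration
nCol = 0;
for k = (1:5).'
    nCol = nCol + 1;       % k is the whole 5-by-1 column here
end
% nCol is now 1
```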

Unclose answered 15/12, 2014 at 19:53 Comment(11)
Your answer raises more questions than it answers :) (+1 anyway). Why did MathWorks decide that a row vector should be used for for loops? Wouldn't it have made more sense to use a column vector? Same question about the colon operator. - Salomon
@Jubobs It sure does! As a developer, I think it is actually fairly likely that some dev made an arbitrary decision in the early days of the product and it got locked in. Still, that doesn't mean the colon convention has anything to do with the linspace convention, but it's very likely, considering that's what colon does. - Unclose
@Jubobs Actually, it's not that a row vector is used for loops; it's that the loop iterates over the columns of the loop expression. That is, with for v=M, where M is a 2-by-3 matrix, there are 3 iterations, and in each one v is a 2-element column vector. This is actually a good design if you consider that it involves slicing the loop expression into columns (i.e. chunks of contiguous memory!). - Unclose
Yes, that actually makes sense. - Salomon
I liked your first comment to the OP above about both cases being vectors (the elements of row and column vectors are both stored contiguously) and rows being easier for Command Window display. You might reiterate those points in your answer. - Gros
@Unclose Regarding "My first thought was that in terms of performance and contiguous memory...": according to the blog post I link to in my question, there is a difference in performance; see the test involving two loops towards the end. Unfortunately, I haven't had the chance to rerun the benchmark and see the results with my own eyes, but I will as soon as possible. - Salomon
@Unclose You had my +1 since your comment :-) - Wagonlit
@Jubobs That blog post demonstrates slicing multidimensional arrays. It is absolutely correct that it is more efficient to index a whole column of a 2D array (e.g. v(:,2)) than a row (e.g. v(2,:)), because in the row case it is not accessing elements that are contiguous in memory. However, for a row vector that is 1-by-N, the elements are contiguous because there is only one row. That is why it doesn't matter. - Unclose
@Jubobs In MATLAB, the underlying data buffers have no padding: every element is in one contiguous chunk of memory. I'm not sure where to point in the docs, but if you use MEX extensively you know this, because you have to access this buffer directly. See the docs for mxGetPr, where it says "Once you have the starting address, you can access any other element in the mxArray". Since there is no padding, and the stride between columns is always equal to the number of rows, a row vector must have contiguous data. - Unclose
@Unclose OK, thanks. I think my question actually stems mainly from a misunderstanding of precisely this. Thanks again! - Salomon
