Fast way to convert a two-dimensional array to a List (one-dimensional)
B

3

35

I have a two-dimensional array and I need to convert it to a List (same object). I don't want to do it with a for or foreach loop that takes each element and adds it to the List. Is there some other way to do it?

Bentley answered 27/2, 2011 at 9:26 Comment(9)
What is that list supposed to contain?Pickaninny
Is your 2D array rectangular(T[,]) or jagged(T[][])?Seaden
The list contains double, and the 2D array is T[,]Bentley
Why do you want to avoid loops?Seaden
And do you want a short or a fast solution?Seaden
Yanshof: Could you address the comments in your accepted answer? You claim you want a "fast" solution (given the title) but you've accepted the slowest of the three answers.Mientao
@CodesInChaos: i would vote for short.Wicopy
@JonSkeet: I get too many downvotes because of your above comment :) please remove itWicopy
@naveen: No - I stand by my comment. It's the slowest of the approaches, so anyone who is looking for an efficient solution should not use it.Mientao
W
42

To convert double[,] to List<double>, if you are looking for a one-liner, here goes

double[,] d = new double[,]
{
    {1.0, 2.0},
    {11.0, 22.0},
    {111.0, 222.0},
    {1111.0, 2222.0},
    {11111.0, 22222.0}
};
List<double> lst = d.Cast<double>().ToList();


But if you are looking for something efficient, I'd recommend not using this code: Cast<double>() enumerates the array through the non-generic IEnumerable, which boxes every element.
Please use either of the other two answers instead. Both implement much better techniques.
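
To make the cost of the one-liner concrete, here is a rough sketch of what it amounts to per element. This is not the actual LINQ source; `CastDemo` and its method names are made up for illustration. The point, as a comment on this answer notes, is that a `double[,]` only exposes the non-generic enumerator, whose `Current` is an `object`, so every value is boxed and then unboxed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
static class CastDemo
{
    // The one-liner from the answer.
    public static List<double> ViaCast(double[,] d)
    {
        return d.Cast<double>().ToList();
    }

    // Roughly the same work spelled out by hand.
    public static List<double> ViaNonGenericEnumerator(double[,] d)
    {
        var result = new List<double>();
        IEnumerator e = d.GetEnumerator();   // non-generic enumerator
        while (e.MoveNext())
        {
            object boxed = e.Current;        // each double is boxed here
            result.Add((double)boxed);       // and unboxed here
        }
        return result;
    }
}
```

Both methods return the elements in the same row-by-row order; the second one only makes the hidden allocation per element visible.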
Wicopy answered 27/2, 2011 at 9:43 Comment(11)
Aside from everything else, that will end up boxing every double in the array... it'll perform poorly.Mientao
I think OP accepts this answer because he wants a "clearer"(I mean easy for him to implement and understand) way, not the real fastest way.Muskellunge
@Danny: I'm not really sure how this method is any clearer or easier to understand than a for loop, which the OP explicitly wishes to avoid. Not to mention the title says "Fast".Siliqua
In my quick benchmark of a 1000 x 1000 array, this performs over 30 times as slowly as the for loop or the Buffer.BlockCopy solution. I'm pretty surprised it's been accepted, given the "Fast" part of the title.Mientao
(A longer test changed the timings slightly, but it's still easily an order of magnitude slower.)Mientao
@downvoters: thanks for letting me know that I know less than JonSkeet. :) Please understand that I am not deleting the answer, because OP used this code somewhere and is happy with it. Tell me, how many of you downvoters work at enterprise level? funnyWicopy
@naveen ToList is certainly the approach I'd use most of the time since it's usually fast enough. I assume the downvoters prefer other solutions because the question explicitly asks for a fast way and your solution is relatively slow.Seaden
@CodesInChaos: fast way in India means, finish the code fast. Coding is a highly profitable thing here compared to other jobs. I still believe the guy wanted a solution he could implement fast, not execute fast.Wicopy
@naveen: I see no real evidence of that, and given that the OP can use any of the answers by just copying and pasting them and adjusting to his variable names, they're all equally "fast" by that definition. Even if you think that's the most likely intention, your answer provides no indication of the inefficiency involved which would be appropriate in order to serve all readers rather than just the original poster.Mientao
@JonSkeet: i realize that. the downvote irks me a bit. thats all :)Wicopy
Well maybe you should improve your answer then? Even just explaining that it is slow (and why) would be better. Your answer does not help to answer the question which apparently many people believe was asked (myself included), so those downvotes are reasonable. Look at it this way: you're still massively rep-positive for an answer which doesn't provide an efficient solution.Mientao
M
63

Well, you can make it use a "blit" sort of copy, although it does mean making an extra copy :(

double[] tmp = new double[array.GetLength(0) * array.GetLength(1)];    
Buffer.BlockCopy(array, 0, tmp, 0, tmp.Length * sizeof(double));
List<double> list = new List<double>(tmp);

If you're happy with a single-dimensional array of course, just ignore the last line :)

Buffer.BlockCopy is implemented as a native method which I'd expect to use extremely efficient copying after validation. The List<T> constructor which accepts an IEnumerable<T> is optimized for the case where the argument implements ICollection<T>, as double[] does. It will create a backing array of the right size, and ask the collection to copy itself into that array via CopyTo. Hopefully that will use Buffer.BlockCopy or something similar too.
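
As a small worked example of the block copy on its own, the sketch below flattens a 2 x 2 array and lets `Buffer.ByteLength` compute the byte count instead of multiplying by `sizeof(double)` by hand. `BlockCopyDemo` is a hypothetical name. Note that `Buffer.BlockCopy` only accepts arrays of primitive types, so this shape does not carry over to arrays of arbitrary structs or reference types.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class BlockCopyDemo
{
    public static double[] Flatten(double[,] array)
    {
        // array.Length is the total element count across both dimensions.
        double[] flat = new double[array.Length];

        // Offsets and count are in bytes, not elements.
        Buffer.BlockCopy(array, 0, flat, 0, Buffer.ByteLength(array));
        return flat;
    }
}
```

Called with `new double[,] { { 1, 2 }, { 3, 4 } }`, this yields the elements in row-major order: 1, 2, 3, 4.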

Here's a quick benchmark of the three approaches (for loop, Cast<double>().ToList(), and Buffer.BlockCopy):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        double[,] source = new double[1000, 1000];
        int iterations = 1000;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            UsingCast(source);
        }
        sw.Stop();
        Console.WriteLine("LINQ: {0}", sw.ElapsedMilliseconds);

        GC.Collect();
        GC.WaitForPendingFinalizers();

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            UsingForLoop(source);
        }
        sw.Stop();
        Console.WriteLine("For loop: {0}", sw.ElapsedMilliseconds);

        GC.Collect();
        GC.WaitForPendingFinalizers();

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            UsingBlockCopy(source);
        }
        sw.Stop();
        Console.WriteLine("Block copy: {0}", sw.ElapsedMilliseconds);
    }


    static List<double> UsingCast(double[,] array)
    {
        return array.Cast<double>().ToList();
    }

    static List<double> UsingForLoop(double[,] array)
    {
        int width = array.GetLength(0);
        int height = array.GetLength(1);
        List<double> ret = new List<double>(width * height);
        for (int i = 0; i < width; i++)
        {
            for (int j = 0; j < height; j++)
            {
                ret.Add(array[i, j]);
            }
        }
        return ret;
    }

    static List<double> UsingBlockCopy(double[,] array)
    {
        double[] tmp = new double[array.GetLength(0) * array.GetLength(1)];    
        Buffer.BlockCopy(array, 0, tmp, 0, tmp.Length * sizeof(double));
        List<double> list = new List<double>(tmp);
        return list;
    }
}

Results (times in milliseconds):

LINQ: 253463
For loop: 9563
Block copy: 8697

EDIT: Having changed the for loop to call array.GetLength() on each iteration, the for loop and the block copy take around the same time.
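
The variant described in the edit is not shown in the benchmark listing. A plausible reconstruction, assuming the only change is that the `GetLength` calls move into the loop conditions (the class and method names here are made up), would be:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of the edited loop, for illustration only.
static class GetLengthLoopDemo
{
    public static List<double> UsingForLoopWithGetLength(double[,] array)
    {
        List<double> ret = new List<double>(array.Length);

        // GetLength is evaluated on every iteration instead of being cached in locals.
        for (int i = 0; i < array.GetLength(0); i++)
        {
            for (int j = 0; j < array.GetLength(1); j++)
            {
                ret.Add(array[i, j]);
            }
        }
        return ret;
    }
}
```

The timings above come from the runtime available when this answer was written; whether this form still matches the block copy on a current JIT is something to measure rather than assume.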

Mientao answered 27/2, 2011 at 9:44 Comment(10)
The main problem with that one is that it can leave a big temporary array on the large object heap.Seaden
@CodeInChaos: Absolutely. It's a pain we can't tell List<T> to just use the given array :( I think it's still likely to be faster than looping though.Mientao
The problem with telling List<T> to use a certain array is that we could tell several lists to use the same array. Not sure how big a problem that'd be in practice.Seaden
@CodeInChaos: Yup, that's why there's no way of doing it. It's probably the right decision on the part of the BCL team - it's just irritating for things like this :)Mientao
One interesting observation on the looping solution is that it's twice as slow if one swaps the inner and outer loop. Most likely due to CPU caches working better if you read/write sequentially.Seaden
@CodeInChaos: Yes, that's a fairly well-known phenomenon. I don't think the reason is so much related to CPU caches as it is the physical location of the data in memory. You have a 2D array indexed in row-major order, it's much faster to iterate through the array sequentially than it is to jump around. Read more about it here and here.Siliqua
Iterating sequentially is faster because when transferring from the RAM to the cache several sequential entries are fetched at the same time. I'd guess that if the size of your array entries is a multiple of the cache-line size the advantage of serial access disappears.Seaden
@JonSkeet: Is there a way in which you can alter the block copy so that it only copies a specific range from the source array? For instance, if I wanted all of the data between indexes (10,20) through (20, 20)?Malayopolynesian
@LamdaComplex: I don't believe so - because that isn't copying a block of data - that's copying 11 separate values.Mientao
@JonSkeet: I was really thinking this would help me. I'm actually tackling a slightly different problem involving 3D arrays and grabbing a volume of data from it and flattening it into a 1D array as fast as possible. #8449135Malayopolynesian
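
On the temporary-array concern raised in the comments above: newer runtimes expose APIs that let the copy go straight into the list's backing storage. The sketch below assumes .NET 8 or later (for `CollectionsMarshal.SetCount`) and relies on the elements of a rectangular array being laid out contiguously in row-major order, which is how the CLR stores them. `DirectFlatten` is a hypothetical name, and none of this was available when the answers here were written.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper, assuming .NET 8 or later.
static class DirectFlatten
{
    public static List<double> ToList(double[,] source)
    {
        var list = new List<double>(source.Length);
        if (source.Length == 0)
        {
            return list; // source[0, 0] would throw on an empty array
        }

        // Set the list to its final count, then view its backing storage as a span.
        CollectionsMarshal.SetCount(list, source.Length);
        Span<double> destination = CollectionsMarshal.AsSpan(list);

        // View the rectangular array's elements as one contiguous span.
        ReadOnlySpan<double> flat =
            MemoryMarshal.CreateReadOnlySpan(ref source[0, 0], source.Length);

        flat.CopyTo(destination);
        return list;
    }
}
```

This avoids the intermediate `double[]`, at the price of depending on `System.Runtime.InteropServices` helpers that bypass the list's usual safety checks, so it is best kept behind a small, well-tested method like this one.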
S
11

A for loop is the fastest way.

You may be able to do it with LINQ, but that will be slower. And while you don't write a loop yourself, under the hood there is still a loop.

  • For a jagged array you can probably do something like arr.SelectMany(x=>x).ToList().
  • On T[,], arr.ToList() does not compile, because a 2D array only implements the non-generic IEnumerable and not IEnumerable<T>. You need to insert a Cast<double>() first, like yetanothercoder suggested; the enumeration then returns all elements in the 2D array. That will make it even slower due to boxing.
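
The jagged-array bullet above can be sketched as follows. `JaggedDemo` is a hypothetical name, and the code assumes that none of the inner arrays is null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
static class JaggedDemo
{
    public static List<double> Flatten(double[][] rows)
    {
        // Each inner double[] is already an IEnumerable<double>,
        // so no Cast (and no boxing) is needed here.
        return rows.SelectMany(row => row).ToList();
    }
}
```

The rows may have different lengths; the result simply concatenates them in order.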

The only thing that can make the code faster than the naive loop is calculating the number of elements and constructing the List with the correct capacity, so it doesn't need to grow.
If your array is rectangular you can obtain the size as width*height; with jagged arrays it can be harder.

int width = 1000;
int height = 3000;
double[,] arr = new double[width, height];
// Pre-size the list so it never has to grow.
List<double> list = new List<double>(width * height);
int size1 = arr.GetLength(1);
int size0 = arr.GetLength(0);
for (int i = 0; i < size0; i++)
{
    for (int j = 0; j < size1; j++)
        list.Add(arr[i, j]);
}
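
For the jagged case, where the total size is not simply width*height, one way to still pre-size the list is to sum the row lengths first. This is only a sketch: `JaggedPresized` is a made-up name, and it assumes no inner array is null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
static class JaggedPresized
{
    public static List<double> Flatten(double[][] rows)
    {
        // First pass: count the elements so the list can be allocated once.
        int total = 0;
        foreach (double[] row in rows)
        {
            total += row.Length;
        }

        // Second pass: append each row as a block.
        var list = new List<double>(total);
        foreach (double[] row in rows)
        {
            list.AddRange(row);
        }
        return list;
    }
}
```

The extra pass over the outer array is cheap compared with repeatedly growing the list's backing array for large inputs.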

In theory it might be possible to use private reflection and unsafe code to make it a bit faster by doing a raw memory copy. But I strongly advise against that.

Seaden answered 27/2, 2011 at 9:29 Comment(7)
Could you give a sample of what you're thinking about in the for loop so I can benchmark it against my Buffer.BlockCopy approach? I'd expect mine to be faster, but I want to make sure I'm testing the right thing...Mientao
I think should be arr.Cast<T>().ToList()?Muskellunge
Looks like yours is about twice as fast @JonSeaden
@Danny That's why I edited it while you were writing your comment.Seaden
@CodeInChaos: You can optimize yours somewhat by not calling GetLength on every iteration... but in my tests, Buffer.BlockCopy is still a bit faster.Mientao
Interesting what the jitter understands and what it doesn't. On the one hand it seems to know that the dimensions of an array won't change (it eliminates the bounds checks; if I use height and width instead it's much slower), but it still doesn't understand that it can remove later calls to get the size.Seaden
@CodeInChaos: Interesting - I knew the JIT optimized around using the vector .Length property, but I didn't realise it would optimize around .GetLength() as well. I've edited my answer to reflect that too - our solutions end up being around the same then.Mientao
