I have a list with over a million objects in it, trying to find the fastest way to search through it
Asked Answered
D

3

2

I have a list that stores well over a million objects within it. I need to look through the list and update the objects found through the below query as efficiently as possible.

I was thinking of using a Dictionary or HashSet, but I'm relatively new to C# and could not figure out how to implement these other two approaches. My current code is simply a LINQ statement searching through an IList.

public IList<LandObject> landObjects = new List<LandObject>();    

var lObjsToUpdate = from obj in landObjects 
                        where 
            obj.position.x >= land.x - size.x && 
            obj.position.x <= land.x + size.x && 
            obj.position.y >= land.y - size.y && 
            obj.position.y <= land.y + size.y
                    select obj;

 foreach(var item in lObjsToUpdate)
 {
     //do what I need to do with records I've found
 }

Could anyone be so kind as to suggest how I might approach this efficiently?

Doubly answered 9/12, 2013 at 3:40 Comment(10)
I have edited your title. Please see, "Should questions include “tags” in their titles?", where the consensus is "no, they should not".Plaque
What kind of queries are you going to search for? If the answer is some very limited set (or a single type of query), you can probably optimize the data structure to make it O(1)Headstrong
I have a grid of objects being displayed. X and y are floats representing its Vector2 value, I'm trying to find the object with x/y values within a certain range. I need to find the entire object to update several properties on it.Doubly
@Euthyphro, ok and is that calculation above the only criteria you will use to find objects?Headstrong
@sinelaw, yes that's right.Doubly
Are either of size or land constant throughout the lifetime of this list?Headstrong
Yes that's right, size and land are both constant and will not change.Doubly
Will any two items in the list have identical coordinates?Trumantrumann
@Corey, no they will all have unique coordinates.Doubly
I would be cautious with storing that many objects in memory. Is this on a web server or client side application? Is it being stored for as a singleton or will it be spread across (web - sessions / app domains / etc) or a single client app (could be problematic if run from an app server with multiple users). Also note that depending on your usage - you maybe able to use some parallel statements in your looping to speed things up a bit.Haema
H
0

The real answer should involve performance tests and comparisons, and depends on your runtime environment (memory vs. cpu, etc).

The first thing I would try, since land and size are constant, you can maintain a simple HashSet<LandObject> of objects that fit the criteria (in addition to a list or set of all objects or just all other objects). Every time a new object is added, check if it fits the criteria and if so - add it to that set of objects. I don't know how good HashSet is at avoiding collisions when working with object references, but you can try to measure it.

BTW, this related question discussing memory limits of HashSet<int> might interest you. With .NET < 4.5 your HashSet should be ok up to several million items. From what I understand, with some configuration .NET 4.5 removes the limitation of 2gb max object size and you'll be able to go crazy, assuming you have a 64-bit machine.

Headstrong answered 9/12, 2013 at 4:1 Comment(0)
S
0

One thing that will probably help with that many iterations is do the calculations once and use different variables inside your query. Also it should help some to increase the range by 1 on each end and eliminate checking for equals:

public IList<LandObject> landObjects = new List<LandObject>();    
float xmax = land.x + size.x + 1;
float xmin = land.x - size.x - 1;
float ymax = land.y + size.y + 1;
float ymin = land.y - size.y - 1;

var lObjsToUpdate = from obj in landObjects 
                        where 
            obj.position.x > xmin && 
            obj.position.x < xmax && 
            obj.position.y > ymin && 
            obj.position.y < ymax
                    select obj;
Stunt answered 9/12, 2013 at 4:56 Comment(0)
T
0

Ideally you need a way to partition elements so that you don't have to test every single one to find the ones that fit and the ones that should be thrown out. How you partition will depend on how dense the items are - it might be as simple as partitioning on the integer portion of the X coordinate for instance, or by some suitable scaled value of that coordinate.

Given a method (let's call it Partition for now) that takes an X coordinate and returns a partition value for it, you can filter on the X coordinate fairly quickly as a first-pass to reduce the total number of nodes you need to check. You might need to play with the partition function a little to get the right distribution though.

For example, say that you have floating-point coordinates in the range -100 < X <= 100, with your 1,000,000+ objects distributed fairly uniformly across that range. That would divide the list into 200 partitions of (on average) 5000 entries if partitioned on integer values of X. That means that for every integer step in the X dimension of your search range you only have ~5,000 entries to test.

Here's some code:

public interface IPosition2F
{
    float X { get; }
    float Y { get; }
}

public class CoordMap<T> where T : IPosition2F
{
    SortedDictionary<int, List<T>> map = new SortedDictionary<int,List<T>>();
    readonly Func<float, int> xPartition = (x) => (int)Math.Floor(x);

    public void Add(T entry)
    {
        int xpart = xPartition(entry.X);
        List<T> col;
        if (!map.TryGetValue(xpart, out col))
        {
            col = new List<T>();
            map[xpart] = col;
        }
        col.Add(entry);
    }

    public T[] ExtractRange(float left, float top, float right, float bottom)
    {
        var rngLeft = xPartition(left) - 1;
        var rngRight = xPartition(right) + 1;

        var cols =
            from keyval in map
            where keyval.Key >= rngLeft && keyval.Key <= rngRight
            select keyval.Value;

        var cells =
            from cell in cols.SelectMany(c => c)
            where cell.X >= left && cell.X <= right &&
                cell.Y >= top && cell.Y <= bottom
            select cell;

        return cells.ToArray();
    }

    public CoordMap()
    { }

    // Create instance with custom partition function
    public CoordMap(Func<float, int> partfunc)
    {
        xPartition = partfunc;
    }
}

That will partition on the X coordinate, reducing your final search space. If you wanted to take it a step further you could also partition on the Y coordinate... I'll leave that as an exercise for the reader :)

If your parition function is very finely grained and could result in a large number of partitions, it might be useful to add a ColumnRange function similar to:

    public IEnumerable<List<T>> ColumnRange(int left, int right)
    {
        using (var mapenum = map.GetEnumerator())
        {
            bool finished = mapenum.MoveNext();
            while (!finished && mapenum.Current.Key < left)
                finished = mapenum.MoveNext();

            while (!finished && mapenum.Current.Key <= right)
            {
                yield return mapenum.Current.Value;
                finished = mapenum.MoveNext();
            }

        }
    }

The ExtractRange method can then use that like so:

    public T[] ExtractRange(float left, float top, float right, float bottom)
    {
        var rngLeft = xPartition(left) - 1;
        var rngRight = xPartition(right) + 1;

        var cells =
            from cell in ColumnRange(rngLeft, rngRight).SelectMany(c => c)
            where cell.X >= left && cell.X <= right &&
                cell.Y >= top && cell.Y <= bottom
            select cell;

        return cells.ToArray();
    }

I used SortedDictionary for convenience, and because it makes it possible to do an ExtractRange method that is reasonably quick. There are other container types that are possibly better suited to the task.

Trumantrumann answered 9/12, 2013 at 4:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.