Linq fast intersect query - enhancement?
Asked Answered
D

2

5

In our company there are thousands(!) of cars. each car has a GPS device which sends periodically (cycle) its location.

So each Cycle contains :

  • List<Cars> ( cars that sent location – corresponding to the CycleNum)
  • CycleNum which is Cycle number

CycleNum is determined by a server.

So for example in CycleNum=1 , 4 cars sent their location :

enter image description here

Classes I used ( simplification )

static int TotalCycles=0;

class Car
{
 public int CarId;
 public int Location ;
}


class Cycle
{
  public  int CycleNum;
  public List<Car> Cars;
  public Cycle ()
    {
      CycleNum=(++TotalCycles);
    }
     
}

Let's fill some data :

   List<Cycle> LstCyclces = new List<Cycle>();
   Cycle cycle =null;

   cycle = new Cycle();//cycle 1
   cycle.Cars = new List<Car>();
   cycle.Cars.Add(new Car {CarId=1 , Location=40});
   cycle.Cars.Add(new Car {CarId=2 , Location=21});
   cycle.Cars.Add(new Car {CarId=3 , Location=5});
   cycle.Cars.Add(new Car {CarId=4 , Location=15});
   LstCyclces.Add(cycle);
   
   cycle = new Cycle();//cycle2
   cycle.Cars = new List<Car>();
   cycle.Cars.Add(new Car {CarId=1 , Location=40}); //same location
   cycle.Cars.Add(new Car {CarId=2 , Location=57});//changed location
   cycle.Cars.Add(new Car {CarId=3 , Location=100});//changed location
   cycle.Cars.Add(new Car {CarId=4 , Location=7});//changed location
   cycle.Cars.Add(new Car {CarId=7 , Location=2});//new attended ( vs previous cycle)
   LstCyclces.Add(cycle);
   
   cycle = new Cycle();//cycle3
   cycle.Cars = new List<Car>();
   cycle.Cars.Add(new Car {CarId=1 , Location=40}); //same
   cycle.Cars.Add(new Car {CarId=2 , Location=5});//changed Location
   cycle.Cars.Add(new Car {CarId=4 , Location=1});//changed Location
   cycle.Cars.Add(new Car {CarId=9 , Location=7});//new attended ( vs previous cycle)
   LstCyclces.Add(cycle);

Visualization :

enter image description here

As you can see :

  • A new car can come in to the cycle
  • A car can also get out from a cycle
  • A car can change Location(obviously)

Question

I was asked to:

For a specific given cycle Number — find all Cars that were also anticipated in the previous cycle where :

("new Location" - "previous Location") < abs(40)

And from that result set , find all cars PAIRS where :

(Car_A.Location - Car_B.Location) < abs(65)

In short - I need all cars that gave me info also for the previous cycle and also they didn't go very far from their previous location and finally - from those cars - I need to know which cars are near to each other.

Very important : I can not check only current Location , because we need to make sure also that cars didn't get very far from their previous location.

So according to the picture : looking at cycleNum=2 :

The cars who anticipated also in the previous Cycle (1) were Cars: 1,2,3,4.

From that result : The cars who didn't go very far from their previous location :

("new Location" - "previous Location") < abs(40)

Were cars : 1,2,4.

From that result I need to find all pairs of car who are now not far from each other :

(Car_A.Location - Car_B.Location) < abs(65) :

So the result should be IEnumerable: (format isn't matter )

  • { Car1 , Car2 , distance=17} //the distance between those 2 cars < 65
  • { Car1 , Car4 , distance=33} //the distance between those 2 cars < 65
  • { Car2 , Car4 , distance=50} //the distance between those 2 cars < 65

//I dont mind having all permutation ( {car1 car2} , {car2 car1} )

What have I tried :

   var cycleToCheck=2;
   //get all cars from desired cycle
   var requestedCycleCars =  LstCyclces.Where(c=>c.CycleNum==cycleToCheck).SelectMany(c=>c.Cars);
   //get all cars from previous  cycle
   var previousCycleCars =  LstCyclces.Where(c=>c.CycleNum==cycleToCheck-1).SelectMany(c=>c.Cars);
   //intersec between those 
   var MyWrongIntersect =requestedCycleCars.Intersect(previousCycleCars,new MyEqualityComparer());

But I only get cars from the current cycle and not from previous cycle ,Also - I need reference both to cars from current cycle and previous cycle(without reiterating) - for calculations.

Also I think I'm on the wrong path using SelectMany and this is suppose to be the fastest it can be(c# , plinq?). I wish it could be in one query.

Any help ?

Full code working online

nb , of course I can do it in phases , but reiterating , or ToList()'s are bad approach for me. I was hoping for a single plinq query

Edit

Posted solution works OK logically but not performantly.

2 cycles , where each has 10,000 cars : > 9min!!! :

https://i.sstatic.net/mjLvG.jpg

enter image description here

How can I improve it? (asparallel didnt work much)

Deplorable answered 29/10, 2014 at 11:55 Comment(3)
Have you considered sorting? A simple sort could reduce the number of actual comparisons you actually have to make. At present (without really thinking about it) I believe your algorithm has O(n^2).Shuma
@Shuma are you are talking about the && x.Item1.CarId < x.Item2.CarId part ?Deplorable
Nope I meant the (Car_A.Location - Car_B.Location) < abs(65) part. If you sort the cars by location, then obviously the first car and the last car is not going to be in a similar location. Try to store your data in a form that make querying quick. That the basis of databases.Shuma
C
5

Well, as far as the efficiency goes,

From that result I need to find all pairs of car who are now not far from each other : is the one that is pretty killer for performance, assuming that there are a lot of such pairs. The naive algorithm would run n^2 atleast. You'd want to use SQL spatial type, which would make the query a lot more efficient.

If you are not willing to do it/can't, there's not much you can do to improve the performance, I am willing to guess.

Here is a next code that will do efficient join between Cars. It's important that CarId is indexed. After we have find all the pairs c.Distance <40, we'll do the final processing on client's computer, as this allows us to process the sorted cars by ourselves in an efficient manner.

var cycleNum = 2;

var curCycleCars = LstCyclces[cycleNum - 1].Cars;
var prevCycleCars = LstCyclces[cycleNum - 2].Cars;

var cars = curCycleCars.Join(prevCycleCars, 
                    p => p.CarId, 
                    y => y.CarId,
                    (f1, f2) => new { 
                            Car = f1,
                            Distance = f1.Location - f2.Location
                        })
                    .Where(c => c.Distance < 40)
                    .Select(c => c.Car)
                    .OrderBy(car => car.Location)
                    .ToList();



var carPairs = new CarPairList[cars.Count()];

for(var i = 0; i < cars.Count; i++)
{
    var curCar = cars[i];
    var curStartIndex = i + 1;

    if(i > 0)
    {
        var previousCarPair = carPairs[i - 1];
        if(previousCarPair!=null)
        {
            curStartIndex = previousCarPair.EndIndex;
        }
    }

    int j;
    for(j = curStartIndex; j < cars.Count; j++)
    {
        var dis = cars[j].Location - curCar.Location;
        if(dis >= 65) break;
    }

    var startIndex = i + 1;
    var endIndex = j - 1;

    if(endIndex >= startIndex)
    {
        carPairs[i] = new CarPairList(curCar, 
                            startIndex, endIndex);
    }
}

foreach(var carPair in carPairs.Where(x => x!=null)){
    Console.WriteLine("Car " + carPair.Car.CarId);
    Console.WriteLine("Cars near the distance: ");

    for(var i = carPair.StartIndex; i <= carPair.EndIndex; i++){
        Console.WriteLine("\t - {0}, distance {1}", 
            cars[i].CarId,
            cars[i].Location - carPair.Car.Location);
    }

    Console.WriteLine();
}

class CarPairList
{
    public readonly Car Car;
    public readonly int StartIndex;
    public readonly int EndIndex;

    public CarPairList(Car car, 
        int startIndex,
        int endIndex){
        Car = car;
        StartIndex = startIndex;
        EndIndex = endIndex;
    }
}
Cloe answered 1/11, 2014 at 20:18 Comment(6)
NO sql . all in memory processing :-) ( and we have alot)Deplorable
it's weird thought that non of the answers uses plinq to make max cores hot?Deplorable
@RoyiNamir, there's not much to parallelize imo. The data is dependent. You can try adding .AsParallel() before ToList(). Btw I also changed the way how car pairs are found. The memory allocation is now minimum, and it should be a lot faster.Cloe
& also, without the PRINTING part, the above algorithm should take 1 second maximum with ~10 000 cars, as the running time is O(nlogn). If it takes more than that, you're going wrong somewhere else.Cloe
im just trying it (pasting to vs)Deplorable
100% ( 1M cars , less than 2 sec) , you're greatDeplorable
D
2

Tty this code

    var cycleToCheck = 2;

    var query = LstCyclces.FirstOrDefault(c => c.CycleNum == cycleToCheck).Cars
                                .Where(c => LstCyclces.FirstOrDefault(p => p.CycleNum == cycleToCheck - 1).Cars
                                            .Any(ca => ca.CarId == c.CarId && Math.Abs(c.Location - ca.Location) < 40));

    var result = query.SelectMany(t1 => query.Select(t2 => Tuple.Create(t1, t2)))
            .Where(x => Math.Abs(x.Item1.Location - x.Item2.Location) < 65 && x.Item1.CarId < x.Item2.CarId);


    foreach (var r in result)
    {           
        Console.WriteLine("{0} - {1}", r.Item1.CarId, r.Item2.CarId);
    }

Here is working sample

Edited

Duster answered 29/10, 2014 at 12:19 Comment(9)
+1 Thank you. However I'm trying to avoid(as i said) Tolists() and middle buffers , and hopefully using plinq , since I told , we have thousands(!) of cars ( and this is very simplified code that I gave , the actually class are a bit bigger).Deplorable
The query.SelectMany(t1 => query.Select(t2 => Tuple.Create(t1, t2))); is geniously. im still thinking how it works. ( combination generator)Deplorable
It's actually transformed to CROSS JOIN in SQLDuster
Can you please see my edit ? about execution times ?Deplorable
@MaxBrodin Is there a reason you use Tuple.Create instead of new { MeaningfulPropertyName = t1, OtherMeaningfulPropertyName = t2 } ?Zibeline
After you found cars for the cycleToCheck you have a Where call that calls LstCyclces.FirstOrDefault(p => p.CycleNum == cycleToCheck - 1) for every car which is not efficient. You can calculate in once.Thusly
@Thusly I dont understand . this condition is mandatory for the first query.can you please post the suggested fix ?Deplorable
@RoyiNamir, sorry, it was a fix for @MaxBrodin answer. He could calculate this LstCyclces.FirstOrDefault(p => p.CycleNum == cycleToCheck - 1) once and store in variable instead of calculating it on every iteration in the Where clauseThusly
@Thusly but it DOES needs to find out for each(!) car if it was in the previous cycle. or are you talking about the scalar value ?Deplorable

© 2022 - 2024 — McMap. All rights reserved.