For the last few days, I've tried to accomplish the following task regarding the analysis of a Set of Objects, and the solutions I've come up with rely heavily on Memory (obtaining OutOfMemory exceptions in some cases) or take an incredible long time to process. I now think is a good idea to post it here, as I'm out of ideas. I will explain the problem in detail, and provide the logic I've followed so far.
Scenario:
First, we have an object, which we'll name Individual, that contains the following properties:
- A date
- A Longitude - Latitude pair
Second, we have another object, which we'll name Group, which definition is: A set of Individuals that, together, match the following conditions:
All individuals in the set have a date which, within each other, is not superior to 10 days. This means that all of the Individuals, if compared within each other, don´t differ in 10 days between each other.
The distance between each object is less than Y meters.
A group can have N>1 individuals, as long as each of the Individuals match the conditions within each other. All individuals are stored in a database. All groups would also be stored in a database.
The task:
Now, consider a new individual. The system has to check if the new individual:
Belongs to an existing Group or Groups
The Individual now forms one or multiple new Groups with other Individuals.
Notes:
The new individual could be in multiple existing groups, or could create multiple new groups.
SubGroups of Individuals are not allowed, for example if we have a Group that contains Individuals {A,B,C}, there cannot exist a group that contains {A,B}, {A,C} or {B,C}.
Solution (limited in processing time and Memory)
First, we filter the database with all the Individuals that match the initial conditions. This will output a FilteredIndividuals enumerable, containing all the Individuals that we know will form a Group (of 2) with the new one.
Briefly, a Powerset is a set that contains all the possible subsets of a particular set. For example, a powerset of {A,B,C} would be: {[empty], A, B, C, AB, AC, BC, ABC}
Note: A powerset will output a new set with 2^N combinations, where N is the length of the originating set.
The idea with using powersets is the following:
First, we create a powerset of the FilteredIndividuals list. This will give all possible combinations of Groups within the FilteredIndividuals list. For analysis purposes and by definition, we can omit all the combinations that have less than 2 Individuals in them.
We Check if each of the Individuals in a combination of the powerset, match the conditions within each other. If they match, that means that all of the Individuals in that combination form a Group with the new Individual. Then, to avoid SubGroups, we can eliminate all of the subsets that contain combinations of the Checked combination. I do this by creating a powerset of the Checked combination, and then eliminating the new powerset from the original one.
At this point, we have a list of sets that match the conditions to form a Group.
Before formally creating a Group, I compare the DB with other existing Groups that contain the same elements as the new sets: If I find a match, I eliminate the newly created set, and add the new Individual to the old Group. If I don't find a match, it means they are new Groups. So I add the new Individual to the sets and finally create the new Groups.
This solution works well when the FilteredIndividuals enumerable has less than 52 Individuals. After that, Memory exceptions are thrown (I know this is because of the maximum size allowed for data types, but incrementing such size is not of help with very big sets. For your consideration, the top amount of Individuals that match the conditions I've found is 345).
Note: I have access to the definition of both entities. If there's a new property that would reduce the processing time, we can add it.
I'm using the .NET framework with C#, but if the language is something that requires changing, we can accept this as long as we can later convert the results to object understandable by our main system.
[goal is] to create "Groups" of "Individual", with [constraints met]
- exactly what I don't get/accept: what use areGroup
s? What "operations" do the have to support?FIFO
1st in, 1st out? Fixed upper/lower limits on size? Or FCFS - 1st come, 1st "served"/processed? – Overdone