I have a use case where I need to classify people's trajectories within a big room.
In terms of performance and Neo4j best practices, which option is better if I want to classify this data so that I can later search/fetch by any combination of these classifications?
The different classifications are:
- Gender (FEMALE, MALE)
- PersonType (STAFF, CUSTOMER)
- AgeGroup (11-20, 21-30, 31-40, etc.)
A trajectory contains a set of points (time, x, y, motion_type) that describes where the person went. Each point gives the exact location of the person in the room at a given time and whether they were dwelling, walking or running (this is the motion type).
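To make that structure concrete, here is a minimal sketch of how one trajectory and two of its points could be stored. The :Point label, the :HAS_POINT relationship, and all sample values are assumptions made for illustration only; they are not taken from the real data.

```cypher
// Hypothetical model: one :Trajectory node linked to its :Point nodes.
// :Point, :HAS_POINT and every value below are illustrative assumptions.
CREATE (t:Trajectory {at: datetime("2020-01-05T10:00:00+13:00")})
CREATE (t)-[:HAS_POINT]->(:Point {time: datetime("2020-01-05T10:00:00+13:00"), x: 1.5, y: 4.0, motion_type: "WALKING"})
CREATE (t)-[:HAS_POINT]->(:Point {time: datetime("2020-01-05T10:00:05+13:00"), x: 2.0, y: 4.2, motion_type: "DWELLING"})
```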
An example search would be: get all trajectories of people who are FEMALE and CUSTOMER, with an age between 21 and 30.
Option 1:
// Here I get the set of trajectories that fall within a certain time range (:Trajectory(at) is indexed)
MATCH (trajectory:Trajectory)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
// Once I have all the trajectories I start filtering by the different criteria
MATCH (trajectory)-[:GENDER]->(:Female)
MATCH (trajectory)-[:PERSON_TYPE]->(:Customer)
// AgeGroup could have quite a lot of groups depending on how accurate the data is. At this stage we have 8 groups.
// Knowing that we have 8 groups, should I filter by property or should I have 8 different labels, one per age group? Is there any other option?
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"
RETURN COUNT(trajectory)
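The first comment in the query above mentions that :Trajectory(at) is indexed. A minimal sketch of how such an index could be declared, assuming the Neo4j 4.x (or later) syntax and a hypothetical index name trajectory_at; on Neo4j 3.5 the equivalent form would be CREATE INDEX ON :Trajectory(at).

```cypher
// Supports the range filter on trajectory.at used in the queries here.
// The index name trajectory_at is an illustrative assumption.
CREATE INDEX trajectory_at FOR (t:Trajectory) ON (t.at)
```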
Option 2:
The Trajectory node carries one additional label for each category that applies to it. For example, to get the same result as in Option 1 I would do something like:
MATCH (trajectory:Trajectory:Female:Customer)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"
RETURN COUNT(trajectory)
// Or, assuming I have one label per age group (a label containing a hyphen must be quoted with backticks):
MATCH (trajectory:Trajectory:Female:Customer:`Age21-30`)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(trajectory)
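To compare how the two options differ when the data is written, here is a minimal sketch that stores the same trajectory both ways. The timestamp is a made-up sample value, and the per-age-group label follows the naming used in the Option 2 query above.

```cypher
// Option 1 (sketch): shared category nodes, one relationship per category.
MERGE (f:Female)
MERGE (c:Customer)
MERGE (a:AgeGroup {group: "21-30"})
CREATE (t:Trajectory {at: datetime("2020-01-05T10:00:00+13:00")})
CREATE (t)-[:GENDER]->(f)
CREATE (t)-[:PERSON_TYPE]->(c)
CREATE (t)-[:AGE]->(a);

// Option 2 (sketch): every category is an extra label on the trajectory itself.
// The hyphen in the age label is why it has to be wrapped in backticks.
CREATE (t:Trajectory:Female:Customer:`Age21-30` {at: datetime("2020-01-05T10:00:00+13:00")});
```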
So I want to know:
- How to handle the age groups: as a property, as separate labels, or is there a better option?
- How to handle the different categories of a trajectory: as labels on the trajectory, or as relationships pointing to a node that carries the information?
As a note, not every trajectory will have every category. For example, if our facial recognition system cannot detect whether the person is female or male, that category won’t exist for that particular trajectory.
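A missing category would then show up as a missing relationship (Option 1) or a missing label (Option 2). A minimal sketch of how trajectories with no detected gender could be counted under each option, assuming FEMALE and MALE are the only gender values, as listed above:

```cypher
// Option 1 (sketch): no :GENDER relationship was created for the trajectory.
MATCH (trajectory:Trajectory)
WHERE NOT (trajectory)-[:GENDER]->()
RETURN count(trajectory);

// Option 2 (sketch): the trajectory carries neither gender label.
MATCH (trajectory:Trajectory)
WHERE NOT trajectory:Female AND NOT trajectory:Male
RETURN count(trajectory);
```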
MATCH (n)-[r:RUNNING] WHERE n:Female and n:Customer and n:Age_20_30 ...
and something like the original (option 2) query: MATCH (person:Person:Female:Customer:Age_20-30)-[r:RUNNING] ? – Plumate