Querying by node labels vs relationships

I have a use case where I need to classify people's trajectories within a big room.

In terms of performance and Neo4j best practices, which option is better if I want to classify this data so that I can later search/fetch using any combination of these classifications?

The different classifications are:

  1. Gender (FEMALE, MALE)
  2. PersonType (STAFF, CUSTOMER)
  3. AgeGroup (11-20, 21-30, 31-40, etc)

A trajectory contains a set of points (time, x, y, motion_type) that describe where the person went. Each point gives the exact location of the person in the room at a given time and whether they were dwelling, walking or running (this is the motion type).

For example: get all trajectories for a FEMALE CUSTOMER aged between 21 and 30.

Option 1:

// Here I get the set of trajectories that fall within a certain time range (:Trajectory(at) is indexed)
MATCH (trajectory:Trajectory)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")

// Once I have all the trajectories I start filtering by the different criteria
MATCH (trajectory)-[:GENDER]->(:Female)
MATCH (trajectory)-[:PERSON_TYPE]->(:Customer)

// AgeGroup could have quite a lot of groups depending on how accurate the data is. At this stage we have 8 groups.
// Knowing that we have 8 groups, should I filter by property or should I have 8 different labels, one per age group? Is there any other option?
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"

RETURN COUNT(trajectory)

Option 2:

The Trajectory node carries one extra label per available category. For example, to get the same result as in Option 1, I would do something like:

MATCH (trajectory:Trajectory:Female:Customer)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")
MATCH (trajectory)-[:AGE]->(age:AgeGroup)
WHERE age.group = "21-30"

RETURN COUNT(trajectory)

// Or, assuming I have one label per age group
// (written Age_21_30 because a hyphen is not valid in an unquoted label):
MATCH (trajectory:Trajectory:Female:Customer:Age_21_30)
WHERE datetime("2020-01-01T00:00:00.000000+13:00") <= trajectory.at < datetime("2020-01-11T00:00:00.000000+13:00")

RETURN COUNT(trajectory)

So I want to know:

  1. How to handle the age groups: as a property, as separate labels, or is there a better option?
  2. How to handle the different categories of a trajectory: as labels on the trajectory, or as relationships pointing to a node that carries the information?

As a note, not every trajectory will have every category. For example, if our facial recognition system cannot detect whether the person is female or male, that category won’t exist for that particular trajectory.
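
To make the missing-category case concrete, here is a minimal sketch of how trajectories with an undetected gender could be selected under each option. It assumes that a missing category simply means no GENDER relationship (Option 1) or neither gender label (Option 2); that convention is an assumption, not something stated elsewhere in this question.

```cypher
// Option 1: the trajectory has no outgoing GENDER relationship at all
MATCH (trajectory:Trajectory)
WHERE NOT (trajectory)-[:GENDER]->()
RETURN COUNT(trajectory);

// Option 2: the trajectory carries neither gender label
MATCH (trajectory:Trajectory)
WHERE NOT trajectory:Female AND NOT trajectory:Male
RETURN COUNT(trajectory);
```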

Zuleika answered 29/1, 2020 at 1:50 Comment(0)

If you follow the concepts in https://neo4j.com/docs/getting-started/current/graphdb-concepts/, you have two base types of nodes: Person and Location. Person nodes can have multiple labels:

  1. Male or Female for gender
  2. Staff or Customer for type of Person
  3. Age-specific labels such as Age_20_30

Location nodes have properties for the location but not one for the time.

I would model the trajectory as a relationship between a Person and a Location, with a property for the time and the relationship type giving the type of movement. So you could have relationships of type DWELLING, WALKING and RUNNING between Person and Location nodes.
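
As a rough illustration of that model, the sketch below creates one person, one location and a single WALKING point between them. The property names (id, x, y, at) and the sample values are assumptions made up for this example, not part of the question's data.

```cypher
// Hypothetical sample data; id, x, y and at are assumed property names
CREATE (p:Person:Female:Customer:Age_20_30 {id: 1})
CREATE (l:Location {x: 3.5, y: 7.0})
// The relationship type carries the motion type, the property carries the time
CREATE (p)-[:WALKING {at: datetime("2020-01-05T10:15:00+13:00")}]->(l)
```

Each point of a trajectory becomes one such relationship, so a person's full trajectory is the set of relationships leaving that Person node, ordered by at.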

Your query would then become:

MATCH (n)-[r]->(:Location)
WHERE n:Female and n:Customer and n:Age_20_30
AND   datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(r)

Counting the running points of such customers would be:

MATCH (n)-[r:RUNNING]->(:Location)
WHERE n:Female and n:Customer and n:Age_20_30
AND   datetime("2020-01-01T00:00:00.000000+13:00") <= r.at < datetime("2020-01-11T00:00:00.000000+13:00")
RETURN COUNT(r)
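
Because the time filter now sits on a relationship property, the node index on :Trajectory(at) from the question no longer applies. One possible sketch, assuming Neo4j 4.3 or later (where relationship property indexes are available) and a hypothetical index name running_at:

```cypher
// Assumed index name; one index per relationship type would be needed
CREATE INDEX running_at IF NOT EXISTS
FOR ()-[r:RUNNING]-()
ON (r.at)
```

Similar indexes would be needed for DWELLING and WALKING. Whether the planner actually uses them for the range predicate is worth confirming with PROFILE on your own data.
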
Paranoid answered 29/1, 2020 at 14:44 Comment(2)
Is there much difference between MATCH (n)-[r:RUNNING] WHERE n:Female and n:Customer and n:Age_20_30 ... and something like the original (Option 2) query: MATCH (person:Person:Female:Customer:Age_20_30)-[r:RUNNING]? (Plumate)
I guess that both will work. The biggest difference from your Option 2 solution is that the trajectory is no longer a node but a relationship of type RUNNING. In the WHERE clause you can also combine ANDs and ORs: WHERE n:Female and n:Customer and (n:Age_20_30 or n:Age_40_50) (Paranoid)
