Data Structure:
User has many Profiles
(Limit - no more than one of each profile type per user, no duplicates)
Profile has many Attribute Values
(A user can have as many or as few attribute values as they like)
Attributes belong to a category
(No overlap. This controls which attribute values a profile can have)
Example/Context:
I believe that on Stack Exchange one user can have many profiles, since they differ per site? In this problem:
- Profile: e.g. Video. A Video profile only contains Attributes from the Video category.
- Attribute: e.g. Genre, which is an Attribute in the Video category.
- Attribute Value: e.g. Comedy, Action and Thriller are all Attribute Values.
Profiles and Attributes are just two levels of grouping for Attribute Values. Without the grouping (which is only needed for the weighting from point 2 onwards), the relationship is simply User hasMany Attribute Values.
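To make the structure concrete, here is a rough sketch of how I picture one user's data, using Python as pseudocode. The names `alice` and `flatten`, and the Music/Instrument/Guitar entries, are placeholders I made up for illustration; only the Video/Genre values come from the example above.

```python
# Hypothetical layout: profile -> attribute -> set of attribute values.
# A dict gives at most one profile per type; a set gives no duplicate values.
alice = {
    "Video": {"Genre": {"Comedy", "Action"}},
    "Music": {"Instrument": {"Guitar"}},  # made-up second profile
}

def flatten(user):
    """Drop both grouping levels: User hasMany Attribute Values."""
    return {
        (profile, attribute, value)
        for profile, attributes in user.items()
        for attribute, values in attributes.items()
        for value in values
    }
```

Keeping the profile and attribute in each flattened entry is just my way of making sure the same word under two different attributes is not treated as the same value.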
Problem:
Give each user a similarity rating against each other user.
1. Similarity based on All Attribute Values associated with the user.
   - Flat/one level
   - Unequal number of attribute values between two users
   - Attribute value can only be selected once per user, so no duplicates
   - Therefore, binary string/boolean array with Cosine Similarity?
2. 1 + Weight Profiles
   - Give each profile a weight (totaling 1?)
   - Work out profile similarity, then multiply by weight, and sum?
3. 1 + Weight Attribute Categories and Profiles
   - As an attribute belongs to a category, categories can be weighted
   - Similarity per category, weighted sum, then same by profile?
   - Or merge profile and category weights
4. 3 + Distance between every attribute value
   - Table of similarity distance for every possible value vs value
   - Rather than similarity by value === value
   - 'Close' attributes contribute to overall similarity.
   - No idea how to do this one
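Here is roughly what I have in mind for the flat similarity and the profile weighting, again using Python as pseudocode. This is only a sketch of my current understanding, not something I know to be correct. It assumes each profile has already been flattened to a set of its values, that a profile missing on either side scores 0, and that the weights sum to 1. The function names, the example weights and the numbers in the value-vs-value table are all placeholders.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sets viewed as binary (0/1) vectors."""
    if not a or not b:
        return 0.0  # assumption: an empty side means no similarity
    # dot product = number of shared values; each norm = sqrt(set size)
    return len(a & b) / math.sqrt(len(a) * len(b))

def weighted_similarity(user_a, user_b, weights):
    """Per-profile cosine similarity, multiplied by weight, then summed.

    user_a, user_b: dict of profile name -> set of attribute values.
    weights: dict of profile name -> weight (assumed to sum to 1).
    """
    return sum(
        weight * cosine(user_a.get(profile, set()), user_b.get(profile, set()))
        for profile, weight in weights.items()
    )

# Value-vs-value table with made-up numbers; unlisted pairs count as 0.
VALUE_SIMILARITY = {
    frozenset({"Action", "Thriller"}): 0.8,
    frozenset({"Comedy", "Thriller"}): 0.1,
}

def value_similarity(x, y):
    """Look up how close two values are, instead of testing x == y."""
    if x == y:
        return 1.0
    return VALUE_SIMILARITY.get(frozenset({x, y}), 0.0)
```

Weighting categories as well as profiles would presumably reuse `weighted_similarity` one level down. What I cannot work out is how to fold `value_similarity` into the overall score.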
Fancy code and useful functions are great, but I'm really looking to fully understand how to achieve these tasks, so I think generic pseudocode is best.
Thanks!