I'm learning about data modeling in DocumentDb. Here's where I need some advice
Please see what my documents look like down below.
I can take two approaches here both with pros and cons.
Scenario 1:
If I keep the data denormalized (see my documents below) by keeping project team member information i.e. first, last name, email, etc. in the same document as the project, I can get the information I need in one query BUT when Jane Doe gets married and her last name changes, I'd have to update a lot of documents in the Projects collection. I'd also have to be extremely careful in making sure that all collections with documents that contain employee information get updated as well. If, for example, I update Jane Doe's name in Projects collection but forget to update the TimeSheets collection, I'd be in trouble!
Scenario 2:
If I keep data somewhat normalized and keep only EmployeeId in the project documents, I can then run three queries whenever I want to get a projects list:
- Query 1 returns projects list
- Query 2 would give me EmployeeId's of all project team members that appear in the first query
- Query 3 for employee information i.e. first, last name, email, etc. I'd use the result of Query 2 to run this one
I can then combine all the data in my application.
The problem here is that DocumentDb seems to have a lot of limitations now. I may be reading hundreds of projects with hundreds of employees in project teams. Looks like there's no efficient way to get all employee information whose Id's appear in my second query. Again, please keep in mind that I may need to pull hundreds of employee information here. If the following SQL query is what I'd use for employee data, I may have to run the same query a few times to get all the information I need because I don't think I can have hundreds of OR statements:
SELECT e.Id, e.firstName, e.lastName, e.emailAddress
FROM Employees e
WHERE e.Id = 1111 OR e.Id = 2222
I understand that DocumentDb is still in preview and some of these limitations will be fixed. With that said, how should I approach this problem? How can I efficiently both store/manage and retrieve all project data I need -- including project team information? Is Scenario 1 a better solution or Scenario 2 or is there a better third option?
Here's what my documents look like. First, the project document:
{
id: 789,
projectName: "My first project",
startDate: "9/6/2014",
projectTeam: [
{ id: 1111, firstName: "John", lastName: "Smith", position: "Sr. Engineer" },
{ id: 2222, firstName: "Jane", lastName: "Doe", position: "Project Manager" }
]
}
And here are two employee documents which reside in the Employees collection:
{
id: 1111,
firstName: "John",
lastName: "Smith",
dateOfBirth: "1/1/1967',
emailAddresses: [
{ email: "[email protected]", isPrimary: "true" },
{ email: "[email protected]", isPrimary: "false" }
]
},
{
id: 2222,
firstName: "Jane",
lastName: "Doe",
dateOfBirth: "3/8/1975',
emailAddresses: [
{ email: "[email protected]", isPrimary: "true" }
]
}