How would I model data that is heirarchal and relational in a document-oriented database system like RavenDB?

Asked 8/6, 2011 at 21:51 Answered 9/6, 2011 at 8:22

Solved data-modeling ravendb document-database document-based-database

Document oriented databases (particularly RavenDB) are really intriguing me, and I'm wanting to play around with them a bit. However as someone who is very used to relational mapping, I was trying to think of how to model data correctly in a document database.

Say I have a CRM with the following entities in my C# application (leaving out unneeded properties):

public class Company
{
    public int Id { get; set; }
    public IList<Contact> Contacts { get; set; }
    public IList<Task> Tasks { get; set; }
}

public class Contact
{
    public int Id { get; set; }
    public Company Company { get; set; }
    public IList<Task> Tasks { get; set; }
}

public class Task
{
    public int Id { get; set; }
    public Company Company { get; set; }
    public Contact Contact { get; set; }
}

I was thinking of putting this all in a Company document, as contacts and tasks do not have a purpose out side of companies, and most of the time query for a task or contacts will also show information about the associated company.

The issue comes with Task entities. Say the business requires that a task is ALWAYS associated with a company but optionally also associated with a task.

In a relational model this is easy, as you just have a Tasks table and have the Company.Tasks relate to all tasks for the company, while Contact.Tasks only show the tasks for the specific Task.

For modeling this in a document database, I thought of the following three ideas:

Model Tasks as a separate document. This seems kind of anti-document db as most of the time you look at a company or contact you will want to see the list of tasks, thus having to perform joins over documents a lot.
Keep tasks that are not associated with a contact in the Company.Tasks list and put tasks assocaited with a contact in the list for each individual contacts. This unfortunately means that if you want to see all tasks for a company (which will probably be a lot) you have to combine all tasks for the company with all tasks for each individual contact. I also see this being complicated when you want to disassociate a task from a contact, as you have to move it from the contact to the company
Keep all tasks in the Company.Tasks list, and each contact has a list of id values for tasks it is associated with. This seems like a good approach except for having to manually take id values and having to make a sub-list of Task entities for a contact.

What is the recommended way to model this data in a document oriented database?

Delmore answered 8/6, 2011 at 21:51 Comment(0)

Use denormalized references:

http://ravendb.net/faq/denormalized-references

in essence you have a DenormalizedReference class:

public class DenormalizedReference<T> where T : INamedDocument
{
    public string Id { get; set; }
    public string Name { get; set; }

    public static implicit operator DenormalizedReference<T> (T doc)
    {
        return new DenormalizedReference<T>
        {
            Id = doc.Id,
            Name = doc.Name
        }
    }
}

your documents look like - i've implemented the INamedDocument interface - this can be whatever you need it to be though:

public class Company : INamedDocument
{
    public string Name{get;set;}
    public int Id { get; set; }
    public IList<DenormalizedReference<Contact>> Contacts { get; set; }
    public IList<DenormalizedReference<Task>> Tasks { get; set; }
}

public class Contact : INamedDocument
{
    public string Name{get;set;}
    public int Id { get; set; }
    public DenormalizedReference<Company> Company { get; set; }
    public IList<DenormalizedReference<Task>> Tasks { get; set; }
}

public class Task : INamedDocument
{
    public string Name{get;set;}
    public int Id { get; set; }
    public DenormalizedReference<Company> Company { get; set; }
    public DenormalizedReference<Contact> Contact { get; set; }
}

Now saving a Task works exactly as it did before:

var task = new Task{
    Company = myCompany,
    Contact = myContact
};

However pulling all this back will mean you're only going to get the denormalized reference for the child objects. To hydrate these I use an index:

public class Tasks_Hydrated : AbstractIndexCreationTask<Task>
{
    public Tasks_Hydrated()
    {
        Map = docs => from doc in docs
                      select new
                                 {
                                     doc.Name
                                 };

        TransformResults = (db, docs) => from doc in docs
                                         let Company = db.Load<Company>(doc.Company.Id)
                                         let Contact = db.Load<Contact>(doc.Contact.Id)
                                         select new
                                                    {
                                                        Contact,
                                                        Company,
                                                        doc.Id,
                                                        doc.Name
                                                    };
    }
}

And using your index to retrieve the hydrated tasks is:

var tasks = from c in _session.Query<Projections.Task, Tasks_Hydrated>()
                    where c.Name == "taskmaster"
                    select c;

Which i think is quite clean :)

As a design conversation - the general rule is that if you ever need to load the child documents alone as in - not part of the parent document. Whether that be for editing or viewing - you should model it with it's own Id as it's own document. Using the method above makes this quite simple.

Mccary answered 9/6, 2011 at 8:22 Comment(3)

Ok so I guess I was taking the denormalization too far, but does splitting these up forgo the advantages of a document based db, since I would have to constantly have to do joins between documents? – Delmore 9/6, 2011 at 12:57

you wont because these indexes are lightning fast and the db.Load<T> happens on the server so the cost is minimal. you should consider where your transactional boundaries are and use this method only when you really need to - but it does mean you can have the benefits of both worlds really. I forgot to mention that updating denormalized references (if the name changes for instance) you need to run a patch to update the references. This is again very simple - but a process you need to manage. I find this a small cost which is massively outweighted by the benefits of a schema-less DB :) – Mccary 9/6, 2011 at 16:0

That makes sense :). I'm really liking the idea of document (and more importantly schema-less) databases. Thanks! – Delmore 9/6, 2011 at 16:23

I'm new to document dbs as well...so with a grain of salt...

As a contrasting example...if you are on Twitter and you have a list of the people you follow, which contains a list of their tweets...you would not move their tweets into your twitter account in order to read them, and if you re-tweet, you would only have a copy, not the original.

So, in the same way, my opinion is that if Tasks belong to a company, then they stay within the Company. The Company is the Aggregate Root for Tasks. The Contacts can only hold references (ids) or copies of the Tasks and cannot modify them directly. If you have your contact hold a "copy" of the task, that's fine, but in order to modify the task (e.g. mark it complete) you would modify the task through its Aggregate Root (Company). Since a copy could quickly become outdated, it seems like you would only want a copy to exist while in memory and when saving the Contact, you would only save references to the Tasks.

Magnetics answered 8/6, 2011 at 22:10 Comment(0)

Recommended topics

Hot tags