MongoDB: storage & when to use relationships

I'm new to MongoDB, so please bear with me.

I have 2 questions:

First, take the following:

// add a record
$obj = array( "title" => "Calvin and Hobbes", "author" => "Bill Watterson" );

Does MongoDB store "title" and "author" as text for every single entry of this object in this collection? Or does it create a schema and convert these to field numbers (or nothing at all and store purely the data)?

My second question is: when should "relations" be used? Let's say I have 100 resellers, who contain (object-wise) 1,000 clients each, and each client has 10 projects. That makes for one huge overall object to manipulate.

In the SQL world, this would all be related "objects". In the Document world, we try to store complete objects by embedding sub-objects.

However, this can be unwieldy. What is the best practice for this? Can someone point me to a guideline please.

Thanks.

Mcneese answered 2/3, 2011 at 17:51 Comment(2)
Yes, title and author are stored as text in the DB. Depending on the language/driver/wrapper you are using, this can typically be remapped so that the field is stored as t in the database but accessed as title from your objects. From the standpoint of relationships, you'll typically find that you have to optimize for "something". There is no single "best way". (Geldens)
Next time, please split your questions into separate Stack Overflow questions. (Selfassurance)
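The remapping mentioned in the first comment can be done in application code. Below is a minimal sketch in PHP, assuming a hypothetical FIELD_MAP and hypothetical helpers toStorage() and fromStorage(); these are not part of any MongoDB driver API. Documents are renamed to short keys before being written and renamed back after being read.

```php
<?php
// Hypothetical mapping from application field names to the short names
// actually written to MongoDB. Not part of any driver API.
const FIELD_MAP = ['title' => 't', 'author' => 'a'];

// Rename top-level keys on the way into the database.
// Keys that are not in the map pass through unchanged.
function toStorage(array $doc, array $map = FIELD_MAP): array
{
    $out = [];
    foreach ($doc as $key => $value) {
        $out[$map[$key] ?? $key] = $value;
    }
    return $out;
}

// Reverse the mapping when a document is read back.
function fromStorage(array $doc, array $map = FIELD_MAP): array
{
    return toStorage($doc, array_flip($map));
}

$obj = ["title" => "Calvin and Hobbes", "author" => "Bill Watterson"];
$stored = toStorage($obj);   // what would be sent to the database
print_r($stored);
print_r(fromStorage($stored));
```

This sketch only renames top-level keys; nested documents, and the field names used in query filters, would need the same translation.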

Does MongoDB store field names for every entry in this collection?

Yes, MongoDB stores the field names as text in every record. In practice this is usually not much of a problem; if disk space is a limiting factor, you may want to consider something else.
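To see how much of each record the key names take up, the sketch below computes the BSON size of a flat document whose values are all strings. It follows the BSON layout (an int32 total length, then one element per field, then a terminating 0x00 byte; a string element is a type byte, the key as a NUL-terminated string, an int32 length, the bytes, and a NUL). The helper name bsonStringDocSize() is hypothetical, and the sketch ignores the _id field the driver normally adds.

```php
<?php
// Hypothetical helper: BSON size, in bytes, of a flat document whose
// values are all strings. Ignores the _id field a driver would add.
function bsonStringDocSize(array $doc): int
{
    $size = 4 + 1; // int32 total length + terminating 0x00 byte
    foreach ($doc as $key => $value) {
        $size += 1;                         // element type byte (0x02 = string)
        $size += strlen((string) $key) + 1; // key as a NUL-terminated string
        $size += 4 + strlen($value) + 1;    // int32 length + bytes + NUL
    }
    return $size;
}

$long  = ["title" => "Calvin and Hobbes", "author" => "Bill Watterson"];
$short = ["t" => "Calvin and Hobbes", "a" => "Bill Watterson"];

printf("long keys:  %d bytes\n", bsonStringDocSize($long));
printf("short keys: %d bytes\n", bsonStringDocSize($short));
```

For this document the long keys cost 9 extra bytes per record (61 versus 52 bytes), which only becomes significant across very many small documents.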

When should "relations" be used?

This is more an art than a science. The MongoDB documentation on schemas is a good reference, but here are some things to consider:

  • Put as much in as possible

    The joy of a document database is that it eliminates lots of joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure, there is no immediate need to normalize data as you would in SQL. In particular, any data that is not useful apart from its parent document should be part of the same document.

  • Separate data that can be referred to from multiple places into its own collection.

    This is not so much a "storage space" issue as it is a "data consistency" issue. If many records refer to the same data, it is more efficient and less error-prone to update a single record and keep references to it in other places.

  • Document size considerations

    MongoDB imposes a 4MB size limit on a single document. In a world of GB of data this sounds small, but it is also roughly 30 thousand 140-character tweets, 250 typical Stack Overflow answers, or 20 Flickr photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases, concern about document sizes will be premature optimization.

    In the example you gave, I would make 3 separate collections, because I do not need to know about the 9 other projects to create a listing for a project. This keeps the queries simple. (But see the Pro Tip at the bottom.)

  • Complex data structures

    MongoDB can store arbitrarily deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest, or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well.)

  • Data Consistency

    MongoDB makes a tradeoff between efficiency and consistency. The rule is that changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic, for example by using a "lock" field). When you design your schema, consider how you will keep your data consistent. Generally, the more that you keep in a document, the better.
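Applied to the reseller/client/project example, the three-collection layout might look like the minimal sketch below, written in PHP to match the question. Plain arrays stand in for the three collections; the reference fields reseller_id and client_id and the findBy() helper are hypothetical. With a real driver, the lookup would presumably be a query filtering the projects collection on client_id.

```php
<?php
// Plain arrays stand in for three MongoDB collections. The reference
// field names (reseller_id, client_id) are hypothetical.
$resellers = [
    ['_id' => 'r1', 'name' => 'Acme Resale'],
];
$clients = [
    ['_id' => 'c1', 'reseller_id' => 'r1', 'name' => 'Globex'],
    ['_id' => 'c2', 'reseller_id' => 'r1', 'name' => 'Initech'],
];
$projects = [
    ['_id' => 'p1', 'client_id' => 'c1', 'title' => 'Website'],
    ['_id' => 'p2', 'client_id' => 'c1', 'title' => 'Mobile app'],
    ['_id' => 'p3', 'client_id' => 'c2', 'title' => 'Intranet'],
];

// Hypothetical stand-in for a find-by-field query on one collection:
// a linear scan here, where the database would use an index if present.
function findBy(array $collection, string $field, $value): array
{
    return array_values(array_filter(
        $collection,
        fn(array $doc) => ($doc[$field] ?? null) === $value
    ));
}

// Listing one client's projects touches only the projects collection;
// the reseller document and other clients' projects are never loaded.
$globexProjects = findBy($projects, 'client_id', 'c1');
echo implode(', ', array_column($globexProjects, 'title')), "\n";
```

The cost of this layout is that showing a reseller together with all of its clients and projects now takes several queries instead of one document read.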

Pro Tip

Even when you do use references, it is often a good idea to keep a little of the data from the reference in the parent document. Generally, I keep enough information to build a meaningful link to the descendant in the parent.

In your example, this would mean keeping client names along with the ObjectId in the reseller's document, so I could create a link to each client by name without a separate query. If building the URL for the client requires something besides their document id, I would store that as well.

Tricks like this can cut down on the 1+n query situations.
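The tip can be sketched as follows: the reseller document keeps each client's id and name, so a list of client links is rendered from that one document instead of one extra query per client. This is a minimal PHP sketch; the document shape, the clientLinks() helper, and the /clients/{id} URL pattern are all hypothetical.

```php
<?php
// Hypothetical reseller document: each client entry copies just enough
// of the client (id and name) to render a link without another query.
$reseller = [
    '_id'     => 'r1',
    'name'    => 'Acme Resale',
    'clients' => [
        ['_id' => 'c1', 'name' => 'Globex'],
        ['_id' => 'c2', 'name' => 'Initech'],
    ],
];

// Build one HTML link per client from the embedded summaries only.
// The /clients/{id} URL pattern is made up for this example.
function clientLinks(array $reseller): array
{
    $links = [];
    foreach ($reseller['clients'] as $client) {
        $url = '/clients/' . rawurlencode($client['_id']);
        $links[] = sprintf(
            '<a href="%s">%s</a>',
            $url,
            htmlspecialchars($client['name'])
        );
    }
    return $links;
}

echo implode("\n", clientLinks($reseller)), "\n";
```

The copied name is duplicated data: if a client is renamed, the entry in the reseller document has to be updated as well, which is the consistency trade-off discussed above.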

Skydive answered 2/3, 2011 at 20:43 Comment(3)
As a note, recent versions of MongoDB raise the document size limit to 16MB, as of version 1.8.0. So if you're starting a new project, 16MB is going to be your "new" max. (Geldens)
We were advised against making documents too large as this could impact performance. Was this bad advice? (Amathist)
Crashalot: Again, your question is more art than science. MongoDB does not perform as well as SQL databases when doing JOINs. Grabbing a bunch of documents will perform far worse than retrieving one large one. On the other hand, if you very rarely need the JOINed data, you can save a lot of network bandwidth (and memory) by normalizing data. Example: if a user is allowed to provide a picture and that picture is displayed next to each post, then it makes sense to store it in the User document; if it is only displayed on the User's information page, then put it in a separate document. (Skydive)

Does MongoDB store "title" and "author" as text for every single entry of this object in this collection?

MongoDB is schemaless, so the answer is obvious: yes, since there is no such thing as a schema.

My second question is: when should "relations" be used? Let's say I have 100 resellers, who contain (object-wise) 1,000 clients each, and each client has 10 projects. That makes for one huge overall object to manipulate.

Please check

http://www.mongodb.org/display/DOCS/Schema+Design

Your options are embedded documents, database references or multiple queries.
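The three options can be compared side by side. In the minimal PHP sketch below, the data, the in-memory $db array, and the resolveDbRef() helper are hypothetical; the only real convention used is the DBRef shape, a sub-document with $ref (collection name) and $id (referenced _id) fields. Both reference styles still need a second lookup to fetch the client.

```php
<?php
// Hypothetical in-memory "database": collection name => (_id => document).
$db = [
    'clients' => [
        'c1' => ['_id' => 'c1', 'name' => 'Globex'],
    ],
];

// 1. Embedded document: the client data lives inside the project.
$embedded = ['title' => 'Website', 'client' => ['name' => 'Globex']];

// 2. Manual reference: store the client's _id; the app does a second lookup.
$manual = ['title' => 'Website', 'client_id' => 'c1'];

// 3. DBRef convention: {$ref: collection, $id: value}. It records which
//    collection to look in, but resolving it is still a second lookup.
$dbref = ['title' => 'Website', 'client' => ['$ref' => 'clients', '$id' => 'c1']];

// Hypothetical resolver for a DBRef-shaped array against $db.
function resolveDbRef(array $db, array $ref): ?array
{
    return $db[$ref['$ref']][$ref['$id']] ?? null;
}

echo $embedded['client']['name'], "\n";                  // no extra lookup
echo $db['clients'][$manual['client_id']]['name'], "\n"; // second lookup
echo resolveDbRef($db, $dbref['client'])['name'], "\n";  // second lookup
```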

Pemba answered 2/3, 2011 at 17:55 Comment(1)
Not sure 'the answer is obvious' applies; clearly it wasn't, otherwise the OP wouldn't have asked. This is a different paradigm from the SQL world, and FAQs like how data is stored will come up and are warranted. (Hanschen)
