How to stop thinking "relationally"

Asked 25/6, 2009 at 13:12 Answered 25/6, 2009 at 18:36

Solved ruby database-design couchdb document-oriented-db couchpotato

At work, we recently started a project using CouchDB (a document-oriented database). I've been having a hard time un-learning all of my relational db knowledge.

I was wondering how some of you overcame this obstacle. How did you stop thinking relationally and start thinking documentally? (I apologise for making up that word.)

Any suggestions? Helpful hints?

Edit: If it makes any difference, we're using Ruby & CouchPotato to connect to the database.

Edit 2: SO was hassling me to accept an answer. I chose the one that helped me learn the most, I think. However, there's no real "correct" answer, I suppose.

Badmouth answered 25/6, 2009 at 13:12 Comment(12)

It's unlikely that you ever learned relational DB knowledge. It's one of those topics that has a lot of misinformation about it that gets passed off as legitimate. Ever read a Chris Date book? If you had, you probably wouldn't be trying to use CouchDB. You'd know better. – Bordelon 25/6, 2009 at 13:26

That said, just imagine you have a single table named "documents" with as many automatically generated columns as you need, and I think you have a good approximation of what this is about: A domain specific DB (Think blogs) – Bordelon 25/6, 2009 at 13:32

@Brenton - Hey Hey! Get your facts right. That's C J Date to you. :) – Tinner 25/6, 2009 at 13:40

That's Breton to you! (just had a look at the couchDB getting started thing. Looks like just the thing I need for a personal project. If only I could get it running on my webhost. (I don't know better.)) – Bordelon 25/6, 2009 at 13:48

Yeah, I can't get the image of a single, huge, unmaintainable table out of my head. Maybe that's the problem. I'm not sold on this whole thing, but I've been overruled by senior devs. – Badmouth 25/6, 2009 at 13:57

Map/reduce with javascript functions isn't so bad. As long as your data set remains small, this is cake. – Bordelon 25/6, 2009 at 14:19

Matt Grande your mental image is wrong. It would only be unmaintainable if you were doing such a thing in a relational database, or if you were trying to use a document database for an application which is highly relational (a recipe for problems). – Curbing 25/6, 2009 at 14:40

@Breton, so stuff like CouchDB mainly makes sense only when working with small datasets?? That sucks. I thought the whole idea of such storage engines was to make it easy to massively scale and yet easily replicate (in a lazy manner). – Daleth 12/4, 2010 at 18:59

@john I do not have enough expertise to answer that particular question, actually. I imagine you could massively scale (in a map reduce sense). It seems like a good model for parallelism, as long as you know what you're doing when designing your database, and what the consequences will be in terms of algorithmic complexity. It is just that the relational model has more opportunities for mathematical efficiencies, but again, only if you know what you're doing. There's plenty of room in any case to be a complete retard, and when it comes to large data-sets, most programmers are, including me. – Bordelon 14/4, 2010 at 23:49

Should be community wiki – Polonaise 19/8, 2010 at 1:44

@Bordelon map/reduce is great for huge amounts of data. How do you think Google does most of their large-scale processing? – Polonaise 19/8, 2010 at 1:46

@Wahnfrieden which strawman version of me are you arguing against? – Bordelon 19/8, 2010 at 2:26

I think, after perusing about on a couple of pages on this subject, it all depends upon the types of data you are dealing with.

RDBMSes represent a top-down approach, where you, the database designer, assert the structure of all data that will exist in the database. You define that a Person has a First, Last and Middle Name, a Home Address, etc., and you can enforce this using an RDBMS. If you don't have a column for a Person's HomePlanet, tough luck for the wanna-be Person whose HomePlanet isn't Earth; you'll have to add a column at a later date or the data can't be stored in the RDBMS. Most programmers make assumptions like this in their apps anyway, so this isn't a dumb thing to assume and enforce. Defining things can be good. But if you need to log additional attributes in the future, you'll have to add them in. The relational model assumes that your data attributes won't change much.

"Cloud" type databases using something like MapReduce, in your case CouchDB, do not make the above assumption, and instead look at data from the bottom up. Data is input in documents, which can have any number of varying attributes. The model assumes that your data, by its very definition, is diverse in the types of attributes it could have. It says, "I just know that I have this document in database Person that has a HomePlanet attribute of 'Eternium' and a FirstName of 'Lord Nibbler' but no LastName." This model fits webpages: every webpage is a document, but the actual contents/tags/keys of the documents vary so widely that you can't fit them into the rigid structure that the DBMS hands down from on high. This is why Google is so keen on the MapReduce model: its data set is so diverse that it needs to build in for ambiguity from the get-go, and the data sets are so massive that it needs parallel processing (which MapReduce makes trivial). The document-database model assumes that your data's attributes may or will change a lot, or be very diverse, with the "gaps" and sparsely populated columns you would find if the data were stored in a relational database. While you could use an RDBMS to store data like this, it would get ugly really fast.
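As a rough illustration of the "varying attributes" point, here is a minimal Ruby sketch that treats documents as plain hashes. It does not talk to CouchDB at all; the `PEOPLE` data, the field names, and both helper methods are assumptions made up for the example (only "Lord Nibbler" and "Eternium" come from the text above).

```ruby
# Hypothetical documents in a "person" database. Field names and most values
# are illustrative assumptions; no two documents need the same attributes.
PEOPLE = [
  { "_id" => "p1", "first_name" => "Jane", "last_name" => "Doe", "home_planet" => "Earth" },
  { "_id" => "p2", "first_name" => "Lord Nibbler", "home_planet" => "Eternium" },
  { "_id" => "p3", "first_name" => "John", "last_name" => "Smith" }
]

# Collect every attribute name that appears in at least one document.
def attribute_names(docs)
  docs.flat_map(&:keys).uniq
end

# Select only the documents that carry a given attribute at all.
def with_attribute(docs, name)
  docs.select { |doc| doc.key?(name) }
end

puts attribute_names(PEOPLE).inspect
puts with_attribute(PEOPLE, "last_name").map { |doc| doc["_id"] }.inspect
```

Nothing is stored for a missing attribute: the second document simply has no `last_name` key, rather than a NULL in a column.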

To answer your question then: you can't think "relationally" at all when looking at a database that uses the MapReduce paradigm, because it doesn't actually have an enforced relation. It's a conceptual hump you'll just have to get over.

A good article I ran into that compares and contrasts the two approaches pretty well is MapReduce: A Major Step Back, which argues that MapReduce-paradigm databases are a technological step backwards and are inferior to RDBMSes. I have to disagree with the author's thesis and would submit that the database designer simply has to select the right one for his/her situation.

Resistive answered 25/6, 2009 at 15:14 Comment(2)

A lot of the criticisms that article directs toward MapReduce-based databases seem to be addressed in CouchDB. CouchDB uses B-tree indexes, supports views (in fact, CouchDB appears to have more of an emphasis on views than MySQL does), allows updates, makes replication easy, etc. – Heartland 26/6, 2009 at 5:37

@Chuck: It has more emphasis on views because there are no queries, only views. – Badmouth 23/7, 2009 at 13:52

It's all about the data. If you have data which makes most sense relationally, a document store may not be useful. A typical document-based system is a search server: you have a huge data set and want to find a specific item or document, and the document is static or versioned.

In an archive-type situation, the documents might literally be documents that don't change and have very flexible structures. It doesn't make sense to store their metadata in a relational database, since the documents are all very different and very few of them share the same tags. Document-based systems don't store null values.

Non-relational, document-like data makes sense when denormalized: either it doesn't change much, or you don't care as much about consistency.

If your use case fits a relational model well then it's probably not worth squeezing it into a document model.

Here's a good article about non relational databases.

Another way of thinking about it is, a document is a row. Everything about a document is in that row and it is specific to that document. Rows are easy to split on, so scaling is easier.

Aitch answered 25/6, 2009 at 14:41 Comment(0)

In CouchDB, like Lotus Notes, you really shouldn't think about a Document as being analogous to a row.

Instead, a Document is a relation (table).

Each document has a number of rows--the field values:

ValueID(PK)  Document ID(FK)   Field Name        Field Value
========================================================
92834756293  MyDocument        First Name        Richard
92834756294  MyDocument        States Lived In   TX
92834756295  MyDocument        States Lived In   KY

Each View is a cross-tab query that selects across a massive UNION ALL of every Document.

So, it's still relational, but not in the most intuitive sense, and not in the sense that matters most: good data management practices.
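To make the analogy concrete, here is a small Ruby sketch, not CouchDB code: it flattens a hash-shaped document into the field-value rows shown in the table above (minus the ValueID column) and then builds a "view" by selecting one field across all rows. The `DOCS` data and both method names are hypothetical.

```ruby
# A hypothetical document store: document ID => fields.
# Multi-valued fields are represented as arrays.
DOCS = {
  "MyDocument" => { "First Name" => "Richard", "States Lived In" => ["TX", "KY"] }
}

# Flatten every document into [document_id, field_name, field_value] rows,
# one row per value. Concatenating the rows of all documents plays the role
# of the UNION ALL in the analogy.
def field_rows(docs)
  docs.flat_map do |doc_id, fields|
    fields.flat_map do |name, value|
      Array(value).map { |v| [doc_id, name, v] }
    end
  end
end

# A "view" in this analogy: pick one field across all rows and
# group its values by document.
def view_by_field(rows, field_name)
  rows.select { |_, name, _| name == field_name }
      .group_by { |doc_id, _, _| doc_id }
      .transform_values { |matching| matching.map { |_, _, value| value } }
end

rows = field_rows(DOCS)
puts view_by_field(rows, "States Lived In").inspect
```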

Audly answered 25/6, 2009 at 14:53 Comment(0)

Document-oriented databases do not reject the concept of relations; they just sometimes let applications dereference the links (CouchDB) or even have direct support for relations between documents (MongoDB). What's more important is that DODBs are schema-less. In table-based storage this property can be achieved only with significant overhead (see the answer by richardtallent), but here it's done more efficiently. What we really should learn when switching from an RDBMS to a DODB is to forget about tables and to start thinking about data. That's what sheepsimulator calls the "bottom-up" approach. It's an ever-evolving schema, not a predefined Procrustean bed. Of course this does not mean that schemata should be completely abandoned in any form. Your application must interpret the data and somehow constrain its form -- this can be done by organizing documents into collections, or by making models with validation methods -- but this is now the application's job.
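One way to picture "models with validation methods" is a plain Ruby class that wraps a document hash and decides for itself which fields it requires. This is only a sketch under that assumption: the `Person` class, its `REQUIRED` list and the field names are hypothetical, and it deliberately does not use CouchPotato or any other library API.

```ruby
# Hypothetical application-side model. The database would accept any document,
# so the application decides which fields it insists on.
class Person
  REQUIRED = %w[first_name].freeze

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Returns a list of human-readable problems; an empty list means
  # the document is acceptable to this application.
  def errors
    REQUIRED.reject { |field| attributes.key?(field) }
            .map { |field| "#{field} is required" }
  end

  def valid?
    errors.empty?
  end
end

nibbler = Person.new("first_name" => "Lord Nibbler", "home_planet" => "Eternium")
puts nibbler.valid?
puts Person.new("home_planet" => "Earth").errors.inspect
```

Extra attributes such as `home_planet` pass through untouched; only the fields the application cares about are constrained.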

Waterhouse answered 25/6, 2009 at 18:36 Comment(0)

Maybe you should read this: http://books.couchdb.org/relax/getting-started

I only just heard of it myself. It's interesting, but I have no idea how to implement it in a real-world application ;)

Martellato answered 25/6, 2009 at 13:21 Comment(1)

After reading that article, I found that every piece of data is a document. There are no relationships like master-detail; each piece of data is an independent document. For example, a blog post has tags, contents, an author and comments. In a relational database we define tables such as tags, posts, comments and authors, and each table is related to the others: posts have many tags, authors have many posts, etc. In CouchDB you have no separate posts, tags, etc.; it's all in one document. CMIIW – Martellato 25/6, 2009 at 15:24

One thing you can try is getting a copy of Firefox and Firebug, and playing with the map and reduce functions in JavaScript. They're actually quite cool and fun, and appear to be the basis of how to get things done in CouchDB.

Here's Joel's little article on the subject: http://www.joelonsoftware.com/items/2006/08/01.html
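If Ruby is closer to hand than a browser console, the same idea can be tried there. The sketch below is a Ruby analogue of a map step and a reduce step, not CouchDB's actual view API (CouchDB views are written as JavaScript functions); the `POSTS` documents, field names and method names are all assumptions for illustration.

```ruby
# Hypothetical blog-post documents; the field names are assumptions.
POSTS = [
  { "_id" => "a", "author" => "alice", "tags" => ["couchdb", "ruby"] },
  { "_id" => "b", "author" => "bob", "tags" => ["couchdb"] },
  { "_id" => "c", "author" => "alice" } # no tags at all
]

# Map step: emit one [key, value] pair per tag.
# Documents without a "tags" field emit nothing.
def map_tags(doc)
  (doc["tags"] || []).map { |tag| [tag, 1] }
end

# Reduce step: sum the emitted values for each key.
def count_by_key(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.sum { |_, value| value } }
end

# Run the map step over every document, then reduce the combined output.
def tag_counts(docs)
  count_by_key(docs.flat_map { |doc| map_tags(doc) })
end

puts tag_counts(POSTS).inspect
```

The map step looks at one document at a time and never needs to know about the others, which is what makes this style easy to run in parallel.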

Bordelon answered 25/6, 2009 at 14:36 Comment(3)

I think Joel is talking about closures (in Groovy terms) or blocks (in Ruby). That has nothing to do with CouchDB. – Martellato 25/6, 2009 at 15:22

Then I think you have a big fat case of TLDR syndrome. The article is about Map/Reduce – Bordelon 25/6, 2009 at 16:11

Which, I think you'll find, is very relevant. – Bordelon 25/6, 2009 at 16:16
