Good reasons NOT to use a relational database?

L

21

140

Can you please point to alternative data storage tools and give good reasons to use them instead of good-old relational databases? In my opinion, most applications rarely use the full power of SQL--it would be interesting to see how to build an SQL-free application.

Laughter answered 1/9, 2008 at 12:5 Comment(0)

K

148

Plain text files in a filesystem

Very simple to create and edit
Easy for users to manipulate with simple tools (i.e. text editors, grep etc)
Efficient storage of binary documents

XML or JSON files on disk

As above, but with a bit more ability to validate the structure.

Spreadsheet / CSV file

Very easy model for business users to understand

Subversion (or similar disk based version control system)

Very good support for versioning of data

Berkeley DB (Basically, a disk based hashtable)

Very simple conceptually (just un-typed key/value)
Quite fast
No administration overhead
Supports transactions I believe

Amazon's Simple DB

Much like Berkeley DB I believe, but hosted

Google's App Engine Datastore

Hosted and highly scalable
Per document key-value storage (i.e. flexible data model)

CouchDB

Document focus
Simple storage of semi-structured / document based data

Native language collections (stored in memory or serialised on disk)

Very tight language integration

Custom (hand-written) storage engine

Potentially very high performance in required uses cases

I can't claim to know anything much about them, but you might also like to look into object database systems.

Kilkenny answered 1/9, 2008 at 12:19 Comment(8)

It would be great if you also explained the drawbacks of each choice, otherwise how is one supposed to choose? Thanks, – Bluecoat 28/9, 2008 at 10:31

Also writing millions of rows into a DB can take a day while appending a million lines of log to a file takes mere minutes. I will never understand why people insist on putting log data into a database. – Exonerate 12/12, 2008 at 21:24

Aaron: I have one reason: SELECT messages FROM log WHERE (date BETWEEN 2009-01-01 AND 2009-03-01) AND type='error' AND system='windows' :) How would you load that from a text file? – Daryldaryle 18/8, 2009 at 10:7

I'm strongly in favor of text files whenever possible. You can't always use them but when you can they are so much easier to diagnose problems in. – Repulse 1/3, 2010 at 19:34

berkeley db definitely has transactions. text files and xml/json files don't, so multithreaded apps can stomp them if you're not careful. CSV files are wonderful for collections of parameters because business users can simply look at them and edit them without extra tools. Text files are great for write-once/read-almost-never applications like logging. To pick an approach you need to figure out what you're trying to accomplish – Lajuanalake 12/7, 2011 at 11:38

BDB used to be used by Subversion, and no longer is, because of the huge number of corruption problems users encountered. – Witmer 10/10, 2012 at 7:5

Text files could be used to log errors and still be queried if they are stored in a directory hierarchy, eg. system\errortype\year\month\day\ – Perambulate 22/5, 2015 at 10:16

@Perambulate which makes it more complicated to look at. For example - if you want to know what happened in a program after something happened you would normally go through 1 log after the timestamp and then see what was done before an error occured - not only an error. You cannot sort them flexible. and always have the problem if your application consists of different threads with locking and so on. – Downpour 7/10, 2015 at 10:26

D

25

Matt Sheppard's answer is great (mod up), but I would take account these factors when thinking about a spindle:

Structure : does it obviously break into pieces, or are you making tradeoffs?
Usage : how will the data be analyzed/retrieved/grokked?
Lifetime : how long is the data useful?
Size : how much data is there?

One particular advantage of CSV files over RDBMSes is that they can be easy to condense and move around to practically any other machine. We do large data transfers, and everything's simple enough we just use one big CSV file, and easy to script using tools like rsync. To reduce repetition on big CSV files, you could use something like YAML. I'm not sure I'd store anything like JSON or XML, unless you had significant relationship requirements.

As far as not-mentioned alternatives, don't discount Hadoop, which is an open source implementation of MapReduce. This should work well if you have a TON of loosely structured data that needs to be analyzed, and you want to be in a scenario where you can just add 10 more machines to handle data processing.

For example, I started trying to analyze performance that was essentially all timing numbers of different functions logged across around 20 machines. After trying to stick everything in a RDBMS, I realized that I really don't need to query the data again once I've aggregated it. And, it's only useful in it's aggregated format to me. So, I keep the log files around, compressed, and then leave the aggregated data in a DB.

Note I'm more used to thinking with "big" sizes.

Disrespectful answered 1/9, 2008 at 14:33 Comment(1)

One danger of CSV files is escaping needs to be done right; its' easy to implement a CSV reader or writer that doesn't really follow the spec since it looks so deceptively simple and there are a few subtleties: en.wikipedia.org/wiki/Comma-separated_values#Specification – Biophysics 24/8, 2009 at 23:54

L

10

The filesystem's prety handy for storing binary data, which never works amazingly well in relational databases.

Liriodendron answered 1/9, 2008 at 12:14 Comment(0)

F

6

Try Prevayler: http://www.prevayler.org/wiki/ Prevayler is alternative to RDBMS. In the site have more info.

Fourinhand answered 1/9, 2008 at 12:20 Comment(0)

B

6

Custom (hand-written) storage engine / Potentially very high performance in required uses cases

http://www.hdfgroup.org/

If you have enormous data sets, instead of rolling your own, you might use HDF, the Hierarchical Data Format.

http://en.wikipedia.org/wiki/Hierarchical_Data_Format:

HDF supports several different data models, including multidimensional arrays, raster images, and tables.

It's also hierarchical like a file system, but the data is stored in one magic binary file.

HDF5 is a suite that makes possible the management of extremely large and complex data collections.

Think petabytes of NASA/JPL remote sensing data.

Biophysics answered 5/9, 2008 at 22:22 Comment(0)

M

6

If you don't need ACID, you probably don't need the overhead of an RDBMS. So, determine whether you need that first. Most of the non-RDBMS answers provided here do not provide ACID.

Misstate answered 28/9, 2008 at 10:22 Comment(4)

Can you give an example why/when ACID is not needed? – Stauffer 5/10, 2013 at 8:0

@vibneiro, if the database has only a single user which does only sequential operations, or the risk of database inconsistensies in case of a power failure is acceptable, or the concept of database transactions does not apply, or there is no need for constraints, cascades, triggers or the like, then a non-ACID non-RDBMS provider (for example a text file with an RDBMS-like API) may suffice. For example, your application may keep a database of historical diagnostic messages for which ACID is completely irrelevant and "log.txt" will suffice. – Misstate 6/10, 2013 at 19:52

Turns out that ACID is not needed in very rare cases. I wonder why then NoSQL databases are so popular? The majority of them don't support full ACIDity. – Stauffer 7/10, 2013 at 20:20

@vibneiro, NoSQL is usually easier, more lightweight, more embeddable, more self-hostable, more intuitive, more flexible, and usually with some ACID. If you don't have relational data, an RDBMS probably isn't what you need. – Misstate 7/10, 2013 at 20:45

G

4

G'day,

One case that I can think of is when the data you are modelling cannot be easily represented in a relational database.

Once such example is the database used by mobile phone operators to monitor and control base stations for mobile telephone networks.

I almost all of these cases, an OO DB is used, either a commercial product or a self-rolled system that allows heirarchies of objects.

I've worked on a 3G monitoring application for a large company who will remain nameless, but whose logo is a red wine stain (-: , and they used such an OO DB to keep track of all the various attributes for individual cells within the network.

Interrogation of such DBs is done using proprietary techniques that are, usually, completely free from SQL.

HTH.

cheers,

Rob

Gallego answered 1/9, 2008 at 12:16 Comment(1)

Why is it that basestation data does not lend itself well to the relational model? – Petiolule 28/9, 2008 at 20:4

D

3

There was a RAD tool called JADE written a few years ago that has a built-in OODBMS. Earlier incarnations of the DB engine also supported Digitalk Smalltalk. If you want to sample application building using a non-RDBMS paradigm this might be a start.

Other OODBMS products include Objectivity, GemStone (You will need to get VisualWorks Smalltalk to run the Smalltalk version but there is also a java version). There were also some open-source research projects in this space - EXODUS and its descendent SHORE come to mind.

Sadly, the concept seemed to die a death, probably due to the lack of a clearly visible standard and relatively poor ad-hoc query capability relative to SQL-based RDMBS systems.

An OODBMS is most suitable for applications with core data structures that are best represented as a graph of interconnected nodes. I used to say that the quintessential OODBMS application was a Multi-User Dungeon (MUD) where rooms would contain players' avatars and other objects.

Defeat answered 1/9, 2008 at 12:6 Comment(3)

It used to be true that you needed a client Smalltalk to use GemStone/S (for desktop apps) but with the web frameworks Aida (aidaweb.si), and Seaside (seaside.st) GemStone/S can be used directly as an application server. See the info on GLASS (seaside.gemstone.com) – Artistry 7/2, 2009 at 20:39

Another reason would be if you care about data quality. In an OODB like Gemstone it is much easier to enforce complex validity rules. – Garlic 18/4, 2011 at 23:1

Ad hoc query capabilities of OODBMS is much better than that of SQL based RDBMS-es – Garlic 18/4, 2011 at 23:2

A

3

Object databases are not relational databases. They can be really handy if you just want to stuff some objects in a database. They also support versioning and modify classes for objects that already exist in the database. db4o is the first one that comes to mind.

Alejandrinaalejandro answered 1/9, 2008 at 12:20 Comment(0)

L

3

In some cases (financial market data and process control for example) you might need to use a real-time database rather than a RDBMS. See wiki link

Leucine answered 1/9, 2008 at 15:21 Comment(0)

Y

1

You can go a long way just using files stored in the file system. RDBMSs are getting better at handling blobs, but this can be a natural way to handle image data and the like, particularly if the queries are simple (enumerating and selecting individual items.)

Other things that don't fit very well in a RDBMS are hierarchical data structures and I'm guessing geospatial data and 3D models aren't that easy to work with either.

Services like Amazon S3 provide simpler storage models (key->value) that don't support SQL. Scalability is the key there.

Excel files can be useful too, particularly if users need to be able to manipulate the data in a familiar environment and building a full application to do that isn't feasible.

Yandell answered 1/9, 2008 at 12:20 Comment(0)

T

1

There are a large number of ways to store data - even "relational databse" covers a range of alternatives from a simple library of code that manipulates a local file (or files) as if it were a relational database on a single user basis, through file based systems than can handle multiple-users to a generous selection of serious "server" based systems.

We use XML files a lot - you get well structured data, nice tools for querying same the ability to do edits if appropriate, something that's human readable and you don't then have to worry about the db engine working (or the workings of the db engine). This works well for stuff that's essentially read only (in our case more often than not generated from a db elsewhere) and also for single user systems where you can just load the data in and save it out as required - but you're creating opportunities for problems if you want multi-user editing - at least of a single file.

For us that's about it - we're either going to use something that will do SQL (MS offer a set of tools that run from a .DLL to do single user stuff all the way through to enterprise server and they all speak the same SQL (with limitations at the lower end)) or we're going to use XML as a format because (for us) the verbosity is seldom an issue.

We don't currently have to manipulate binary data in our apps so that question doesn't arise.

Murph

Tieback answered 1/9, 2008 at 12:31 Comment(0)

E

1

One might want to consider the use of an LDAP server in the place of a traditional SQL database if the application data is heavily key/value oriented and hierarchical in nature.

Efficient answered 1/9, 2008 at 15:49 Comment(0)

H

1

BTree files are often much faster than relational databases. SQLite contains within it a BTree library which is in the public domain (as in genuinely 'public domain', not using the term loosely).

Frankly though, if I wanted a multi-user system I would need a lot of persuading not to use a decent server relational database.

Horowitz answered 1/9, 2008 at 22:18 Comment(1)

BTrees are the basic implementation of normal indexes. Oracle supports Index-Organized tables that are just a table implemented as as index. Their are faster to read, slower to write and use a B-tree . See: <oracle.com/technology/products/oracle9i/datasheets/iots/…> – Sidnee 28/9, 2008 at 10:34

B

1

Full-text databases, which can be queried with proximity operators such as "within 10 words of," etc.

Relational databases are an ideal business tool for many purposes - easy enough to understand and design, fast enough, adequate even when they aren't designed and optimized by a genius who could "use the full power," etc.

But some business purposes require full-text indexing, which relational engines either don't provide or tack on as an afterthought. In particular, the legal and medical fields have large swaths of unstructured text to store and wade through.

Banville answered 15/9, 2008 at 17:51 Comment(0)

G

1

Also: * Embedded scenarios - Where usually it is required to use something smaller then a full fledged RDBMS. Db4o is an ODB that can be easily used in such case. * Rapid or proof-of-concept development - where you wish to focus on the business and not worry about persistence layer

Gloriane answered 28/9, 2008 at 10:17 Comment(0)

S

1

K.I.S.S: Keep It Small and Simple

Sidnee answered 28/9, 2008 at 10:29 Comment(1)

That's the polite version ... I've more often heard "Keep it simple, stupid" ... or, gulp, maybe that's just what people tell me! :-( – Kistler 22/9, 2010 at 18:48

A

1

CAP theorem explains it succinctly. SQL mainly provides "Strong Consistency: all clients see the same view, even in presence of updates".

Alejandrinaalejandro answered 1/3, 2010 at 19:8 Comment(0)

F

1

I would offer RDBMS :) If you do not wont to have troubles with set up/administration go for SQLite. Built in RDBMS with full SQL support. It even allows you to store any type of data in any column.

Main advantage against for example log file: If you have huge one, how are you going to search in it? With SQL engine you just create index and speed up operation dramatically.

About full text search: SQLite has modules for full text search too..

Just enjoy nice standard interface to your data :)

Feil answered 21/3, 2011 at 21:51 Comment(0)

E

0

One good reason not to use a relational database would be when you have a massive data set and want to do massively parallel and distributed processing on the data. The Google web index would be a perfect example of such a case.

Hadoop also has an implementation of the Google File System called the Hadoop Distributed File System.

Esra answered 1/9, 2008 at 22:10 Comment(0)

F

0

I would strongly recommend Lua as an alternative to SQLite-kind of data storage.

Because:

The language was designed as a data description language to begin with
The syntax is human readable (XML is not)
One can compile Lua chunks to binary, for added performance

This is the "native language collection" option of the accepted answer. If you're using C/C++ as the application level, it is perfectly reasonable to throw in the Lua engine (100kB of binary) just for the sake of reading configs/data or writing them out.

Fraase answered 28/9, 2008 at 10:43 Comment(2)

Lua is a a programming language. This suggestion could be generalized to suggest any persistence/serialization features of any programming language (for example pickle/shelve in Python, or JSON/YAML for Perl et al, and so on). This doesn't address concurrent access and ACID guarantees at all. – Sunderland 27/1, 2010 at 6:29

You're right. What was missing from my entry was the implied read-only nature of such usage. In such scenario I hold to my text. For read-write use of Lua in this way makes absolutely no sense. Many things, s.a. filesystem metadata are mostly read-only so such an approach does not mean complete ro requirement. – Fraase 27/1, 2010 at 15:18

Recommended topics

Hot tags