Would you store binary data in database or in file system? [closed]
Asked Answered
H

11

47

This is a question which has been asked before (large-text-and-images-in-sql) but mainly for data which will be changed. In my case the data will be stored and never changed. Just seems sensible to keep everything together.

Are there any reasons why I should not store static binary data in a database?

Assuming it is a sensible thing to do, are there any advantages to storing such data in separate tables? (You might begin to realise now that I'm not a DB expert...)

Clarify: There will probably be no more than 10-20 users but these will be in the US and in the UK. The binary data will have to be transfered in any case.

Hendrika answered 19/3, 2009 at 14:47 Comment(0)
I
36

The advantage of storing data in the DB is taking advantage of DB security mechanisms and reducing maintanence cost (backups, ...). The disadvantage of it is increasing DB load and consuming connections (which might be expensive for per-connection licensed database servers). If you are using SQL Server 2008, FILESTREAM might be a nice alternative.

By the way, for Web apps (or any other apps that might need streaming the data), it's usually more sensible to store data outside DB.

Inharmonious answered 19/3, 2009 at 14:50 Comment(2)
I'm not sure how it reduces maintenence/backup costs. If anything it increases them because generally backing up a database is more costly and demanding than backing up a filesystem. Can you elaborate?Peen
@Peen My point is you don't need to backup files separately and manually keep them in sync to ensure integrity with the data contained in the database backup. It can be more straightforward and cheaper one way or another depending on the circumstances.Inharmonious
D
12

All this talk about doing a "select * from table" causing huge memory and/or bandwidth issues when the table has a LOB in it is a non-issue. All that is returned is a pointer to the LOB in question. Not enough reputation to put the comment in-context, but people looking at this should know it's NOT an issue.

Dominicdominica answered 8/6, 2011 at 17:44 Comment(2)
@Matthew I think he meant Large OBject.Proteolysis
if you're using an ORM, the ORM might pull all of the binary data. EF for example would, requiring you to either keep the blob in a separate lazily loaded table, or else pull only the columns you need (which is a little clumsy in EF).Kinesiology
M
9

The biggest dissadvantage if you are storing BLOBS is memory consumption. Can you imagine what select * from x would do for thousands of records with a 45k image in each?

As Mehrdad said there are also advantages. So if you decide to go with that approach you should try to design your database so that most queries return less results with BLOB data in them. Maybe for example make one to one relationships for this purpose.

Mccormack answered 19/3, 2009 at 14:57 Comment(4)
+1 Good point - perhaps a good reason to put blobs in separate table and pick up via id?Hendrika
To be honest I've always been afraid to use BLOB, because I suck at sql. But if I have to I'd probably make a separate one-to-one relationship for each blob. Pretty much using it as I use references to files. Except these would be stored in db. Note: please don't do this in webapps.Mccormack
IMHO, it is not a valid argument. Doing select * from x is a bad idea in most circumstances, except the one where you need to use every column of a table in your app. Putting blobs in a separate table is even worse, because it would require joins and complicate the requests.Credit
@Arseni Mourzenko: IMHO, putting BLOBS in a separate 1-to-1 Table does not significantly "complicate the requests" relative to the norm. Most non-trivial DB's (even without BLOBS) have lots of Child Tables that need to be Joined even for simple CRUD ops not to mention those Tables are usually 1-to-many. Also, for requests not involving BLOBS (which would prolly be pretty common i.e. to get just the metadata for a search results grid), having the BLOBS in a separate Table could significantly reduce the # of Data Pages the DB Server would have to read through.Cryptocrystalline
W
7

I'm familiar with a fairly good-sized OSS project that made the decision at its inception to store images in the MySQL database, and it's proven to be among the top 3 bad ideas they have been coping with ever since. (Exacerbated by the fact the "refactor mercilessly" is anathema, but that's another story.)

Among the serious problems this has caused:

  1. Exceeding maximum efficient database size (mysql). (The total space required for images exceeds all others by a at least 2 orders of magnitude).

  2. Image files lose their "fileness". No dates sizes etc. unless stored (redundantly) as dates (which require code for management).

  3. Arbitrary byte sequences don't process nicely all the time, for either storage or manipulation.

  4. "We'll never need to access the images externally" is a dangerous assumption.

  5. Fragility. Because the whole arrangement is unnatural and touchy, and you don't know where it will bite next (contributing to the anti-refactor mentality).

The benefits? None that I can think of, except it might have been the path of least resistance at the time.

Wikiup answered 19/3, 2009 at 16:26 Comment(3)
I'm assuming the bad decision was to store blobs. Correct?Hendrika
One substantial benefit is consistency of data: with proper keys the "files" cannot be deleted without the meta data and vice-versa. For disk files there are no such constraints and adding/deleting the files and their meta data is a separate application (or function) that must be designed, implemented and USED.Cider
Yes, you still need to write a application with proper validation - but that's true either way. I wouldn't call the difference "substantial". What was substantial was all the extra work it took to get at the images with other applications and utilities when they are only available with database calls, and most image handling software doesn't come with database persistence built in. So just to view the image required extracting it with one app, viewing it with another, and then making sure it got put back in the proper place when done. And forget browsing the images in Explorer, say.Wikiup
R
7

Addressing the issue from a principles point of view, a relational database is (mainly) there for storing structured data. If you cannot make a query condition or join on a data element it probably doesn't belong in the database. I don't see an image BLOB being used in a WHERE clause, so I'd say keep it outside the database. A CLOB on the other hand can be used in queries.

Rudolphrudwik answered 10/6, 2010 at 12:54 Comment(2)
We'll probably won't be using the phone number in a WHERE clause either, because it's not frequent at all to search anything by phone number (unless you're working on a reverse-lookup system). That said, we do store phone numbers in the DB, not in external files even though it's seldom used as a join or filter condition. What I mean is that that reason is not enough for discarding the possibility to save an image inside a relational DB.Bugeye
But you can make a query condition on the phone number, or use it for a join, something you cannot reasonably do with a BLOB column.Rudolphrudwik
A
5

I think this depends on the application your building. If you're building a CMS system, and the usage of the data is going be to display images within a web browser, it might make sense to save the images to disk as opposed to being put into the database. Although honestly I would do both, which could allow adding a server to a farm without having to copy files all over the place.

Another use case might be a complex object, such as a workflow, or even a business object with lots of interdependancies. You could serialize both of these into a binary or text based format, and save them in the DB. Then you get the benefit of the DB: ATOMIC, Backups, etc...

I don't think people should be using select * queries in the first place. What you do is provide two ways to get the data, One methods returns the summary information, the second would return the blob. I can't imagine why you would need to return thousands of images all at once.

Anglicist answered 19/3, 2009 at 15:9 Comment(6)
+1 For the ideas. About the select * from part. You don't necesseraly have to write that query by hand. Some ORMs use this kinds of queries by default, so if someone is not careful... ouch.Mccormack
Heh, you know which ORM uses those queries? I want to stay away from them. nHibernate I know does notAnglicist
I've seen in some php framework, can't recall. But since they are in a web app, they probably thought select * is less data over the wire than select foo, bar, sausage. I bet they never thought about BLOBS.Mccormack
Or about the fragility of select *...Anglicist
It doesn't look very fragile from a php standpoint tho :).Mccormack
It would be fragile if your code was accessing columns by index rather than name. If you add a column at index 2, all columns after the new one would have a different index, and things can get ugly.Fibrovascular
F
3

Whoever had the idea of storing an image (or other binary document) in a database is not someone I'm very happy with. Databases are meant for storage of [mostly?] INDEXABLE, DISCRETE data. Not BLOBs of meaningless binary data. If you've worked with BLOBs for binary data first-hand, you already know this.

You should store a reference to the file in the filesystem. Best practice of which is a filename, not an absolute (or even relative) path.

Fernyak answered 2/5, 2014 at 2:35 Comment(3)
As far as "SELECT *" goes, I think in most circumstances its reasonable. I built an ORM that uses it all over, but you can override that. And if you're really concerned about performance, you can bypass the ORM completely and use the query builder the ORM uses behind the scenes. The point is that this conversation has nothing to do with "SELECT * ...". It has to do with sound database design.Fernyak
How do you retrieve file if only file name is stored not the path? Would you have a folder where all files would be put? What if there are millions of files in my db?Renter
In the config somewhere in the application, you should store the path to the directory where the files are stored. If you are worried about having too may files in the same directory, build the path dynamically. Usually you can use an ID for this, such as /path/to/files/{ID here}/filename.ext. You only need to store the filename.Fernyak
E
2

Isn't this exactly what LOBs or CLOBs or .... were designed for?

We used CLOBs to store large encryptions of credit card card transactions for a major airline system.

Memory consumption is your greatest culprit though.

HTH

cheers,

Elyssa answered 19/3, 2009 at 15:1 Comment(0)
M
2

We store attachments in our system, and you cannot change an attachment, so I think we're on the same page w/data that "will be stored and never changed." We specifically decided not to store it in the database. We did this for two reasons, simplicity, and backup/recovery time.

Simplicity first: In our case these attachments are uploaded from the end-user's browser, and it's simpler to just write them to a directory (on the DB server) than it is to then stream them down the SQL pipe. There is a record of them in the DB, but the DB just contains meta-information about the attachment, and the name of the file on disk (a guid in our case)

On the backup/recovery side: These blobs will likely become one of the largest pieces of your database. Whenever you run a full backup you'll be copying these bits over and over, even though you know then can never change. To us it just seemed much simpler to have (much) smaller backups, and do an xcopy of the attachment directory to a secondary server as the backup.

Medina answered 19/3, 2009 at 15:55 Comment(0)
L
1

Some database(e.g. Postgresql) automatically compress fields, perhaps it is faster when reading them directly from db. And also, the program can read all the fields and image in one swoop.

Louanne answered 19/3, 2009 at 15:14 Comment(2)
Yes, if I ever used blobs it'd be postgres. You save in bandwith. But the data has to come uncompressed in the application's process at some point.Mccormack
Many blobs (images, mp3s, etc.) are essentially precompressed anyway.Wikiup
J
1

The performance issue here as been address above, so I won't repeat it. But I think a good tip if you are storing things that will be streamed out a lot (such as images/documents on a web-site) is to build in a caching system.

By this I mean store all the data in your database, but when someone requests that file, check if it exists on disk (based on a known filename, in a temp folder), if not, grab it from the DB and write it to the folder, and then stream that to the user. For the next request to the same file, since it exists on disk, it can be served from there without hitting the DB. But if you need to delete these files (or your web-server goes kapput!), it doesn't matter as they will be rebuilt again from the DB as people request them. This should be much quicker than serving each request for the same file from the DB.

Japonica answered 19/3, 2009 at 16:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.