Storing Images in DB - Yea or Nay?

A

56

415

So I'm using an app that stores images heavily in the DB. What's your outlook on this? I'm more of a type to store the location in the filesystem, than store it directly in the DB.

What do you think are the pros/cons?

Apophyge answered 6/8, 2008 at 17:38 Comment(1)

Well, you can do both with a transactional disk cache. – Ananthous 15/8, 2011 at 21:16

A

350

I'm in charge of some applications that manage many TB of images. We've found that storing file paths in the database to be best.

There are a couple of issues:

database storage is usually more expensive than file system storage
you can super-accelerate file system access with standard off the shelf products
- for example, many web servers use the operating system's sendfile() system call to asynchronously send a file directly from the file system to the network interface. Images stored in a database don't benefit from this optimization.
things like web servers, etc, need no special coding or processing to access images in the file system
databases win out where transactional integrity between the image and metadata are important.
- it is more complex to manage integrity between db metadata and file system data
- it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem

Atlantean answered 6/8, 2008 at 17:38 Comment(14)

what off the shelf products are available for "super-accelerating" the file system? – Spindling 4/10, 2008 at 10:53

easy - you just do mke2fs --go-faster-stripes – Overcautious 28/11, 2008 at 6:20

While I only manage 3TB of files, I definitely agree. Databases are for structured data, not blobs. – Wonted 22/3, 2009 at 19:31

@derobert: quite so, if you will never use a data element in a query, as a condition or for a join, it probably doesn't belong in the database. Then again, if you have a nice database function to query images for likeness... – Acoustic 18/5, 2009 at 14:34

its also most useful storing images on the file system. just think if a client phones up asking they cant view an image but they have the image id. much faster locating and viewing the image on the file system rather than from the db (there may be problems with the code). – Liebfraumilch 28/7, 2009 at 8:16

what off the shelf products are available for "super-accelerating" the file system? – Tolidine 31/7, 2009 at 15:16

Re: "super-accelerating" products: Most web servers can now take advantage of the sendfile() system call to deliver static files asynchronously to the client. It offloads to the operating system the task of moving the file from disk to the network interface. The OS can do this much more efficiently, operating in kernel space. This, to me, seems like a big win for file system vs. db for storing/serving images. – Mangle 20/11, 2010 at 17:7

re "super-accelerating": I'm thinking of products such as those from isilon, emc, netapp, etc, that can be configured to cluster, cache, etc data stored in file systems (in our case, NFS). Here's a presentation I made that discusses some of the issues. It was at a perforce conference so it doesn't go into detail about the database side, but it covers the gist of what we do: maillist.perforce.com/perforce/conferences/us/2009/… – Atlantean 22/11, 2010 at 21:17

I work with clients (of the ImageResizing.Net library) that store images both ways, and the filesystem is much more scalable and performant. But cloud storage is an much better option for scalability. Also, on Windows, NTFS starts to crawl after 100,000 files, and ASP.NET doesn't like SANs. I've helped get customers with upwards of 5 million images working on Windows, but it can be painful. – Ananthous 15/8, 2011 at 21:9

@Computer Linguist: When NTFS slows down, defragment file 0, $MFT (the master file table). – Schulze 13/9, 2011 at 22:55

@Mark Harrison, Does the retrieval performance of the images in the two cases, also depends on the size of the images ? For e.g . If its profile pic of users then is it may be recommended to store in DB ? – Windy 11/11, 2011 at 17:53

@Marcos, yes, you're right. In that case, the convenience of keeping the small image in the same place as the other data about the user outweights the other factors. Especially since the image is probably being accessed at the same time as the other data about the user. – Atlantean 11/11, 2011 at 18:20

Thanks a lot Mark! Is the performance also improved for small sized images(75*75px) stored in DB, relative to file system. I heard some time back that if documents size if below 1 MB then it may be better to store in DB as compared to FileSystem. Is that true ? – Windy 11/11, 2011 at 18:36

I think if the images are small enough then the time to serve the data becomes negligible and other factors (such as the convenience of keeping the image data as part of the row) become more important. Of course as in all performance-related questions, it's often a case of experimentation in the particular application/environment to see what works best, but I believe you're thinking along the right track. Good Luck!! – Atlantean 11/11, 2011 at 23:36

G

140

As with most issues, it's not as simple as it sounds. There are cases where it would make sense to store the images in the database.

You are storing images that are changing dynamically, say invoices and you wanted to get an invoice as it was on 1 Jan 2007?
The government wants you to maintain 6 years of history
Images stored in the database do not require a different backup strategy. Images stored on filesystem do
It is easier to control access to the images if they are in a database. Idle admins can access any folder on disk. It takes a really determined admin to go snooping in a database to extract the images

On the other hand there are problems associated

Require additional code to extract and stream the images
Latency may be slower than direct file access
Heavier load on the database server

Genvieve answered 6/8, 2008 at 17:38 Comment(6)

Not having a seperate backup strategy can be a big deal when you are writing applications that are installed on premise (like SharePoint). When you create a SharePoint backup everything is in the DB which makes it very easy. – Angulation 2/10, 2008 at 23:40

Security by obscurity is not really an access control strategy! – Ratal 9/10, 2008 at 10:46

I don't think he's advocating security by obscurity - he's saying that putting images in the DB adds another layer of security. (I think... @Conrad, don't want to put words in your mouth) – Allergen 7/10, 2010 at 9:6

I chose storing images in the database because of the single backup advantage (or more generally speaking, having all data in one place), but the problems you mention are true as well, which is why I cache the images on the filesystem. It's the best of both worlds, and I'm surprised none of the top answers here mention it. – Tashia 1/5, 2011 at 21:4

Are you, by chance, using the ImageResizing.Net library to handle your SQL->disk image caching? It's the most advanced, scalable, and robust disk cache you can get... – Ananthous 15/8, 2011 at 21:13

@Conrad: What about small sized images ? I believe retrieval performance of the images in the two cases, also depends on the size of the images right? For e.g . If its profile pic of users then will it be recommended to store in DB ? – Windy 11/11, 2011 at 18:16

B

99

File store. Facebook engineers had a great talk about it. One take away was to know the practical limit of files in a directory.

Needle in a Haystack: Efficient Storage of Billions of Photos

Barracuda answered 6/8, 2008 at 17:38 Comment(1)

ext3's dir_index helps a lot. – Chaldron 5/5, 2011 at 8:41

P

56

This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.

FileStream solves most of the problems around storing the files in the DB:

The Blobs are actually stored as files in a folder.
The Blobs can be accessed using either a database connection or over the filesystem.
Backups are integrated.
Migration "just works".

However SQL's "Transparent Data Encryption" does not encrypt FileStream objects, so if that is a consideration, you may be better off just storing them as varbinary.

From the MSDN Article:

Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

Propriety answered 6/8, 2008 at 17:38 Comment(3)

+1 for FileStream. It actually stores the blobs as files on disk, but manages them transactionally. – Propriety 26/7, 2011 at 21:30

Also, SQL server allows FileStream blobs to be access directly off of the disk, so that you can avoid tying up the DB connection – Propriety 26/7, 2011 at 21:30

Still, added latency between the DB and the web server... And the web server will have to load it into memory to stream it to the client instead of being able to stream it from disk, unless you're using disk caching. – Ananthous 15/8, 2011 at 21:14

H

39

File paths in the DB is definitely the way to go - I've heard story after story from customers with TB of images that it became a nightmare trying to store any significant amount of images in a DB - the performance hit alone is too much.

Humpbacked answered 6/8, 2008 at 17:38 Comment(0)

B

35

In my experience, sometimes the simplest solution is to name the images according to the primary key. So it's easy to find the image that belongs to a particular record, and vice versa. But at the same time you're not storing anything about the image in the database.

Bonis answered 6/8, 2008 at 17:38 Comment(4)

Very nice indeed. Your users can now easily increment your filename to access other files... – Archdeaconry 24/10, 2010 at 17:53

@Marijn: That's only if you expose the images to the world. – Chaldron 26/11, 2010 at 14:15

We did something very similar with our imaged documents (our primary key is a composite key of three items.), but we added the date and time the document was scanned so that we can have multiple versions in the same directory. – Revamp 4/8, 2011 at 13:55

@Osewa, How's that? Yes, to directly access the file, the end user would need access to the folder. You could have a process to serve the file via FTP based upon request, and the security would be on par with SQL server. – Revamp 4/8, 2011 at 13:56

O

31

The trick here is to not become a zealot.

One thing to note here is that no one in the pro file system camp has listed a particular file system. Does this mean that everything from FAT16 to ZFS handily beats every database?

No.

The truth is that many databases beat many files systems, even when we're only talking about raw speed.

The correct course of action is to make the right decision for your precise scenario, and to do that, you'll need some numbers and some use case estimates.

Oysterman answered 6/8, 2008 at 17:38 Comment(1)

I don't see anyone claiming that a filesystem is faster than a DB 100% of the time (read Mark Harrison's answer). That's a bit of a strawman. There are probably situations in which it's preferable not to wear your seatbelt, but generally speaking, wearing a seatbelt is a good idea. – Curiosa 8/4, 2009 at 16:56

S

30

In places where you MUST guarantee referential integrity and ACID compliance, storing images in the database is required.

You cannot transactionaly guarantee that the image and the meta-data about that image stored in the database refer to the same file. In other words, it is impossible to guarantee that the file on the filesystem is only ever altered at the same time and in the same transaction as the metadata.

Shaduf answered 6/8, 2008 at 17:38 Comment(2)

Actually, no, you can. As long as image files are never deleted, changed or over-written once created, all image files are synced before attempting to commit transactions, there is no filesystem corruption, you can be sure that image files and metadata are in sync. For some applications, those are too many ifs, I guess. – Chaldron 5/11, 2010 at 1:36

I would go even further and say that with a Journaling file system and some additional program logic, the ACID compliance can be achieved. The steps would be write the db record, write the file. If the file commits, commit the db transaction. – Revamp 4/8, 2011 at 14:0

M

28

As others have said SQL 2008 comes with a Filestream type that allows you to store a filename or identifier as a pointer in the db and automatically stores the image on your filesystem which is a great scenario.

If you're on an older database, then I'd say that if you're storing it as blob data, then you're really not going to get anything out of the database in the way of searching features, so it's probably best to store an address on a filesystem, and store the image that way.

That way you also save space on your filesystem, as you are only going to save the exact amount of space, or even compacted space on the filesystem.

Also, you could decide to save with some structure or elements that allow you to browse the raw images in your filesystem without any db hits, or transfer the files in bulk to another system, hard drive, S3 or another scenario - updating the location in your program, but keep the structure, again without much of a hit trying to bring the images out of your db when trying to increase storage.

Probably, it would also allow you to throw some caching element, based on commonly hit image urls into your web engine/program, so you're saving yourself there as well.

Microsporophyll answered 6/8, 2008 at 17:38 Comment(0)

H

27

Small static images (not more than a couple of megs) that are not frequently edited, should be stored in the database. This method has several benefits including easier portability (images are transferred with the database), easier backup/restore (images are backed up with the database) and better scalability (a file system folder with thousands of little thumbnail files sounds like a scalability nightmare to me).

Serving up images from a database is easy, just implement an http handler that serves the byte array returned from the DB server as a binary stream.

Hippolytus answered 6/8, 2008 at 17:38 Comment(1)

I would argue that the database is better for files that are frequently edited, since consistency can be a problem in that case. – Chaldron 5/11, 2010 at 1:40

M

26

Here's an interesting white paper on the topic.

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem

The answer is "It depends." Certainly it would depend upon the database server and its approach to blob storage. It also depends on the type of data being stored in blobs, as well as how that data is to be accessed.

Smaller sized files can be efficiently stored and delivered using the database as the storage mechanism. Larger files would probably be best stored using the file system, especially if they will be modified/updated often. (blob fragmentation becomes an issue in regards to performance.)

Here's an additional point to keep in mind. One of the reasons supporting the use of a database to store the blobs is ACID compliance. However, the approach that the testers used in the white paper, (Bulk Logged option of SQL Server,) which doubled SQL Server throughput, effectively changed the 'D' in ACID to a 'd,' as the blob data was not logged with the initial writes for the transaction. Therefore, if full ACID compliance is an important requirement for your system, halve the SQL Server throughput figures for database writes when comparing file I/O to database blob I/O.

Mooned answered 6/8, 2008 at 17:38 Comment(0)

D

25

One thing that I haven't seen anyone mention yet but is definitely worth noting is that there are issues associated with storing large amounts of images in most filesystems too. For example if you take the approach mentioned above and name each image file after the primary key, on most filesystems you will run into issues if you try to put all of the images in one big directory once you reach a very large number of images (e.g. in the hundreds of thousands or millions).

Once common solution to this is to hash them out into a balanced tree of subdirectories.

Discussant answered 6/8, 2008 at 17:38 Comment(11)

You would think so, but the issues are actually minor; I have an app with millions of files in a one directory, accessed by hundreds of users, without a problem. It's not smart, but it works. The biggest issue is if you use Explorer to browse the directory, you watch a flashlight forever. – Fryer 5/10, 2008 at 13:7

If you were worried about this, it would be easy to use a system similar to DNS, where the root directory has a separate directory underneath it for the first character of the key. To balance disk space (or even load balancing), one can use mount points or links to spread them out. – Subcontinent 26/10, 2008 at 22:44

It's better to use a filesystem that has no problem with large directories – Chaldron 29/10, 2008 at 3:46

I had an app with millions of files in one directory (server running RHEL 4) - to even list the directory contents (piping to a file) took days and created an output file 100's of MB in size. Now they are in a database I have a single file that I can move or backup quite easily. – Kirov 17/6, 2009 at 15:24

@Seun Osewa : every file system has limitations ... and if you know of one that has no problems storing millions of entries in the same directory, please let me know ! – Grace 4/11, 2010 at 12:51

ext3 with the dir_index flag on handles large directories just fine. I have a directory with 288,000 large images. ls > /dev/null takes less than 2 seconds. Ext3 with dir_index stores the directory information in a btree. – Chaldron 5/11, 2010 at 1:38

@Richard: Is your single file db-with-images backup less than "100's of mb" in size? Does it take less time to backup than the directory of images? – Chaldron 5/11, 2010 at 1:39

@Seun Osewa : the database is up to 28GB now, with 5.4 M records. I ended up having to partition the database table so I have several files to back up that are approx 5GB in size.Moving the individual images onto Amazon S3 now so I only have to store the filename in the DB (and Amazon can do the backups) – Kirov 12/11, 2010 at 7:41

@Richard: My image directory is 19GB on a single disk and I have no issues at all. I think your experience proves that the file approach was better. With files, you can do differential backups using rsync, which only copies new files or files that changed since the last backup. Works for me; 19gb and no issues. No partitioning, and no need for Amazon S3. You should go back to it. – Chaldron 12/11, 2010 at 22:49

@Seun Osewa - Although I agree with you on use of the FileSystem there can be issues with rsync if data gets corrupted. Ma.gnolia (an online bookmark site/tool) had a devastating blow with rsync vimeo.com/3205188 in their case it killed their live and backup DB's. Likely not so much an issue with images that don't change much (except for add/delete) but a not-so subtle reminder that one should have multiple backups ;-) – Lade 17/12, 2010 at 18:58

We have over 10 million imaged documents in our system. It is spread out so that in each sub-folder, there is no more than 60k images (or so.) We have close to a half a terabyte of images and have no issues. – Revamp 4/8, 2011 at 14:2

O

22

Something nobody has mentioned is that the DB guarantees atomic actions, transactional integrity and deals with concurrency. Even referentially integrity is out of the window with a filesystem - so how do you know your file names are really still correct?

If you have your images in a file-system and someone is reading the file as you're writing a new version or even deleting the file - what happens?

We use blobs because they're easier to manage (backup, replication, transfer) too. They work well for us.

Overcautious answered 6/8, 2008 at 17:38 Comment(2)

What is the likelihood of having two simultanious updates to a particular image? – Masaccio 9/4, 2009 at 1:43

you don't need simultaneous updates to have problems - it can be a read and a write. In our case this is almost guaranteed to happen. – Overcautious 12/4, 2009 at 22:12

M

20

The problem with storing only filepaths to images in a database is that the database's integrity can no longer be forced.

If the actual image pointed to by the filepath becomes unavailable, the database unwittingly has an integrity error.

Given that the images are the actual data being sought after, and that they can be managed easier (the images won't suddenly disappear) in one integrated database rather than having to interface with some kind of filesystem (if the filesystem is independently accessed, the images MIGHT suddenly "disappear"), I'd go for storing them directly as a BLOB or such.

Micrometry answered 6/8, 2008 at 17:38 Comment(0)

G

17

At a company where I used to work we stored 155 million images in an Oracle 8i (then 9i) database. 7.5TB worth.

Greaseball answered 6/8, 2008 at 17:38 Comment(3)

Absolutely. Apparently the database is a lot bigger now. Having the data in a database means that replicating the database at different sites is a lot easier too. – Greaseball 12/3, 2009 at 12:6

I saw a demonstration of Oracle where the could actually mount a file system to the database, or something like that. Do you know if this is what you did? (Sorry, I am clueless with Oracle so maybe I am talking garbage.) – Hsinking 28/7, 2009 at 8:33

I don't think so - it was storing images in the database as a database. The database was aggressively tuned - I remember multiple discussions regarding the size of the images changing as fields were added and removed. Everything was boundary aligned. – Greaseball 28/7, 2009 at 9:52

A

14

Normally, I'm storngly against taking the most expensive and hardest to scale part of your infrastructure (the database) and putting all load into it. On the other hand: It greatly simplifies backup strategy, especially when you have multiple web servers and need to somehow keep the data synchronized.

Like most other things, It depends on the expected size and Budget.

Argyle answered 6/8, 2008 at 17:38 Comment(0)

G

13

We have implemented a document imaging system that stores all it's images in SQL2005 blob fields. There are several hundred GB at the moment and we are seeing excellent response times and little or no performance degradation. In addition, fr regulatory compliance, we have a middleware layer that archives newly posted documents to an optical jukebox system which exposes them as a standard NTFS file system.

We've been very pleased with the results, particularly with respect to:

Ease of Replication and Backup
Ability to easily implement a document versioning system

Gondar answered 6/8, 2008 at 17:38 Comment(0)

A

11

Assumption: Application is web enabled/web based

I'm surprised no one has really mentioned this ... delegate it out to others who are specialists -> use a 3rd party image/file hosting provider.

Store your files on a paid online service like

Another StackOverflow threads talking about this here.

This thread explains why you should use a 3rd party hosting provider.

It's so worth it. They store it efficiently. No bandwith getting uploaded from your servers to client requests, etc.

Amusement answered 6/8, 2008 at 17:38 Comment(0)

L

11

If this is web-based application then there could be advantages to storing the images on a third-party storage delivery network, such as Amazon's S3 or the Nirvanix platform.

Liebfraumilch answered 6/8, 2008 at 17:38 Comment(0)

W

10

If you're not on SQL Server 2008 and you have some solid reasons for putting specific image files in the database, then you could take the "both" approach and use the file system as a temporary cache and use the database as the master repository.

For example, your business logic can check if an image file exists on disc before serving it up, retrieving from the database when necessary. This buys you the capability of multiple web servers and fewer sync issues.

Wellestablished answered 6/8, 2008 at 17:38 Comment(1)

+1 This also allows you to store the original image, delivering the cached/optimised version while allowing the size/compression to be altered later – Lidda 29/9, 2011 at 13:51

G

7

I have recently created a PHP/MySQL app which stores PDFs/Word files in a MySQL table (as big as 40MB per file so far).

Pros:

Uploaded files are replicated to backup server along with everything else, no separate backup strategy is needed (peace of mind).
Setting up the web server is slightly simpler because I don't need to have an uploads/ folder and tell all my applications where it is.
I get to use transactions for edits to improve data integrity - I don't have to worry about orphaned and missing files

Cons:

mysqldump now takes a looooong time because there is 500MB of file data in one of the tables.
Overall not very memory/cpu efficient when compared to filesystem

I'd call my implementation a success, it takes care of backup requirements and simplifies the layout of the project. The performance is fine for the 20-30 people who use the app.

Gumbo answered 6/8, 2008 at 17:38 Comment(0)

R

7

It depends on the number of images you are going to store and also their sizes. I have used databases to store images in the past and my experience has been fairly good.

IMO, Pros of using database to store images are,

A. You don't need FS structure to hold your images
B. Database indexes perform better than FS trees when more number of items are to be stored
C. Smartly tuned database perform good job at caching the query results
D. Backups are simple. It also works well if you have replication set up and content is delivered from a server near to user. In such cases, explicit synchronization is not required.

If your images are going to be small (say < 64k) and the storage engine of your db supports inline (in record) BLOBs, it improves performance further as no indirection is required (Locality of reference is achieved).

Storing images may be a bad idea when you are dealing with small number of huge sized images. Another problem with storing images in db is that, metadata like creation, modification dates must handled by your application.

Rider answered 6/8, 2008 at 17:38 Comment(0)

R

7

SQL Server 2008 offers a solution that has the best of both worlds : The filestream data type.

Manage it like a regular table and have the performance of the file system.

Restless answered 6/8, 2008 at 17:38 Comment(0)

S

7

I'm not sure how much of a "real world" example this is, but I currently have an application out there that stores details for a trading card game, including the images for the cards. Granted the record count for the database is only 2851 records to date, but given the fact that certain cards have are released multiple times and have alternate artwork, it was actually more efficient sizewise to scan the "primary square" of the artwork and then dynamically generate the border and miscellaneous effects for the card when requested.

The original creator of this image library created a data access class that renders the image based on the request, and it does it quite fast for viewing and individual card.

This also eases deployment/updates when new cards are released, instead of zipping up an entire folder of images and sending those down the pipe and ensuring the proper folder structure is created, I simply update the database and have the user download it again. This currently sizes up to 56MB, which isn't great, but I'm working on an incremental update feature for future releases. In addition, there is a "no images" version of the application that allows those over dial-up to get the application without the download delay.

This solution has worked great to date since the application itself is targeted as a single instance on the desktop. There is a web site where all of this data is archived for online access, but I would in no way use the same solution for this. I agree the file access would be preferable because it would scale better to the frequency and volume of requests being made for the images.

Hopefully this isn't too much babble, but I saw the topic and wanted to provide some my insights from a relatively successful small/medium scale application.

Supersensible answered 6/8, 2008 at 17:38 Comment(1)

When dealing with replication, storing the images in the database is far superior IMO. – Entrepreneur 3/5, 2009 at 20:26

T

6

Im my experience I had to manage both situations: images stored in database and images on the file system with path stored in db.

The first solution, images in database, is somewhat "cleaner" as your data access layer will have to deal only with database objects; but this is good only when you have to deal with low numbers.

Obviously database access performance when you deal with binary large objects is degrading, and the database dimensions will grow a lot, causing again performance loss... and normally database space is much more expensive than file system space.

On the other hand having large binary objects stored in file system will cause you to have backup plans that have to consider both database and file system, and this can be an issue for some systems.

Another reason to go for file system is when you have to share your images data (or sounds, video, whatever) with third party access: in this days I'm developing a web app that uses images that have to be accessed from "outside" my web farm in such a way that a database access to retrieve binary data is simply impossible. So sometimes there are also design considerations that will drive you to a choice.

Consider also, when making this choice, if you have to deal with permission and authentication when accessing binary objects: these requisites normally can be solved in an easier way when data are stored in db.

Tice answered 6/8, 2008 at 17:38 Comment(0)

C

4

In a previous project i stored images on the filesystem, and that caused a lot of headaches with backups, replication, and the filesystem getting out of sync with the database.

In my latest project i'm storing images in the database, and caching them on the filesystem, and it works really well. I've had no problems so far.

Confiscable answered 6/8, 2008 at 17:38 Comment(0)

S

4

I once worked on an image processing application. We stored the uploaded images in a directory that was something like /images/[today's date]/[id number]. But we also extracted the metadata (exif data) from the images and stored that in the database, along with a timestamp and such.

Sheepdip answered 6/8, 2008 at 17:38 Comment(0)

P

3

Storing an image in the database still means that the image data ends up somewhere in the file system but obscured so that you cannot access it directly.

+ves:

database integrity
its easy to manage since you don't have to worry about keeping the filesystem in sync when an image is added or deleted

-ves:

performance penalty -- a database lookup is usually slower that a filesystem lookup
you cannot edit the image directly (crop, resize)

Both methods are common and practiced. Have a look at the advantages and disadvantages. Either way, you'll have to think about how to overcome the disadvantages. Storing in database usually means tweaking database parameters and implement some kind of caching. Using filesystem requires you to find some way of keeping filesystem+database in sync.

Preterit answered 6/8, 2008 at 17:38 Comment(0)

A

3

The word on the street is that unless you are a database vendor trying to prove that your database can do it (like, let's say Microsoft boasting about Terraserver storing a bajillion images in SQL Server) it's not a very good idea. When the alternative - storing images on file servers and paths in the database is so much easier, why bother? Blob fields are kind of like the off-road capabilities of SUVs - most people don't use them, those who do usually get in trouble, and then there are those who do, but only for the fun of it.

Amabelle answered 6/8, 2008 at 17:38 Comment(0)

A

3

Second the recommendation on file paths. I've worked on a couple of projects that needed to manage large-ish asset collections, and any attempts to store things directly in the DB resulted in pain and frustration long-term.

The only real "pro" I can think of regarding storing them in the DB is the potential for easy of individual image assets. If there are no file paths to use, and all images are streamed straight out of the DB, there's no danger of a user finding files they shouldn't have access to.

That seems like it would be better solved with an intermediary script pulling data from a web-inaccessible file store, though. So the DB storage isn't REALLY necessary.

Altar answered 6/8, 2008 at 17:38 Comment(0)

W

2

I'm the lead developer on an enterprise document management system in which some customers store hundreds of gigabytes of documents. Terabytes in the not too distant future. We use the file system approach for many of the reasons mentioned on this page plus another: archiving.

Many of our customers must conform to industry specific archival rules, such as storage to optical disk or storage in a non-proprietary format. Plus, you have the flexibility of simply adding more disks to a NAS device. If you have your files stored in your database, even with SQL Server 2008's file stream data type, your archival options just became a whole lot narrower.

Whop answered 6/8, 2008 at 17:38 Comment(0)

B

1

As someone mentioned already, "it depends". If storage in a database is supposed to be a 1-to-1 fancy replacement for filesystem, it may not be quite a best option.

However, if a database backend will provide additional values, not only a serialization and storage of a blob, then it may make a real sense.

You may take a look at WKT Raster which is a project aiming at developing raster support in PostGIS which in turn serves as a geospatial extension for PostgreSQL database system. Idea behind the WKT Raster is not only to define a format for raster serialization and storage (using PostgreSQL system), but, what's much more important than storage, is to specify database-side efficient image processing accessible from SQL. Long story short, the idea is to move the operational weight from client to database backend, so it take places as close to storage itself as possible. The WKT Raster, as PostGIS, is dedicate to applications of specific domain, GIS.

For more complete overview, check the website and presentation (PDF) of the system.

Beamon answered 6/8, 2008 at 17:38 Comment(0)

A

1

If you need to store lots of images on the file system a couple of things to think about include:

Backup and restore. How do you keep the images in sync.
Filesystem performance. Depends on what you are doing and the filesystem, but you may want to implement a hashing mechanism so that you don't have a single directory with billions of files.
Replication. Do you need to keep the files in sync between multiple servers?

Agglutinate answered 6/8, 2008 at 17:38 Comment(0)

V

1

The only reason we store images in our tables is because each table (or set of tables per range of work) is temporary and dropped at the end of the workflow. If there was any sort of long term storage we'd definitely opt for storing file paths.

It should also be noted that we work with a client/server application internally so there's no web interface to worry about.

Vigilante answered 6/8, 2008 at 17:38 Comment(0)

J

1

I would personally store the large data outside of the database.

Pros: Stores everything in one please, easy access to data files, easy baskup Cons: Decreases database performance, many page splits, possible database coruption

Juncaceous answered 6/8, 2008 at 17:38 Comment(1)

do you mean inside the database? – Blair 28/11, 2008 at 5:46

G

1

Your web-server (I'm assuming you are using one) is designed to handle images while a database is not. Thus I would vote heavily on the nay side.

Store just the path (and maybe file info too) in the database.

Giacobo answered 6/8, 2008 at 17:38 Comment(0)

J

0

I will go for both solution, I mean...I will develop a litle component (EJB) that store the images in a DB plus the path of this image into the server. This DB only will be updated if we have a new image or the original image it's updated. Then I will also store the path in the business DB.

From an application point of view, I will always user the file system (retrieving the path from th business DB) and by this way we will fix the backup issue, and also avoid possible performance issues.

The only weakness is that we will store the same image 2 times...the good point is that the memory is cheap, come on!.

Jacobah answered 6/8, 2008 at 17:38 Comment(0)

T

0

If you are on Teradata, then Teradata Developer Exchange has a detailed article on loading and retrieving lobs and blobs..

http://developer.teradata.com/applications/articles/large-objects-part-1-loading

Theo answered 6/8, 2008 at 17:38 Comment(0)

Y

0

For a large number of small images, the database might be better.

I had an application with many small thumbnails (2Kb each). When I put them on the filesystem, they each consumed 8kb, due to the filesystem's blocksize. A 400% increase in space!

See this post for more information on block size: What is the block size of the iphone filesystem?

Yeseniayeshiva answered 6/8, 2008 at 17:38 Comment(0)

V

0

I have worked with many digital storage systems and they all store digital objects on the file system. They tend to use a branch approach, so there will be an archive tree on the file system, often starting with year of entry e.g. 2009, subdirectory will be month e.g. 8 for August, next directory will be day e.g. 11 and sometimes they will use hour as well, the file will then be named with the records persistent ID. Using BLOBS has its advantages and I have heard of it being used often in the IT parts of the chemical industry for storing thousands or millions of photographs and diagrams. It can provide more granular security, a single method of backup, potentially better data integrity and improved inter media searching, Oracle has many features for this within the package they used to call Intermedia (I think it is called something else now). The file system can also have granular security provided through a system such as XACML or another XML type security object. See D Space of Fedora Object Store for examples.

Vitovitoria answered 6/8, 2008 at 17:38 Comment(0)

K

0

I'd almost never store them in the DB. The best approach is usually to store your images in a path controlled by a central configuration variable and name the images according to the DB table and primary key (if possible). This gives you the following advantages:

Move your images to another partition or server just by updating the global config.
Find the record matching the image by searching on its primary key.
Your images are accessable to processing tools like imagemagick.
In web-apps your images can be handled by your webserver directly (saving processing).
CMS tools and web languages like Coldfusion can handle uploading natively.

Kozlowski answered 6/8, 2008 at 17:38 Comment(0)

S

0

Database for data

Filesystem for files

Selfstyled answered 6/8, 2008 at 17:38 Comment(1)

You can put that like this: don't put the data in a database column if you cannot use it for a where condition or a join. That is unlikely for binary data. – Acoustic 16/12, 2009 at 14:38

C

0

I would go with the file system approach, primarily due to its better flexibility. Consider that if the number of images gets huge, one database may not be able to handle it. With file system, you can simple add more file servers, assuming that you're using NFS or kind.

Another advantage the file system approach has is to be able to do some fancy stuffs, such as you can use Amazon S3 as the primary storage (save the url in the database instead of file path). In case of outages happen to S3, you fall back to your file server (may be another database entry containing the file path). Some voodoo to apply to Apache or whatever web server you're using.

Cultch answered 6/8, 2008 at 17:38 Comment(0)

A

0

I would go with the file system approach. No need to create or maintain a DB with images, it will save you some major headaches in the long run.

Allegra answered 6/8, 2008 at 17:38 Comment(0)

D

0

I prefer to store image paths in the DB and images on the filesystem (with rsync between servers to keep everything reasonably current).

However, some of the content-management-system stuff I do needs the images in the CMS for several reasons- visibility control (so the asset is held back until the press release goes out), versioning, reformatting (some CMS's will dynamically resize for thumbnails )and ease of use for linking the images into the WYSIWYG pages.

So the rule of thumb for me is to always stash application stuff on the filesystem, unless it's CMS driven.

Dewain answered 6/8, 2008 at 17:38 Comment(0)

W

0

Another benefit of storing the images in the file system is that you don't have to do anything special to have the client cache them...

...unless of course the image isn't accessible via the document root (e.g. authentication barrier), in which case you'll need to check the cache-control headers your code is sending.

Whitsunday answered 6/8, 2008 at 17:38 Comment(0)

L

0

One thing you need to keep in mind is the size of your data set. I believe that Dillie-O was the only one who even remotely hit the point.

If you have a small, single user, consumer app then I would say DB. I have a DVD management app that uses the file system (in Program Files at that) and it is a PIA to backup. I wish EVERY time that they would store them in a db, and let me choose where to save that file.

For a larger commercial application then I would start to change my thinking. I used to work for a company that developed the county clerks information management application. We would store the images on disk, in an encoded format [to deal with FS problems with large numbers of files] based on the county assigned instrument number. This was useful on another front as the image could exist before the DB record (due to their workflow).

As with most things: 'It depends on what you are doing'

Lipophilic answered 6/8, 2008 at 17:38 Comment(0)

M

0

File system, for sure. Then you get to use all of the OS functionality to deal with these images - back ups, webserver, even just scripting batch changes using tools like imagemagic. If you store them in the DB then you'll need to write your own code to solve these problems.

Matrilineage answered 6/8, 2008 at 17:38 Comment(0)

P

0

Pulling loads of binary data out of your DB over the wire is going to cause huge latency issues and won't scale well.

Store paths in the DB and let your webserver take the load - it's what it was designed for!

Personalty answered 6/8, 2008 at 17:38 Comment(0)

A

0

Attempting to mimic a file system using SQL is generally a bad plan. You ultimately write less code with equal or better results if you stick with the file system for external storage.

Anastigmatic answered 6/8, 2008 at 17:38 Comment(0)

F

-1

If you are planning a public facing web site then you should not go with either option. Your should use a Content Delivery Network (CDN). There are price, scalability and speed advantages to a CDN when delivering a large amount of static content over the internet.

Flushing answered 6/8, 2008 at 17:38 Comment(0)

C

-1

Images on a file store are the best bet, and supplement this with storing the meta data in a database. From a web server perspective, the fast way to serve stuff up is to point to it directly. If it's in the database - ala Sharepoint - you have the overhead of ADO.Net to pull it out, stream it, etc.

Documentum - while bloated and complicated - has it right in that the files are out on the share and available for you to determine how to store them - disk on the server, SAN, NAS, whatever. The Documentum strategy is to store the files a tree structure by encoding the folders and file names according to their primary key in the DB. The DB becomes the resource for knowing what files are what and for enforcing security. For high volume systems this type of approach is a good way to go.

Also consider this when dealing with metadata: should you ever need to update the attributes of your meta data corpus, the DB is your friend as you can quickly perform the updates with SQL. With other tagging systems you do not have the easy data manipulation tools at hand

Cioban answered 6/8, 2008 at 17:38 Comment(0)

R

-1

In my little application I have at least a million files weighing in at about 200GB at last count. All the files are sitting in an XFS file system mounted on a linux server over iscsi. The paths are stored in the database. use some kind of intelligent naming convention for your file paths and file names.

IMHO, use the file system for what it was meant to do - store files. Databases generally do not offer you any advantage over a standard file system in storing binary data.

Rowe answered 6/8, 2008 at 17:38 Comment(0)

P

-1

In my current application, I'm doing both. When the user identifies an image to attach to a record, I use ImageMagick to resize it to an appropriate size for display on screen (about 300x300 for my application) and store that in the database for ease of access, but then also copy the user's original file to a network share so that it's available for applications that require higher resolution (like printing).

(There are a couple other factors involved as well: Navision will only display BMPs, so when I resize it I also convert to BMP for storage, and the database is replicated to remote sites where it's useful to be able to display the image. Printing is only done at the head office, so I don't need to replicate the original file.)

Phyllis answered 6/8, 2008 at 17:38 Comment(0)

D

-1

No, due to page splits. You're essentially defining rows that can be 1KB - n MB so your database will have a lot of empty spaces in its pages which is bad for performance.

Drees answered 6/8, 2008 at 17:38 Comment(0)

D

-1

I would go with the file system approach. As noted by a few others, most web servers are built to send images from a file path. You'll have much higher performance if you don't have to write or stream out BLOB fields from the database. Having filesystem storage for the images makes it easier to setup static pages when the content isn't changing or you want limit the load on the database.

Drily answered 6/8, 2008 at 17:38 Comment(0)

Recommended topics

Hot tags