Store user profile pictures on disk or in the database?

I'm building an ASP.NET MVC application where users can attach a picture to their profile. The picture also appears in other areas of the system, such as a messaging gadget on the dashboard that displays recent messages.

When the user uploads these I am wondering whether it would be better to store them in the database or on disk.

Database advantages

  • Easy to back up the entire database and keep profile content/images with the associated profile/user tables

  • When I build web services later down the track, they can just pull all the profile-related data from one spot (the database)

Filesystem advantages

  • Loading files from disk is probably faster

  • Any other advantages?

Where do other sites store this sort of information? Am I right to be a little concerned about database performance for something like this?

Maybe there would be a way to cache images pulled out from the database for a period of time?

Alternatively, what about the idea of storing these images in the database, but shadow copying them to disk so the web server can load them from there? This would seem to give both the backup and convenience of a Db, whilst giving the speed advantages of files on disk.

Infrastructure in question

  • The website will be deployed to IIS on Windows Server 2003 with an NTFS file system.
  • The database will be SQL Server 2008.

Summary

From reading a lot of related threads here on SO, many people now seem to favour SQL Server FILESTREAM storage. From what I could gather, however (I may be wrong), there isn't much benefit when the files are quite small; FILESTREAM looks to greatly improve performance when files are multiple MB or larger.

As my profile pictures tend to sit at around 5 KB, I decided to just leave them stored in a file store in the database as varbinary(max).

In ASP.NET MVC I did see a bit of a performance issue returning FileContentResults for images pulled out of the database like this, so I ended up caching each file on disk when it is read, if the location of that file is not already in my application cache.

So I guess I went for a hybrid:

  • Database storage, to make backing up the data easier and to keep files linked directly to profiles
  • Shadow copying to disk to allow better caching

At any point I can delete the cache folder on disk; as the images are re-requested they will be re-copied on the first hit and served from the cache thereafter.
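The hybrid described above is essentially a read-through disk cache. Here is a minimal, language-neutral sketch of the idea in Python (not the ASP.NET MVC code from the question). The names `cached_image_path`, `load_blob`, and the `.img` file naming are hypothetical, and the sketch checks for the file on disk directly rather than looking its location up in an application cache.

```python
import os
import tempfile
from typing import Callable


def cached_image_path(profile_id: int,
                      load_blob: Callable[[int], bytes],
                      cache_dir: str) -> str:
    """Return a path on disk for the profile image.

    On a cache miss the image bytes are fetched through `load_blob`
    (a hypothetical stand-in for the database query) and copied to disk.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{profile_id}.img")
    if not os.path.exists(path):
        data = load_blob(profile_id)  # the database is only hit on a miss
        # Write to a temporary file and rename it into place, so a
        # concurrent reader never sees a half-written image.
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)
    return path
```

Because the database stays the source of truth, deleting `cache_dir` is safe: the next request for each image simply repopulates it.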

Rader answered 21/7, 2010 at 23:7 Comment(3)
You should read this: #4248 - Devour
Sorry, there are a few typos in my question. I'm typing this from my iPhone and will edit when I get to a PC. - Rader
Thanks anders, that was exactly what I was looking for. Lots of views and opinions in that thread you linked. - Rader

Actually, a lookup in the database may be faster, depending on the number of images you have, unless you are using a highly optimized filesystem. Databases are designed for fast lookups and use a lot more interesting techniques than a file system does.

ReiserFS (obsolete) is really awesome for lookups. ZFS, XFS, and NTFS all have fantastic directory lookup structures (hashes or B-trees). Linux ext4 looks promising too.

The hit on the system is not going to be any different in terms of block reads. The question is: what is faster, a query lookup that returns the filename (maybe a hash?), which in turn is accessed using a separate open, file send, close? Or just dumping the blob out?

There are several things to consider, including network hit, processing hit, distributability, etc. If you store stuff in the database, then you can move it. Then again, if you store images on a content delivery service, that may be way faster, since you are not doing any network hits on yourself.

Think about it, and remember a bit of benchmarking never hurt anybody :-) So, test it out with your typical dataset size and take into account things like simultaneous queries, etc.
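As a starting point for that kind of benchmark, here is a rough Python sketch comparing the two paths discussed above: dumping the blob straight out of the database versus querying for a path and then opening the file. It uses SQLite from the standard library purely as a stand-in (the question's stack is SQL Server 2008 on NTFS, so the absolute numbers will not transfer), and the table layout, function names, image count, and 5 KB size are all illustrative assumptions.

```python
import os
import sqlite3
import tempfile
import time


def build_fixtures(root: str, count: int, size: int) -> sqlite3.Connection:
    """Create `count` random images both as files and as BLOB rows."""
    files_dir = os.path.join(root, "files")
    os.makedirs(files_dir, exist_ok=True)
    db = sqlite3.connect(os.path.join(root, "images.db"))
    db.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB, path TEXT)"
    )
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(files_dir, f"{i}.img")
        with open(path, "wb") as f:
            f.write(data)
        db.execute("INSERT INTO images VALUES (?, ?, ?)", (i, data, path))
    db.commit()
    return db


def read_blob(db: sqlite3.Connection, image_id: int) -> bytes:
    """Option 1: return the image bytes directly from the database."""
    row = db.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return row[0]


def read_via_path(db: sqlite3.Connection, image_id: int) -> bytes:
    """Option 2: look up the path, then open, read, and close the file."""
    row = db.execute(
        "SELECT path FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    with open(row[0], "rb") as f:
        return f.read()


def time_reads(reader, db: sqlite3.Connection, count: int, rounds: int = 5) -> float:
    """Total seconds taken to read every image `rounds` times."""
    start = time.perf_counter()
    for _ in range(rounds):
        for i in range(count):
            reader(db, i)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        db = build_fixtures(root, count=500, size=5 * 1024)
        print("blob from db:", time_reads(read_blob, db, 500))
        print("path + file :", time_reads(read_via_path, db, 500))
        db.close()
```

This measures sequential, warm-cache reads only; a realistic test would also cover simultaneous queries and a dataset the size of the real one.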

Senseless answered 21/7, 2010 at 23:18 Comment(2)
Hi Elf King. You make a good point about infrastructure; I have updated my question. Whilst I realise there are many factors to consider (as you have pointed out), I'm looking for some informed opinions, and possibly some experience from people who have tried one way or the other and how it worked out. - Rader
Well, I've successfully done both in production environments for large projects. In one project the system produced about 10 x 10 MB images every second of every day, and all of them had to be sorted and associated with multiple users. I used a combination of FS and DB there and, because of the high rate of production, had to use a distributed FS. In the other, the images were large but static, so I used DB blobs. I think for your application you may be better off with just the DB. You may also have to consider the legalities of who owns the digital image. - Senseless

You should store a reference to the files in the database and store the actual files on disk.

This approach is more flexible and easier to scale.

You can have a single database and several servers serving static content. It will be much trickier to have several databases doing that work.

Flickr works this way.
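A minimal sketch of that layout, assuming a hypothetical `profile_images` table and using SQLite only as a stand-in for the real database: the file goes on disk and the database row holds just a relative path. Naming files by content hash and sharding them into subdirectories is an illustrative choice here, not something this answer prescribes.

```python
import hashlib
import os
import sqlite3


def init_schema(db: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that maps users to image paths."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS profile_images ("
        "user_id INTEGER PRIMARY KEY, rel_path TEXT NOT NULL)"
    )


def save_profile_image(db: sqlite3.Connection, media_root: str,
                       user_id: int, data: bytes) -> str:
    """Write the image under `media_root` and record its relative path."""
    digest = hashlib.sha256(data).hexdigest()
    # Shard into subdirectories so no single directory grows huge.
    rel_path = os.path.join(digest[:2], digest[2:4], digest + ".img")
    abs_path = os.path.join(media_root, rel_path)
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)
    with open(abs_path, "wb") as f:
        f.write(data)
    db.execute(
        "INSERT OR REPLACE INTO profile_images (user_id, rel_path) "
        "VALUES (?, ?)",
        (user_id, rel_path),
    )
    db.commit()
    return rel_path


def profile_image_path(db: sqlite3.Connection, media_root: str, user_id: int):
    """Resolve a user's image to an absolute path, or None if unset."""
    row = db.execute(
        "SELECT rel_path FROM profile_images WHERE user_id = ?", (user_id,)
    ).fetchone()
    return os.path.join(media_root, row[0]) if row else None
```

Storing a path relative to `media_root` rather than an absolute one means the files can later be moved to another disk or to separate static-content servers without rewriting any rows.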

I gave a more detailed answer here; you may find it useful.

Sanburn answered 10/8, 2010 at 21:0 Comment(3)
Awesome. I was trying to find a good way to store profile pictures on my website. I'm going to go with this solution too. - Mita
@JonKoivula I have another answer that delves into further detail. It may be helpful. #8922556 - Sanburn
Wow, thanks a lot Frankie, you really gave me a helpful answer. I appreciate it a lot. - Mita
