What is the best place for storing uploaded images, SQL database or disk file system? [closed]

I'm writing an application that allows users to upload images onto the server. I expect about 20 images per day, all JPEG and probably not edited/resized. (That's a separate question: how to resize the images on the server side before storing them. If anyone has a .NET resource for that, please drop it in the comments.) I wonder now what the best place for storing the uploaded images is:

  • a) Store the images as files in the file system and create a record in a table with the exact path to each image.

  • b) Store the image itself in a table, using an "image" or "binary data" data type of the database server.

I see advantages and disadvantages in both. I like a) because I can easily relocate the files and just have to change the table entry. On the other hand, I don't like storing business data on the web server, and I don't really want to connect the web server to any other data source that holds business data (for security reasons). I like b) because all the information is in one place and easily accessible by a query. On the other hand, the database will get very big very soon, and outsourcing that data could be more difficult.

Denson answered 7/12, 2008 at 23:55 Comment(3)
I didn't find it, where? – Denson
Here: #4248 – Arguseyed
Possible duplicate of Storing Images in DB - Yea or Nay? – Philosopher

I generally store files on the file system, since that's what it's there for, though there are exceptions. For files, the file system is usually the most flexible and performant solution.

There are a few problems with storing files in a database: files are generally much larger than your average row, so result sets containing many large files will consume a lot of memory. Also, if you use a storage engine that employs table locks for writes (ISAM, for example), your files table might be locked often, depending on the size and rate of the files you are storing there.

Regarding security: I usually store the files in a directory outside the document root (not accessible through an HTTP request) and serve them through a script that checks for proper authorization first.
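
To make that concrete, here is a minimal sketch in C# (ASP.NET Core minimal APIs, implicit usings assumed); the /var/app-data/uploads folder, the /images/{id} route, and the GetImageFileName lookup are hypothetical placeholders, not part of the answer above.

    // Sketch: serve uploaded images from a folder outside the web root,
    // but only after an authorization check. Paths and the GetImageFileName
    // helper are placeholders.
    var app = WebApplication.CreateBuilder(args).Build();

    app.MapGet("/images/{id:int}", (int id, HttpContext ctx) =>
    {
        // Reject anonymous requests before touching the file system.
        if (ctx.User.Identity?.IsAuthenticated != true)
            return Results.Unauthorized();

        // Look up the file name recorded in the database for this id.
        string? fileName = GetImageFileName(id);
        if (fileName is null)
            return Results.NotFound();

        // The folder is outside the document root, so this endpoint is the only way in.
        var path = Path.Combine("/var/app-data/uploads", fileName);
        return File.Exists(path)
            ? Results.File(path, "image/jpeg")
            : Results.NotFound();
    });

    app.Run();

    // Hypothetical lookup; a real application would query its images table here.
    static string? GetImageFileName(int id) => id == 1 ? "example.jpg" : null;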

Dillard answered 8/12, 2008 at 0:3 Comment(7)
Could you please explain the last paragraph (regarding security) in more technical detail? Any pointers would be very helpful. Thank you. – Ashlaring
(For all you googlers out there) If you have your site's root configured to a "public" folder (as in my_website/public/ instead of just my_website/), you can store the images in the my_website/my_images folder with the rest of your app. Then your img tags would reference "my_website/image.php?img_id=55" instead of "my_website/avatar.png", and your image.php script would, after verifying your credentials and parsing the id you hand it, return the actual image. That way, the image is only viewable by the proper logged-in user. – Raphael
Hey captain, you should turn that into an actual answer so you can get points $$$ – Renascence
Please add a few more notes on security / preventing files from destroying your web site. – Renascence
I prefer putting the files/images on a UNC path and storing the complete URL in the database. This makes it easy to query this binary data. All the servers the application is deployed to (in a web-farm scenario) should have access to this UNC path. – Seise
That would not scale: there is a limit on the number of files in a folder, and if you plan to divide your files across multiple folders, that adds the complexity of indexing the files (to identify where each file is actually stored). Moreover, search will be very slow. – Sonata
How can I make sure that if I delete a row from the database, the file linked to that row is also removed from the file system, to maintain data integrity? Are there any automated ways? I am using SQLAlchemy. – Vasta

The only benefit of option B is having all the data in one system, yet it's a false benefit! You could argue that your code is also a form of data, and therefore could also be stored in the database; how would you like that?

Unless you have some unique case:

  • Business logic belongs in code.
  • Structured data belongs in database (relational or non-relational).
  • Bulk data belongs in storage (filesystem or other).

Files, Code, Data

It is not necessary to use the filesystem to keep files. Instead, you may use cloud storage (such as Amazon S3) or an infrastructure-as-a-service layer on top of it (such as Uploadcare):

https://uploadcare.com/upload-api-cloud-storage-and-cdn/
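
As a rough illustration of the cloud-storage route, here is a hedged C# sketch using the AWS SDK for .NET (the AWSSDK.S3 package); the bucket name, key scheme, and content type are invented for the example, and credentials are assumed to come from the environment.

    // Sketch: upload a local file to S3 and keep only the object key in the
    // database instead of the file bytes. Bucket name and key are placeholders.
    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Amazon.S3;
    using Amazon.S3.Model;

    public static class ImageUploader
    {
        public static async Task<string> UploadAsync(string localPath)
        {
            var key = $"uploads/{Guid.NewGuid()}{Path.GetExtension(localPath)}";

            using var s3 = new AmazonS3Client(); // region/credentials from the environment
            await s3.PutObjectAsync(new PutObjectRequest
            {
                BucketName = "my-image-bucket",  // hypothetical bucket
                Key = key,
                FilePath = localPath,
                ContentType = "image/jpeg"
            });

            return key; // store this key in the database row
        }
    }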

But storing files in the database is a bad idea.

Marte answered 26/11, 2014 at 13:53 Comment(0)

I know this is an old post, but many visitors to this page are getting nothing related to the question, especially newbies.

How to upload and store images or files on a website:

For a static website there may be no problem, since the file storage on most shared hosting is still adequate. The problem comes with a dynamic website as it gets bigger. A bigger database can be handled, but a growing quantity of files, such as images, becomes a problem. There are two types of images on a website:

  1. Images that come from the administrator in a dynamic blog. Usually, these images have been optimized before upload.

  2. Images from users, in cases where users are allowed to upload images such as avatars, or can create blog content and insert images from a text editor. For these images it is difficult to predict the size: users can upload big images for small content, shrinking the displayed size but not the actual image size.

Setting aside item no. 1 above, item no. 2 can be temporarily solved with the following tips if we don't have image-optimizer functionality on the website:

  1. Do not allow users to upload directly from text editors; redirect them to an image gallery instead. On that page, users must upload files in advance before the files can be embedded in the content. This approach is usually called a file manager.

  2. Use an image-crop function when users upload images. This limits the image size, even for users who upload very big files, because the final image is the cropped result. We can also enforce a limit server-side and accept only, for example, 500 KB or less (see the sketch after this list).
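
As a rough sketch of that server-side limit in C# (ASP.NET Core minimal APIs, implicit usings assumed), with the 500 KB cap, the form field name, and the target folder all chosen just for the example:

    // Sketch: reject uploads above a size limit before saving anything.
    var app = WebApplication.CreateBuilder(args).Build();
    const long MaxBytes = 500 * 1024; // ~500 KB, an arbitrary example limit

    app.MapPost("/upload", async (HttpRequest request) =>
    {
        var form = await request.ReadFormAsync();
        var file = form.Files["image"];          // <input type="file" name="image">
        if (file is null || file.Length == 0 || file.Length > MaxBytes)
            return Results.BadRequest("Please upload an image of 500 KB or less.");

        // Give the file a server-chosen name and store it outside the web root.
        Directory.CreateDirectory("/var/app-data/uploads");
        var safeName = $"{Guid.NewGuid()}{Path.GetExtension(file.FileName)}";
        var target = Path.Combine("/var/app-data/uploads", safeName);
        await using var output = File.Create(target);
        await file.CopyToAsync(output);

        return Results.Ok(safeName);             // name/path to record in the database
    });

    app.Run();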

Now, that is only temporary. For a final solution, the questions come back:

  • How do we handle large image storage?
  • Should we resize the images or change the extension?
  • How do medium or large websites and e-commerce sites handle file storage for their images?

What can we do then?

  1. Migrate from shared hosting to a VPS. Not enough? Then upgrade to a dedicated server.

  2. Create your own server for file storage. Google how to do it. This is not as difficult as you think. Some people do it for their website.

  3. The easy way is to use a CDN file-storage service.

Okay, 1 and 2 are a little bit expensive, but not 3. I think it is the best solution.

Some CDN services allow you to store as many web files as you want.

Question, "how to upload files to CDN from our website?"

Don't worry, once you register, usually free, you will get guidance how to upload files and get their link from/to your website. You will get an API and more. It's easy.

Some providers give us a free service for 14 days with limited storage and bandwidth. But that will be okay for starting point. The only problem is because 'people never try'.

Hope it will help for newbie.

Turki answered 1/12, 2016 at 11:40 Comment(0)

We have had clients insist on option B (database storage) a few times on a few different backends, and we always ended up going back to option A (filesystem storage) eventually.

Large BLOBs like that just have not been handled well enough even by SQL Server 2005, which is the latest one we tried it on.

Specifically, we saw serious bloat and I think maybe locking problems.

One other note: if you are using NTFS-based storage (Windows Server, etc.), you might consider finding a way around putting thousands and thousands of files in one directory. I am not sure why, but sometimes the file system does not cope well with that situation. If anyone knows more about this I would love to hear it.

But I always try to use subdirectories to break things up a bit. Creation date often works well for this:

Images/2008/12/17/.jpg

...This provides a decent level of separation, and also helps a bit during debugging. Explorer and FTP clients alike can choke a bit when there are truly huge directories.
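
As a small illustration of that layout in C# (the root folder and naming are just an example, not from the answer), the date-based path can be built and created on demand:

    // Sketch: shard uploads into yyyy/MM/dd subdirectories so no single
    // directory accumulates thousands of files. The root path is a placeholder.
    using System;
    using System.IO;

    public static class ImagePaths
    {
        public static string BuildTargetPath(string rootFolder, string fileName)
        {
            var now = DateTime.UtcNow;
            var dir = Path.Combine(
                rootFolder,
                now.Year.ToString("D4"),
                now.Month.ToString("D2"),
                now.Day.ToString("D2"));

            Directory.CreateDirectory(dir);      // no-op if it already exists
            return Path.Combine(dir, fileName);  // e.g. Images/2008/12/17/abc123.jpg
        }
    }

Storing only the relative part of that path in the database keeps the rows small while the bytes stay on disk.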

EDIT: Just a quick note for 2017, in more recent versions of SQL Server, there are new options for handling lots of BLOBs that are supposed to avoid the drawbacks I discussed.

EDIT: Quick note for 2020, Blob Storage in AWS/Azure/etc has also been an option for years now. This is a great fit for many web-based projects since it's cheap and it can often simplify certain issues around deployment, scaling to multiple servers, debugging other environments when necessary, etc.

Backspin answered 8/12, 2008 at 2:6 Comment(3)
Good warning about the number of files in the same directory. It can cause errors that are very hard to track down in a production environment. – Gans
I had hit this problem before. NTFS behaved unpredictably with some 10,000 files in a folder. – Transonic
Not just NTFS but also BTRFS has a problem dealing with huge quantities of images in one folder; if you try to ls the directory it takes forever (hangs), and the same goes for deleting. – Peterman

I have recently created a PHP/MySQL app which stores PDFs/Word files in a MySQL table (as big as 40MB per file so far).

Pros:

  • Uploaded files are replicated to the backup server along with everything else; no separate backup strategy is needed (peace of mind).
  • Setting up the web server is slightly simpler because I don't need to have an uploads/ folder and tell all my applications where it is.
  • I get to use transactions for edits to improve data integrity; I don't have to worry about orphaned and missing files.

Cons:

  • mysqldump now takes a looooong time because there is 500MB of file data in one of the tables.
  • Overall, not very memory/CPU efficient compared to the filesystem.

I'd call my implementation a success, it takes care of backup requirements and simplifies the layout of the project. The performance is fine for the 20-30 people who use the app.
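
The answer above uses PHP/MySQL; purely as a hedged C#/SQL Server sketch of the same option-B idea (the Documents table, its columns, and the connection string are invented for the example), the transactional insert might look like this:

    // Sketch: store a file as a BLOB inside a transaction so the metadata row
    // and the file bytes commit (or roll back) together. Names are placeholders.
    using System.Data;
    using System.IO;
    using Microsoft.Data.SqlClient;

    public static class DocumentStore
    {
        public static void Save(string connectionString, string title, string filePath)
        {
            byte[] bytes = File.ReadAllBytes(filePath);

            using var conn = new SqlConnection(connectionString);
            conn.Open();
            using var tx = conn.BeginTransaction();

            using var cmd = new SqlCommand(
                "INSERT INTO Documents (Title, FileName, Data) VALUES (@title, @name, @data)",
                conn, tx);
            cmd.Parameters.AddWithValue("@title", title);
            cmd.Parameters.AddWithValue("@name", Path.GetFileName(filePath));
            cmd.Parameters.Add("@data", SqlDbType.VarBinary, -1).Value = bytes;

            cmd.ExecuteNonQuery();
            tx.Commit(); // if anything above throws, disposing the transaction rolls it back
        }
    }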

Anile answered 8/12, 2008 at 4:29 Comment(0)

Definitely resize the image, and check its format if you can. There have been cases of malicious files being uploaded and served by unwitting hosts. For instance, the GIFAR vulnerability allowed you to hide a malicious Java applet in a GIF file, which would then be able to read cookies in the current context and send them to another site for a cross-site scripting attack. Resizing the images usually prevents this, as it munges the embedded code. While this attack has been fixed by JVM patches, naively serving up binary files without scrubbing them opens you up to a whole range of vulnerabilities.

Remember, most virus scanners can only run against the filesystem; if you store your binaries in the DB, you won't be able to run a scanner against them very easily.

Chorizo answered 8/12, 2008 at 2:19 Comment(0)

This is basically what I do:

  1. Store the uploaded image in a temporary directory or in memory.
  2. Process that image before permanently storing it: apply color corrections, compress it, create several copies at different dimensions, and rename them with .xl, .lg, .md, .sm, etc. suffixes.
  3. Put all the processed image files (derived from a single upload) inside a folder whose name is an id; store that id in the database for the corresponding row/document, along with the image file name (or perhaps a random name as the image name).
  4. Create a yyyy/mm/dd path folder if it doesn't exist, for example 2016/08/21. Remember that path and store it in the database for the same document/row.
  5. Move the image id folder into the path folder. (The path folder may be located under /var/web-content.)
  6. Flush the memory buffer or delete the temporary file.

When you need to access any image mentioned in a document, you have the path and the id of the folder that contains the images. For example: /var/web-content/{{path}}/{{id}}/image-file-name.sm.jpg

This way, if you have to delete all the processed image files, you just delete the id folder and its contents recursively.
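
A minimal sketch of that layout in C# (the /var/web-content root and the suffix list come from the answer; the resize step itself is omitted and a plain copy stands in for it):

    // Sketch: one folder per image id, containing several suffixed copies.
    using System;
    using System.IO;

    public static class ImageStore
    {
        private const string Root = "/var/web-content";
        private static readonly string[] Suffixes = { "xl", "lg", "md", "sm" };

        // Returns the (path, id) pair to store in the database row/document.
        public static (string path, string id) Store(string processedTempFile, string baseName)
        {
            var path = DateTime.UtcNow.ToString("yyyy'/'MM'/'dd");   // e.g. 2016/08/21
            var id = Guid.NewGuid().ToString("N");
            var folder = Path.Combine(Root, path, id);
            Directory.CreateDirectory(folder);

            foreach (var suffix in Suffixes)
            {
                var target = Path.Combine(folder, $"{baseName}.{suffix}.jpg");
                // A real implementation would write a resized copy per suffix here.
                File.Copy(processedTempFile, target, overwrite: true);
            }
            return (path, id);
        }

        // Deleting the id folder recursively removes every processed copy at once.
        public static void Delete(string path, string id) =>
            Directory.Delete(Path.Combine(Root, path, id), recursive: true);
    }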

Aphorism answered 29/6, 2016 at 16:17 Comment(0)

I use uploaded images on my website and I would definitely say option a).

One other thing I'd highly recommend is immediately changing the file name from whatever the user named the photo to something more manageable, for example something with the date and time to uniquely identify each picture.

It also helps to strip the user's file name of any strange characters to avoid future complications.
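
A small sketch of that renaming step in C# (the timestamp format and the character whitelist are just one possible choice, not from the answer):

    // Sketch: replace the user-supplied name with a timestamped, sanitized one.
    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    public static class UploadNaming
    {
        public static string MakeSafeName(string originalFileName)
        {
            // Keep only a tame, lower-case version of the extension (e.g. ".jpg").
            var ext = Path.GetExtension(originalFileName).ToLowerInvariant();
            ext = Regex.Replace(ext, "[^.a-z0-9]", "");

            // Date/time plus a short random part uniquely identifies each picture.
            var stamp = DateTime.UtcNow.ToString("yyyyMMdd-HHmmss");
            var random = Guid.NewGuid().ToString("N").Substring(0, 8);

            return $"{stamp}-{random}{ext}";   // e.g. 20081208-021200-1a2b3c4d.jpg
        }
    }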

Hedrick answered 8/12, 2008 at 2:12 Comment(0)

There's sort of a hybrid approach in SQL Server 2008 called the FILESTREAM data type, which was talked about on RunAs Radio #74 and which is sort of like the best of both worlds. Most people don't have the 2008 option, but if you do, this option looks pretty cool.

Multiplicate answered 9/12, 2008 at 5:15 Comment(0)

Most implementations are option A.

With option B, you open a whole big can of whoop4ss when you marshal those bits from the database into something that can be displayed in a browser... Also, if the DB is down, the images are not available.

I don't think that space is too much of an issue... Terabyte drives are a couple hundred bucks now.

We are implementing with option A because we don't have the time or resources to do option B.

Rejoinder answered 8/12, 2008 at 0:40 Comment(0)

For auto-resizing, try ImageMagick... it is used by many major open-source content/photo management systems... and I believe there are some .NET bindings for it.

Isothermal answered 8/12, 2008 at 1:58 Comment(0)

We use A. I would put it on a shared drive (unless you plan on running only one server).

If the time comes when this won't scale for you then you can investigate caching mechanisms.

Physiography answered 8/12, 2008 at 0:3 Comment(0)

Absolutely, positively option A. Others have mentioned that databases generally don't deal well with BLOBs, whether they're designed to do so or not. Filesystems, on the other hand, live for this stuff. You have the option of using RAID striping, spreading images across multiple drives, even spreading them across geographically disparate servers.

Another advantage: your database backups/replication won't become monstrous, which they would with the images stored in the database.

Climax answered 8/12, 2008 at 1:46 Comment(0)

Option A.

Once the image is loaded you can verify the format and resize it before saving. There are a number of .NET code samples for resizing images on http://www.codeproject.com. For instance: http://www.codeproject.com/KB/cs/Photo_Resize.aspx
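
For reference, here is a hedged C# sketch of that verify-and-resize step using System.Drawing (note that System.Drawing.Common is Windows-only on modern .NET); the 800-pixel limit is an arbitrary example:

    // Sketch: load the upload, confirm it parses as an image, and save a
    // resized JPEG copy. Max width and the output format are placeholders.
    using System;
    using System.Drawing;
    using System.Drawing.Drawing2D;
    using System.Drawing.Imaging;

    public static class ImageResizer
    {
        public static void SaveResizedJpeg(string inputPath, string outputPath, int maxWidth = 800)
        {
            using var source = Image.FromFile(inputPath);    // throws if not a valid image

            var scale = Math.Min(1.0, (double)maxWidth / source.Width);
            var width = (int)(source.Width * scale);
            var height = (int)(source.Height * scale);

            using var resized = new Bitmap(width, height);
            using (var g = Graphics.FromImage(resized))
            {
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                g.DrawImage(source, 0, 0, width, height);
            }
            resized.Save(outputPath, ImageFormat.Jpeg);      // re-encoding also strips embedded junk
        }
    }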

Oppugnant answered 9/12, 2008 at 4:40 Comment(0)

For security reasons, it is also best practice to guard against problems caused by IE's content sniffing, which can allow attackers to upload JavaScript inside image files that might then get executed in the context of your site. So you might want to transform the images (crop/resize them) somehow before storing them, to prevent this sort of attack. This answer has some other ideas.

Sightly answered 13/1, 2011 at 3:38 Comment(0)

Well, I have a similar project where users upload files onto the server. In my view, option a) is the best solution because it's more flexible. What you must do is store the images in a protected folder organized into subdirectories. The main directory must be set up by the administrator so that its content cannot run scripts (very important) and is read/write protected so it is not accessible via an HTTP request.

I hope this helps you.

Homologue answered 6/4, 2012 at 14:52 Comment(0)

If they are small files that will not need to be edited, then option B is not a bad option. I prefer this to writing logic to store files and dealing with crazy directory-structure issues. Having a lot of files in one directory is bad. emkay?

If the files are large or require constant editing, especially from programs like Office, then option A is your best bet.

For most cases, it's a matter of preference, but if you go with option A, just make sure the directories don't have too many files in them. If you choose option B, then put the table with the BLOB data in its own database and/or filegroup. This will help with maintenance, especially backups/restores. Your regular data is probably fairly small, while your image data will be huge over time.

Multiplicate answered 9/12, 2008 at 5:11 Comment(0)

It depends on your requirements, especially volume, number of users, and frequency of search. But for a small or medium office, the best option is to use an application like Apple Photos or Adobe Lightroom; they are specialized in storing, cataloging, indexing, and organizing this kind of resource. For large organizations with strong storage requirements and a high number of users, it is recommended to set up a content-management platform with digital asset management, such as Nuxeo or Alfresco; both offer very good tools for managing very large volumes of data, with simplified methods to retrieve it. And, very importantly, there is a free (open-source) option for both platforms.

Baronet answered 18/11, 2016 at 15:43 Comment(0)
