Storing PDF files as binary objects in SQL Server, yes or no?
Asked Answered
S

6

25

I have to find a design decision for the following task:

I have a SQL Server database and it contains a table of orders. PDF documents will be uploaded by users through a simple file upload from a web page and assigned to an order. There is not more than one document per order (perhaps no document, never more than one). For this purpose a user opens a web page, enters an order number, gets the order displayed and clicks on an upload button. So I know to which order the uploaded document belongs to.

Now I am considering two options to store the documents on the web server:

1) Extend my table of orders by a varbinary(MAX) column and store the PDF document directly into that binary field.

2) Save the PDF file in a specific folder on disk and give it a unique name related to the order (for instance my order number which is a primary key in the database, or a GUID which I could store in an additional column of the order table). Perhaps I have to store the files in subfolders, one per month, and store the subfolder name into the order row in the database, to avoid getting too many thousand files in one folder.

After the PDF files are stored they can be downloaded and viewed via browser after entering the related order number.

I'm tending towards option (1) because the data management seems easier to me having all relevant data in one database. But I am a bit afraid that I could encounter performance issues over time since my database size will grow much faster than with solution (2). Around 90% or even 95% of the total database size would be made up only by those stored PDF files.

Here is some additional information:

  • The PDF files will have a size of around 100 Kilobyte each
  • Around 1500 orders/PDF files per month
  • Windows Server 2008 R2 / IIS 7.5
  • SQL Server 2008 SP1 Express
  • Not quite sure about the hardware, I believe one QuadCore Proc. and 4 GB RAM
  • Application is written in ASP.NET Webforms 3.5 SP1

(I am aware that I will reach the 4GB-limit of the SQL Server Express edition after around 2 years with the numbers above. But we can disregard this here, either removing old data from the database or upgrading to a full license will be a possible option.)

My question is: What are the Pro and Contras of the options and what would you recommend? Perhaps someone had a similar task and can report about his experience.

Thank you in advance for reply!

Related:

Storing Images in DB - Yea or Nay?

Seabolt answered 27/2, 2010 at 15:29 Comment(1)
Great question. The entire case scenario was depicted. Of great use for new comers to DB design and implementation.Ava
A
29

With SQL Server 2008, when you have documents that are mostly 1 MB or more in size, the FILESTREAM feature would be recommended. This is based on a paper published by Microsoft Research called To BLOB or not to BLOB which analyzed the pros and cons of storing blobs in a database in great length - great read!

For documents of less than 256K on average, storing them in a VARBINARY(MAX) column seems to be the best fit.

Anything in between is a bit of a toss-up, really.

You say you'll have PDF documents mostly around 100K or so -> those will store very nicely into a SQL Server table, no problem. One thing you might want to consider is having a separate table for the documents that is linked to the main facts table. That way, the facts table will be faster in usage, and the documents don't get in the way of your other data.

Averroism answered 27/2, 2010 at 15:33 Comment(7)
I would store all files on disk, it saves building 2 mechanisms for storing files.Vandusen
@Mark: but in that case by all means use the FILESTREAM feature, since only that will guarantee that your table with the file names and the files on disk are in sync. Otherwise you'll always end up having rogue entries and zombie files on disk (not being referenced anymore) - quite a messAverroism
@Averroism - I'm reading the MS Research paper now, and it raises some good points. It still goes against my instinct though that if you have just a 100 MB link between your DB Server and Web Server that you will have a bottle neck introduced due to communication traffic. Good link - it's illuminating.Flexed
@marc_s: That's an important point. In other words: Does the FILESTREAM feature guarantee that the file on disk is deleted automatically when I delete the table row? And does the relation stay consistent even when I rename the file?Seabolt
Hm, the linked paper is quite interesting because it says that storing and retrieving files with size under 256k as blobs in SQL Server is FASTER (!) than NTFS file system. Something I didn't expect. I was only considering SQL as data storage because it is more comfortable (and then hoping the performance drawback compared to file system storage is not too big). But if SQL is not only more comfortable but also FASTER for not too large files what is then a valid argument for file system storage? Or is that paper perhaps outdated (the test were made with SQL 2005 Beta)?Seabolt
@slauma: yes, that was quite interesting and astonishing to me, too! It just shows that sometime, your gut feeling might be off and it's worth while actually checking things out scientifically :-)Averroism
Yes, and in the end I am convinced by those scientific arguments. I also found other places in MSDN which confirm that storing not too large files in SQL blobs have better performance than the file system. I will start to implement it this way, not at least because it's easier managed too.Seabolt
E
3

This was asked many times about storing images, but the discussion to those still applies:

Environment answered 27/2, 2010 at 15:35 Comment(1)
Thanks for the links. I hadn't the idea to search for the "image" keyword, so I didn't find anything appropriate to answer my question.Seabolt
C
1

I would also create a separate table for the documents, that way the search data/key fields for document retrieval will be more cache'able. The only time your database will need to touch the document table is during an insert or download.

Coneflower answered 27/2, 2010 at 15:40 Comment(0)
F
1

I would recommend AGAINST storing the files in SQL. You are adding extra overhead when retrieving the files. IIS is really efficient at serving up files, but with SQL are the storage facility you now have introduced a bottle neck, as you now have to hop from your web server to your SQL Server and back to get the file.

When you store your files on the webserver, your process can determine the appropriate file based on the criteria you've listed, point to it and serve it. Document management systems such as Documentum and Alfresco store the files on a share, and this allows you great flexibility with respects to back up and and redundant storage.

Flexed answered 27/2, 2010 at 15:46 Comment(0)
A
0

I am sceptical storing large blobs in SQL, assuming that sql page size is 4k (off the nut).. it has to assemble fragment of the entire file in nK blocks when serving the file back to user .. I am not sure whether this is the case or not.

Arginine answered 27/2, 2010 at 16:2 Comment(0)
B
0

We ran in to a similar situation albeit in principle only. We needed a way by which documents stored to SharePoint could be accessed via a link on a web page. Since everything is project based with a unique project number the solution was to implement a common naming convention to the documents. s the web page is created server-side, the links are dynamically created. The code takes the base path to the SharePoint server and then adds the project number and specifics for the document.

Example:

[SharePoint Base Path][Project Numbe][Project Document Name]
[http://mysharepoint.mycompany.com/213990/213990_PC.pdf]
Beginner answered 12/2, 2013 at 0:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.