Storing files in a database vs. the file system

One of my customers has asked for a Document Management System for several thousand documents in different formats, e.g. PDF, DOC, DOCX. My question is: what is the best way to store these files, in the database or in the file system? How easy is it to secure a document with each approach?

Fast retrieval of the files is the key requirement.

I am using MySQL, if that helps.

Regards.

Jaela answered 4/8, 2011 at 9:30 Comment(1)
Possible duplicate of Storing Documents as Blobs in a Database - Any disadvantages? – Duty

You might want to store the files directly in the file system.

When using the file system, be careful with:

  • Confidentiality: put documents outside of your Apache document root, and have a PHP controller of yours output them.
  • Sharded paths: do not store thousands of documents in the same directory; spread them across different directories. You can shard using a hash of the file name, for example: /documents/A/F/B/AFB43677267ABCEF5786692/myfile.pdf.
  • Inode count: you can run out of inodes if you store a lot of small files (this may not be your case if you are storing mostly PDF and Office documents).
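
A minimal PHP sketch of such a controller, assuming the documents live under a directory outside the web root and that your application performs its own authorization check (represented here by the hypothetical $userMayRead flag). resolveDocumentPath() and sendDocument() are illustrative names, not part of any framework; the key idea is that the requested path is resolved with realpath() and rejected if it escapes the storage directory.

```php
<?php
/**
 * Resolve a requested relative path to a real file inside $storageDir
 * (a directory OUTSIDE the web server's document root), rejecting anything
 * that escapes it, e.g. "../../etc/passwd".
 * Returns the absolute path, or null if the file is missing or outside the root.
 */
function resolveDocumentPath(string $storageDir, string $relativePath): ?string
{
    $root = realpath($storageDir);
    $full = realpath($storageDir . DIRECTORY_SEPARATOR . $relativePath);
    if ($root === false || $full === false || !is_file($full)) {
        return null;
    }
    // The resolved file must sit below the storage root.
    if (strpos($full, $root . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $full;
}

/**
 * Stream a document to the client. $userMayRead stands in for your own
 * (hypothetical) authorization logic, e.g. a per-user ACL lookup.
 */
function sendDocument(string $storageDir, string $relativePath, bool $userMayRead): void
{
    $path = $userMayRead ? resolveDocumentPath($storageDir, $relativePath) : null;
    if ($path === null) {
        http_response_code(404); // do not reveal whether the file exists
        return;
    }
    header('Content-Type: application/octet-stream');
    header('Content-Length: ' . filesize($path));
    header('Content-Disposition: attachment; filename="' . basename($path) . '"');
    readfile($path);
}
```

Because the files are only reachable through this script, the web server never serves them directly, and every download passes through your access check.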

If you need to search these documents (by date, title, etc.), you may want to store their metadata in a database for better performance.
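
As a sketch of that split, assuming a hypothetical MySQL table named documents (the column names below are illustrative, not from the answer): the file stays on disk and only searchable metadata plus the stored path go into MySQL, inserted with a PDO prepared statement.

```php
<?php
/**
 * Collect searchable metadata for a document already stored on disk.
 * Keys match the hypothetical `documents` table described below.
 */
function buildDocumentMetadata(string $storedPath, string $originalName, string $title): array
{
    return [
        'title'         => $title,
        'original_name' => $originalName,
        'stored_path'   => $storedPath,
        'size_bytes'    => filesize($storedPath),
        'sha256'        => hash_file('sha256', $storedPath), // useful for integrity checks
        'created_at'    => date('Y-m-d H:i:s'),
    ];
}

/**
 * Insert one metadata row. Assumes a table such as:
 *   CREATE TABLE documents (
 *     id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
 *     title VARCHAR(255) NOT NULL,
 *     original_name VARCHAR(255) NOT NULL,
 *     stored_path VARCHAR(1024) NOT NULL,
 *     size_bytes BIGINT UNSIGNED NOT NULL,
 *     sha256 CHAR(64) NOT NULL,
 *     created_at DATETIME NOT NULL,
 *     INDEX (title), INDEX (created_at)
 *   );
 */
function saveDocumentMetadata(PDO $pdo, array $meta): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO documents (title, original_name, stored_path, size_bytes, sha256, created_at)
         VALUES (:title, :original_name, :stored_path, :size_bytes, :sha256, :created_at)'
    );
    $stmt->execute($meta);
    return (int) $pdo->lastInsertId();
}
```

Searches by title or date then hit the indexed table, and only the final download touches the file system.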

FYI, as this question shows, MS SQL Server has FILESTREAM storage (a kind of hybrid), but at the moment MySQL doesn't have an equivalent.

Freezer answered 4/8, 2011 at 10:27 Comment(4)
Thanks. I am not familiar with sharded paths. Can you point me to any online resources, e.g. tutorials? – Jaela
For example, you do not want to store everything in app/data/. Make a hash of your document's file name: <?php $hash = hash('sha256', $filename); ?>. If the hash is 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e730, you can store your file in app/data/2/c/f/2/2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e730/$filename (2/c/f/2 are the first 4 characters). This distributes your documents across several smaller directories instead of a single very large one. The example uses 4 characters, but you can use more if needed (it depends on how many documents you have). – Leclaire
What is the advantage of this? I am expecting thousands of documents in the system. Is there any rule for choosing the number of characters based on the number of files? – Jaela
You'll get better I/O if you avoid big directories, but your I/O will also degrade if you use too many subdirectories, so it's up to you to strike the right balance. I don't put more than 1,000 files into a single directory. So with, say, 10,000 files, sharding on the first character is fine (16 directories of about 625 files each). Sharding on 4 characters spreads files across 16^4 = 65,536 different directories (too many for you). – Leclaire
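
The sharding scheme described in the comments above can be sketched in PHP as follows. shardedPath(), shardDepth() and storeDocument() are hypothetical helper names; shardDepth() encodes the rule of thumb from the comments (at most about 1,000 files per directory, assuming SHA-256 spreads names evenly over the 16 hex characters). Like the comment, this hashes the file name, so two different uploads with the same name map to the same path; hashing a unique document ID instead avoids that. $filename is assumed to be already sanitized (no slashes or "..").

```php
<?php
/**
 * Hash the file name with SHA-256, use the first $depth hex characters as
 * nested directories, then the full hash, then the file name.
 * shardedPath('app/data', 'hello', 4) gives app/data/2/c/f/2/<64-char hash>/hello.
 */
function shardedPath(string $baseDir, string $filename, int $depth = 4): string
{
    $hash = hash('sha256', $filename); // 64 lowercase hex characters
    $segments = [rtrim($baseDir, '/')];
    for ($i = 0; $i < $depth; $i++) {
        $segments[] = $hash[$i];       // one directory level per hex character
    }
    $segments[] = $hash;
    $segments[] = $filename;
    return implode('/', $segments);
}

/**
 * Smallest number of hex characters d such that each leaf directory holds
 * at most $maxPerDir files on average: $fileCount / 16^d <= $maxPerDir.
 */
function shardDepth(int $fileCount, int $maxPerDir = 1000): int
{
    $maxPerDir = max(1, $maxPerDir);
    $depth = 0;
    while ($fileCount / (16 ** $depth) > $maxPerDir) {
        $depth++;
    }
    return $depth;
}

/**
 * Create the shard directories if needed and move a file into place.
 * For HTTP uploads you would use move_uploaded_file() instead of rename().
 */
function storeDocument(string $baseDir, string $srcFile, string $filename, int $depth): string
{
    $target = shardedPath($baseDir, $filename, $depth);
    $dir = dirname($target);
    if (!is_dir($dir) && !mkdir($dir, 0750, true)) {
        throw new RuntimeException("Cannot create directory $dir");
    }
    if (!rename($srcFile, $target)) {
        throw new RuntimeException("Cannot move file to $target");
    }
    return $target;
}
```

With 10,000 files and a limit of 1,000 per directory, shardDepth(10000) returns 1 (16 directories of about 625 files each), matching the advice above; a depth of 4 would create 16^4 = 65,536 leaf directories.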

Using file system access for big data blobs generally means faster access and less overhead than storing them in a MySQL database.

One interesting and possibly related post: Storing Images in DB - Yea or Nay?

Duty answered 4/8, 2011 at 9:32 Comment(0)

For high performance you should use the file system, with PHP's glob function and a JS interface. I finished a project like this in March.
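
A minimal sketch of listing stored documents with glob(), assuming the sharded layout from the first answer (base/<c1>/.../<cN>/<hash>/<filename>). listDocuments() is a hypothetical helper, not the answerer's actual code; its result could be returned as JSON to a JS front end.

```php
<?php
/**
 * List stored documents with glob(). $depth is the number of shard levels;
 * $namePattern is a glob pattern such as '*.pdf' (default: every file).
 */
function listDocuments(string $baseDir, int $depth = 4, string $namePattern = '*'): array
{
    // One "/*" per shard level plus one for the hash directory, then the file pattern.
    $pattern = rtrim($baseDir, '/') . str_repeat('/*', $depth + 1) . '/' . $namePattern;
    $matches = glob($pattern);
    return $matches === false ? [] : $matches; // glob() may return false on error
}
```

For example, listDocuments('/var/app-data/documents', 4, '*.pdf') would match every PDF four shard levels deep.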

Vitiligo answered 4/8, 2011 at 9:34 Comment(1)
Thanks, but my biggest concern is the security of the files, as the system has to store very confidential files and reports. Does the file system provide strong security for the files? – Jaela
