How to bring coordination between file system and database?
Asked Answered
H

4

10

I am working on a online file management project. We are storing references on the database (sql server) and files data on the on file system.

We are facing a problem of coordination between file system and database while we are uploading a file and also in case of deleting a file. First we create a reference in the database or store files on file system.

The problem is that if I create a reference in the database first and then store a file on file system, but while storing files on the file system any type of error occur, then the reference for that file is created in the database but no file data exist on the file system.

Please give me some solution how to deal with such situation. I am badly in need of it.

This case happens also while we deleting a file?

Hasin answered 26/3, 2010 at 16:28 Comment(3)
A file system can be thought of as a particular kind of database, albeit one that does not use SQL, and is particularly tuned for storing blobs. So in a sense, you're asking how to guarantee atomicity across two databases (the filesystem and your RDBMS). Is it possible to architect your system so that it uses only one or the other?Cheyennecheyne
no,we are using RDBMS to store references and file system to store files data/files body?Hasin
The obvious answer is to do things the other way round - databases tend to be much less likely to fail an operation than file systems.Nemesis
C
7

Access to the file system is indeed not transactional. You will need to simulate an all-or-nothing distributed transaction yourself: if the commit in database fails, delete the file on file-system. Inversely, if writing file fails, rollback database transaction (That will be a bit more complicated, but that's a rough sketch).

Note that it can get pretty complicated when a file is updated. You need first to copy it, so that if the database transaction fails after you've overwritten the file you can still restore the old version of the file. Whether you want to do this depends on the level of robustness that is desired.

Try to enforce that all manipulations go through your application (create, write, delete of files). If you can not do this and you can not prevent a file from being accessed directly on the file system (and maybe deleted), I see no other way than to periodically synchronize the database with the file system: check which file was removed and delete the entry in database. You could create a job that runs each X minute for that.

I would also suggest storing a hash (e.g. MD5) of the file in database. Take a bit of time to compute it, but that has been immensely useful for me to detect problems, e.g. if the file is renamed on file system by error but not in database. That also allows to run some integrity check periodically, to verify nothing was screwed.

If this approach is not sufficient (e.g. you want it to be more robust), I see no other way than to store the binary in the database in LOB. Then it will be really transactional and safe.

Corticate answered 26/3, 2010 at 16:37 Comment(0)
E
1

An old question I know, but for the benefit of other readers:

Depending on your operating systems you may be able to use Transactional TxF

http://msdn.microsoft.com/en-us/magazine/cc163388.aspx

Elocution answered 13/6, 2011 at 1:41 Comment(1)
Note that Microsoft now plans to deprecate TxF at some point in the future. They recommend using alternative approaches such as SQL FILESTREAMs.Bechuanaland
P
0

Treat the two events (managing the reference, and managing the file) as a single transaction. If either one fails, back the other one out. Then you should find it hard to get into a situation where the two are not in sync. It's easier to rollback database operations than filesystem operations.

Postremogeniture answered 26/3, 2010 at 16:31 Comment(1)
But what if the rollback fails (if you're deleting the file as a rollback)? Or what if the transaction commit fails (if you're using a DB transaction and commit after the file is created)?Endarch
B
0

FILESTREAM was introduced in SQL Server 2008 to address this exact problem.

However, it comes with its own set of implementation challenges.

Bechuanaland answered 18/4, 2014 at 16:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.