How do distributed transactions work (eg. MSDTC)?

I understand, in a fuzzy sort of way, how regular ACID transactions work. You perform some work on a database in such a way that the work is not confirmed until some kind of commit flag is set. The commit part is based on some underlying assumption (like a single disk block write is atomic). In the event of a catastrophic error, you can just clear out the uncommitted data in the recovery phase.

How do distributed transactions work? In some of the MS documentation I have read that you can somehow perform a transaction across databases and filesystems (among other things).

This technology could be (and probably is) used for installers, where you want the program to be fully installed or fully absent. You simply begin a transaction at the start of the installer. Next you could connect to the registry and filesystem, making the changes that define the installation. When the job is done, simply commit, or rollback if the installation fails for some reason. The registry and filesystem are automatically cleaned for you by this magical distributed transaction coordinator.

How is it possible that two disparate systems can be transacted upon in this fashion? It seems to me that it is always possible to leave the system in an inconsistent state, where the filesystem has committed its changes and the registry has not. I think in MSDTC it is even possible to perform a transaction across the network.

I have read http://blogs.msdn.com/florinlazar/archive/2004/03/04/84199.aspx, but it feels like only the beginning of the explanation, and that step 4 should be expanded considerably.

Edit: From what I gather on http://en.wikipedia.org/wiki/Distributed_transaction, it can be accomplished by a two-phase commit (http://en.wikipedia.org/wiki/Two-phase_commit). After reading this, I'm still not understanding the method 100%, it seems like there is a lot of room for error between the steps.

About "step 4":

The transaction manager coordinates with the resource managers to ensure that all succeed to do the requested work or none of the work if done, thus maintaining the ACID properties.

This of course requires all participants to provide the proper interfaces and (error-free) implementations. The interface looks like vaguely this:

public interface ITransactionParticipant {
    bool WouldCommitWork();
    void Commit();
    void Rollback();
}

The Transaction manager at commit-time queries all participants whether they are willing to commit the transaction. The participants may only assert this if they are able to commit this transaction under all allowable error conditions (validation, system errors, etc). After all participants have asserted the ability to commit the transaction, the manager sends the Commit() message to all participants. If any participant instead raises an error or times out, the whole transaction aborts and individual members are rolled back.

This protocol requires participants to have recorded their whole transaction content before asserting their ability to commit. Of course this has to be in a special local transaction log structure to be able to recover from various kinds of failures.

Recommended topics

Hot tags