Safe to use unsafeIOToSTM to read from database?

In this pseudocode block:

atomically $ do
  cached <- valueInLocalStorage key
  if cached
      then readValueFromLocalStorage key
      else do
        value <- unsafeIOToSTM $ fetchValueFromDatabase key
        writeValueToLocalStorage key value
        return value

Is it safe to use unsafeIOToSTM? The docs say:

  • The STM implementation will often run transactions multiple times, so you need to be prepared for this if your IO has any side effects.

    Basically, if a transaction fails it is because some other thread called writeValueToLocalStorage, and when the transaction is retried it will return the stored value instead of fetching from the database again.

  • The STM implementation will abort transactions that are known to be invalid and need to be restarted. This may happen in the middle of unsafeIOToSTM, so make sure you don't acquire any resources that need releasing (exception handlers are ignored when aborting the transaction). That includes doing any IO using Handles, for example. Getting this wrong will probably lead to random deadlocks.

    This worries me the most. Logically, if fetchValueFromDatabase doesn't open a new connection (i.e. an existing connection is used) everything should be fine. Are there other pitfalls I am missing?

  • The transaction may have seen an inconsistent view of memory when the IO runs. Invariants that you expect to be true throughout your program may not be true inside a transaction, due to the way transactions are implemented. Normally this wouldn't be visible to the programmer, but using unsafeIOToSTM can expose it.

    key is a single value, no invariants to break.

Innumerable answered 7/12, 2015 at 10:1 Comment(10)
The second item is very scary indeed. – Papacy
Can I ask, why do you want to do this within STM? – Claussen
@MathematicalOrchid: I have a local store/cache in a TVar. Whenever something is not in the store it should be fetched from the database and saved in the store. I could of course (and currently am) split this in 2 steps: STM (check for value in store) -> IO (fetch from db) -> STM (save in store), but this would mean that another thread would be able to change the store in the window between the two STM transactions. – Innumerable
@PhilipKamenarsky So change it so that the first transaction is "look in the cache, if it's not there, put an [I'm fetching it] marker there". Any other transactions trying to fetch the same value retry when they see the marker. Problem solved. – Claussen
@Claussen Just as you wrote this it occurred to me. Put it in an answer so I can accept it :) – Innumerable
@Claussen that's a more complicated and also fragile solution. For instance what if the original thread never returns to replace the marker? – Meteorite
@PhilipKamenarsky I think you might have a deeper issue with your proposed solution: if thread B alters the store in your TVar (even a different key/val) while thread A is doing its DB read, thread A's transaction will be restarted. There's a real danger of livelock, since your database reads live on a much slower timescale than local concurrency stuff. – Meteorite
@Meteorite You make some valid points. I would expect the flag / unflag stuff with exception-handling bracketing for this reason. If you do two STM transactions, a momentary write conflict shouldn't matter; the transaction to commit the write will just immediately retry. – Claussen
Presumably fetchValueFromDatabase will retry in the middle of the operation at least sometimes, meaning any temporary buffers or memory that has been allocated (which is almost a certainty for network IO) will not be cleaned up - at best, you will leak memory everywhere, at worst, these resources not being cleaned up will be observable elsewhere (like reading constantly changing, garbage data from some handle). – Burson
@Meteorite I'm using a fetch flag, putting a timeout on the database read and clearing the flag when an exception occurs. That should cover it, do you see anything else that might go wrong? – Innumerable

I would suggest that doing I/O from an STM transaction is just a bad idea.

Presumably what you want is to avoid two threads doing the DB lookup at the same time. What I would do is this:

  • See if the item is already in the cache. If it is, we're done.

  • If it isn't, mark it with an "I'm fetching this" flag, commit the STM transaction, go get it from the DB, and do a second STM transaction to insert it into the cache (and remove the flag).

  • If the item is already flagged, retry the transaction. This blocks the calling thread until the first thread inserts the value from the DB.
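
The steps above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the asker's actual code: the `Slot` type, the `Cache` alias, and `cachedLookup` are hypothetical names, the cache is assumed to be a `Data.Map` held in a `TVar`, and the `fetch` argument stands in for `fetchValueFromDatabase`. It also clears the flag if the fetch throws, which is the failure case raised in the comments.

```haskell
import Control.Concurrent.STM
import Control.Exception (mask, onException)
import qualified Data.Map.Strict as Map

-- | State of one cache entry (hypothetical type).
-- 'Fetching' plays the role of the "I'm fetching this" flag.
data Slot v = Fetching | Ready v

-- | Assumed cache layout: a map from keys to slots inside a TVar.
type Cache k v = TVar (Map.Map k (Slot v))

-- | Look up a key, running the fetch action outside of STM.
-- All IO happens between two ordinary STM transactions.
cachedLookup :: Ord k => Cache k v -> (k -> IO v) -> k -> IO v
cachedLookup cache fetch key = mask $ \restore -> do
  -- First transaction: return a cached value, wait, or claim the key.
  hit <- atomically $ do
    m <- readTVar cache
    case Map.lookup key m of
      Just (Ready v) -> pure (Just v)   -- already cached
      Just Fetching  -> retry           -- another thread is fetching; block
      Nothing        -> do
        writeTVar cache (Map.insert key Fetching m)
        pure Nothing                    -- we now own the fetch
  case hit of
    Just v  -> pure v
    Nothing -> do
      -- Plain IO, outside any transaction. If it throws, remove the
      -- flag so that blocked threads wake up and can try again.
      v <- restore (fetch key)
             `onException` atomically (modifyTVar' cache (Map.delete key))
      -- Second transaction: publish the value, replacing the flag.
      atomically (modifyTVar' cache (Map.insert key (Ready v)))
      pure v
```

Threads that hit the `Fetching` flag sleep in `retry` until the map changes, so they do not spin. Wrapping `fetch` in a timeout (for example with `System.Timeout.timeout` from base) would keep a stuck database call from blocking those waiters indefinitely.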

Claussen answered 7/12, 2015 at 14:10 Comment(0)
