How to decide what exceptions are worth retrying when reading and writing to MongoDB (C# driver)?

By looking at this official documentation it seems that there are basically three types of errors thrown by the MongoDB C# driver:

  • errors thrown when the driver is not able to properly select or connect to a Server to issue the query against. These errors lead to a TimeoutException
  • errors thrown when the driver has successfully selected a Server to run the query against, but the server goes down while the query is being executed. These errors manifest themselves as MongoConnectionException
  • errors thrown during write operations. These errors lead to MongoWriteException or MongoBulkWriteException, depending on the type of write operation being performed.

I'm trying to make my software using MongoDB a bit more resilient to transient errors, so I want to find out which exceptions are worth retrying.

The problem is not implementing a solid retry policy (I usually employ Polly .NET for that), but instead understanding when the retry makes sense.

I think that retrying on exceptions of type TimeoutException doesn't make sense, because the driver itself waits for a while before timing out an operation (the default is 30 seconds, but you can change that via the connection string options). The idea is that retrying an operation after you have already waited 30 seconds for it to time out is probably a waste of time. For instance, if you decide to implement 3 retries with 1 second of waiting time between them, it takes up to 123 seconds to fail an operation (the initial attempt plus three retries at 30 seconds each, plus three 1-second waits). That is a very long time.
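The worst-case figure can be checked with a few lines of C#. This is only a sketch: `RetryBudget` and `WorstCase` are hypothetical names, and it assumes every attempt runs into the full timeout. It counts the initial attempt plus the retries, so three retries mean four timeouts and three waits.

```csharp
using System;

public static class RetryBudget
{
    // Worst case: every attempt (the initial one plus each retry) hits the
    // full timeout, and one wait precedes each retry.
    public static TimeSpan WorstCase(TimeSpan timeout, int retries, TimeSpan wait)
    {
        long attempts = retries + 1;
        return TimeSpan.FromTicks(timeout.Ticks * attempts + wait.Ticks * retries);
    }
}
```

With a 30-second timeout, 3 retries and a 1-second wait, `RetryBudget.WorstCase(TimeSpan.FromSeconds(30), 3, TimeSpan.FromSeconds(1))` comes to 123 seconds.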

As documented here, retrying on MongoConnectionException is only safe for idempotent operations. From my point of view, it makes sense to always retry on this kind of error, provided that the operation being performed is idempotent.
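For an idempotent read, such a policy might look like the following with Polly. This is a sketch under assumptions, not a recommended configuration: it needs the MongoDB.Driver and Polly NuGet packages, `ResilientReads` and `FindByIdAsync` are hypothetical names, and the retry count and backoff values are arbitrary.

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;
using Polly;

public static class ResilientReads
{
    // Retries only MongoConnectionException, up to 3 times, with exponential
    // backoff (400 ms, 800 ms, 1600 ms). The numbers are illustrative.
    private static readonly IAsyncPolicy RetryOnConnectionError = Policy
        .Handle<MongoConnectionException>()
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(200 * Math.Pow(2, attempt)));

    // A read by _id is idempotent, so repeating it cannot corrupt data.
    public static Task<BsonDocument> FindByIdAsync(
        IMongoCollection<BsonDocument> collection, BsonValue id)
    {
        var filter = Builders<BsonDocument>.Filter.Eq("_id", id);
        return RetryOnConnectionError.ExecuteAsync(
            () => collection.Find(filter).FirstOrDefaultAsync());
    }
}
```

Because the policy handles only MongoConnectionException, a TimeoutException from server selection still surfaces immediately, in line with the reasoning about timeouts.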

The hard bit in deciding a good retry strategy for writes is when you get an exception of type MongoWriteException or MongoBulkWriteException.

Regarding exceptions of type MongoWriteException, it is probably worth retrying all of those with a ServerErrorCategory other than DuplicateKey. As documented here, you can detect duplicate key errors by using this property of the MongoWriteException.WriteError object.

Retrying duplicate key errors probably doesn't make sense because you will get them again (that's not a transient error).
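The rule described here can be written as a small predicate. This is a sketch that needs the MongoDB.Driver package; `WriteRetryRules` and `IsWorthRetrying` are hypothetical names. It encodes only the "anything but DuplicateKey" idea from this question and does not claim that every other category is transient.

```csharp
using MongoDB.Driver;

public static class WriteRetryRules
{
    // Retry a MongoWriteException unless its write error is a duplicate key.
    // If WriteError is null (for example, only a write concern error was
    // reported), the comparison is true and the exception counts as retryable.
    public static bool IsWorthRetrying(MongoWriteException ex) =>
        ex.WriteError?.Category != ServerErrorCategory.DuplicateKey;
}
```

The predicate can be used in an exception filter (`catch (MongoWriteException ex) when (WriteRetryRules.IsWorthRetrying(ex))`) or passed to Polly as `Policy.Handle<MongoWriteException>(WriteRetryRules.IsWorthRetrying)`.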

I have no idea how to handle errors of type MongoBulkWriteException safely. In that case you are inserting multiple documents into MongoDB, and it is entirely possible that only some of them have failed while the others have been written successfully. So retrying the exact same bulk insert operation could lead to writing the same document twice (bulk writes are not idempotent by nature). How can I handle this scenario?
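One possible direction is to retry only the requests that did not succeed, using the details carried by the exception. The sketch below needs the MongoDB.Driver package; `BulkRetryPlanner` and `RequestsWorthRetrying` are hypothetical names, and it assumes that `BulkWriteError.Index` refers to the position of the failed request in the list originally passed to the bulk write.

```csharp
using System.Collections.Generic;
using MongoDB.Driver;

public static class BulkRetryPlanner
{
    // Builds the list of requests to send again after a partial failure:
    // requests that failed with something other than a duplicate key, plus
    // requests the server never processed.
    public static List<WriteModel<T>> RequestsWorthRetrying<T>(
        IReadOnlyList<WriteModel<T>> originalRequests,
        MongoBulkWriteException<T> ex)
    {
        var retry = new List<WriteModel<T>>();
        foreach (BulkWriteError error in ex.WriteErrors)
        {
            if (error.Category != ServerErrorCategory.DuplicateKey)
            {
                // Assumption: Index points into originalRequests.
                retry.Add(originalRequests[error.Index]);
            }
        }
        retry.AddRange(ex.UnprocessedRequests);
        return retry;
    }
}
```

This only helps when the server actually reported per-request results. If the bulk write dies with a MongoConnectionException, there is no such report. In that situation, one option is to assign `_id` values on the client before inserting: re-sending a document that was already written then fails with a duplicate key error instead of creating a second copy.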

Do you have any suggestions?

Do you know of any working example or reference on retrying queries against MongoDB with the C# driver?

Prophesy answered 30/8, 2019 at 16:49 Comment(8)
None!!! 1) Just increase the connection timeout. 2) If the server goes down in the middle of an insert you will get duplicate entries in the database. 3) Again, if you are writing more than one row you will end up with duplicates. - Retrace
@Retrace I'm not sure that never retrying an operation is the best choice. For instance, why shouldn't I retry a failed read operation when a MongoConnectionException is raised during the read? Again, why shouldn't I retry an InsertOne operation when a MongoWriteException is raised due to a temporary server slowness during the write operation? - Prophesy
The real question is: do you want to "RETRY" when you get an error and end up with duplicate entries in the database? You should always report the errors. - Retrace
@Retrace do you suggest skipping retries even when reading from MongoDB (in that case, there is no possibility of corrupting the data at all)? - Prophesy
Why would reading give errors? Do you want to investigate the errors before retrying? Do you think retrying would actually work? I don't like retrying when the TCP layer already has a retry method. The chance that a retry is going to work is very small. - Retrace
@Retrace I get your point of view. When you say that the TCP layer "already has a retry method", what exactly do you mean? Are you referring to the timeout of the MongoClient when it tries to find a server able to process its request? Or are you referring to a built-in query retry mechanism implemented in the MongoDB C# driver (I don't know its internals, maybe you know them better than me)? - Prophesy
HTTP uses TCP as the transport layer. If you have a MongoClient, it is a web/http application. So any data transferred over the internet has a retry being done. If you are exceeding the timeout of the MongoClient query to the database, a retry is going to fail most of the time and the timeout should be made larger. - Retrace
@Retrace ok, thanks for your comments, now I get your point. Maybe writing a full answer to my question could be helpful for general readers. - Prophesy

Retry

Let's start with the basics of Retry.

There are situations where your requested operation relies on a resource that might not be reachable at a certain point in time. In other words, there can be a temporary issue that will vanish sooner or later. This sort of issue can cause transient failures. With retries you can overcome these problems by attempting the same operation again at a later moment. To be able to use this mechanism, the following criteria should be met:

  • The potentially introduced observable impact is acceptable
  • The operation can be redone without any irreversible side effect
  • The introduced complexity is negligible compared to the promised reliability

Let’s review them one by one:

  • The word failure indicates that the effect is observable by the requester as well, for example via higher latency or reduced throughput. If the "penalty" (delay or reduced performance) is unacceptable, then retry is not an option for you.
  • This requirement is also known as idempotency. If I call the action several times with the same input, it produces the exact same result. In other words, the operation behaves as if it depended only on its parameters and nothing else (such as other objects' state) influenced the result.
  • Although this condition is one of the most crucial, it is the one that is almost always forgotten. As always, there are trade-offs (if I introduce Z, it will increase X but might decrease Y) and we should be fully aware of them. Otherwise they will give us unwanted surprises at the least expected time.

Mongo Exception

Let's continue with the exceptions that the MongoDB C# client can throw.

I haven't used MongoDB in the last couple of years, so this knowledge may be outdated. But I hope the essence has not changed since.

I would also encourage you to introduce detection logic first (catch and log) before you try to mitigate the problem (for example with retry). This will give information about the frequency and amount of occurrences. It will also give you insight about the nature of the problems.

  • MongoConnectionException with a SocketException as Inner
    • When:
      • There is a server selection problem
      • The connection has timed out
      • The chosen server is unavailable
    • Retry:
      • If the problem is due to a network issue then it might be useful to retry
      • If the root cause is misconfiguration then retry won't help
    • Log:
  • MongoWriteException or MongoWriteConcernException
    • When:
      • There was a persistence problem
    • Retry:
      • It depends. If you perform a create operation and the server can detect duplicates (DuplicateKeyError), then it is better to try to write the record multiple times than to have one failed write attempt
      • Most of the time updates are not idempotent, but if you use some sort of record versioning then you can retry and let the retry fail on the optimistic locking check
      • Deletion could be implemented in an idempotent way. This is true for soft and hard delete as well.
    • Log:
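The record versioning idea for updates can be sketched as follows. This is an illustration only: it needs the MongoDB.Driver package, and `VersionedUpdates`, `TrySetStatusAsync` and the `version` and `status` field names are all hypothetical.

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class VersionedUpdates
{
    // Applies the update only if the document still has the expected version,
    // and bumps the version in the same operation. Returns false when no
    // document matched, i.e. the version has already moved on.
    public static async Task<bool> TrySetStatusAsync(
        IMongoCollection<BsonDocument> collection,
        BsonValue id,
        int expectedVersion,
        string newStatus)
    {
        var f = Builders<BsonDocument>.Filter;
        var filter = f.Eq("_id", id) & f.Eq("version", expectedVersion);
        var update = Builders<BsonDocument>.Update
            .Set("status", newStatus)
            .Inc("version", 1);

        UpdateResult result = await collection.UpdateOneAsync(filter, update);
        return result.MatchedCount == 1;
    }
}
```

If the first attempt did reach the server before the connection failed, a retry with the same `expectedVersion` matches nothing and returns false instead of applying the change twice. The caller still has to decide how to treat that case, since it cannot tell its own earlier write from someone else's.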
Advise answered 25/2, 2021 at 8:47 Comment(0)
