MarkLogic Java API deadlock detection
One of our applications just suffered from some nasty deadlocks. I had quite a hard time recreating the problem because the deadlock (or a stack trace) did not show up immediately in my Java application logs.

To my surprise, the MarkLogic Java API retries failing requests (e.g. because of a deadlock). This might make sense if your request is not part of a multi-statement transaction, but otherwise I'm not sure it does.

So let's stick with this deadlock problem. I created a simple code snippet in which I create a deadlock on purpose. The snippet creates a document test.xml, reads it in two different transactions, and then writes to it from each transaction on its own thread.

public static void main(String[] args) throws Exception {
    final Logger root = (Logger) LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
    final Logger ok = (Logger) LoggerFactory.getLogger(OkHttpServices.class);
    root.setLevel(Level.ALL);
    ok.setLevel(Level.ALL);

    final DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, new DatabaseClientFactory.DigestAuthContext("username", "password"));

    final StringHandle handle = new StringHandle("<doc><name>Test</name></doc>")
        .withFormat(Format.XML);
    client.newTextDocumentManager().write("test.xml", handle);

    root.info("t1: opening");
    final Transaction t1 = client.openTransaction();
    root.info("t1: reading");
    client.newXMLDocumentManager()
        .read("test.xml", new StringHandle(), t1);

    root.info("t2: opening");
    final Transaction t2 = client.openTransaction();
    root.info("t2: reading");
    client.newXMLDocumentManager()
        .read("test.xml", new StringHandle(), t2);

    new Thread(() -> {
        root.info("t1: writing");
        client.newXMLDocumentManager().write("test.xml", new StringHandle("<doc><t>t1</t></doc>").withFormat(Format.XML), t1);
        t1.commit();
    }).start();

    new Thread(() -> {
        root.info("t2: writing");
        client.newXMLDocumentManager().write("test.xml", new StringHandle("<doc><t>t2</t></doc>").withFormat(Format.XML), t2);
        t2.commit();
    }).start();

    TimeUnit.MINUTES.sleep(5);

    client.release();
}

This code will produce the following log:

14:12:27.437 [main] DEBUG c.m.client.impl.OkHttpServices - Connecting to localhost at 8000 as admin
14:12:27.570 [main] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction null
14:12:27.608 [main] INFO  ROOT - t1: opening
14:12:27.609 [main] DEBUG c.m.client.impl.OkHttpServices - Opening transaction
14:12:27.962 [main] INFO  ROOT - t1: reading
14:12:27.963 [main] DEBUG c.m.client.impl.OkHttpServices - Getting test.xml in transaction 5298588351036278526
14:12:28.283 [main] INFO  ROOT - t2: opening
14:12:28.283 [main] DEBUG c.m.client.impl.OkHttpServices - Opening transaction
14:12:28.286 [main] INFO  ROOT - t2: reading
14:12:28.286 [main] DEBUG c.m.client.impl.OkHttpServices - Getting test.xml in transaction 8819382734425123844
14:12:28.289 [Thread-1] INFO  ROOT - t1: writing
14:12:28.289 [Thread-1] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction 5298588351036278526
14:12:28.289 [Thread-2] INFO  ROOT - t2: writing
14:12:28.290 [Thread-2] DEBUG c.m.client.impl.OkHttpServices - Sending test.xml document in transaction 8819382734425123844

Neither t1 nor t2 will get committed. The MarkLogic logs confirm that there actually is a deadlock:

==> /var/opt/MarkLogic/Logs/8000_AccessLog.txt <==
127.0.0.1 - admin [24/Nov/2018:14:12:30 +0000] "PUT /v1/documents?txid=5298588351036278526&category=content&uri=test.xml HTTP/1.1" 503 1034 - "okhttp/3.9.0"

==> /var/opt/MarkLogic/Logs/ErrorLog.txt <==
2018-11-24 14:12:30.719 Info: Deadlock detected locking Documents test.xml

This would not be a problem if one of the requests failed and threw an exception, but that is not the case. The MarkLogic Java API keeps retrying each request for up to 120 seconds, and one of the updates only times out after roughly 120 seconds:

Exception in thread "Thread-1" com.marklogic.client.FailedRequestException: Service unavailable and maximum retry period elapsed: 121 seconds after 65 retries
    at com.marklogic.client.impl.OkHttpServices.putPostDocumentImpl(OkHttpServices.java:1422)
    at com.marklogic.client.impl.OkHttpServices.putDocument(OkHttpServices.java:1256)
    at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:920)
    at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:758)
    at com.marklogic.client.impl.DocumentManagerImpl.write(DocumentManagerImpl.java:717)
    at Scratch.lambda$main$0(scratch.java:40)
    at java.lang.Thread.run(Thread.java:748)

What are possible ways to overcome this problem? One way might be to set a maximum time to live for a transaction (like 5 seconds), but this feels hacky and unreliable. Any other ideas? Are there any other settings I should check out?

I'm on MarkLogic 9.0-7.2 and using marklogic-client-api:4.0.3.

Edit: One way to avoid the deadlock is to synchronize the calling function; this is actually how I solved it in my case (see comments). But I think the underlying problem still exists. A deadlock in a multi-statement transaction should not be hidden behind a 120-second timeout. I would rather have a request that fails immediately than a 120-second lock on one of my documents plus 64 failing retries per thread.
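
A minimal sketch of that kind of client-side locking, assuming all writers run in a single JVM. `UriLocks` and `withLock` are hypothetical names for illustration, not part of the MarkLogic Java API. The idea is to serialize the whole open/read/write/commit sequence per document URI, so two transactions never interleave on the same document:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper: serializes critical sections per document URI within one JVM. */
final class UriLocks {
    // One lock per URI; this sketch never evicts entries, so the map grows with the number of URIs.
    private static final ConcurrentHashMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    private UriLocks() {}

    /** Runs the given work while holding the lock for the URI and returns its result. */
    static <T> T withLock(String uri, Supplier<T> work) {
        ReentrantLock lock = LOCKS.computeIfAbsent(uri, k -> new ReentrantLock());
        lock.lock();
        try {
            return work.get();
        } finally {
            lock.unlock(); // always release, even if the work throws
        }
    }
}
```

In the snippet above, each thread would wrap its read, write and commit in something like `UriLocks.withLock("test.xml", () -> { ...; return null; })`. This only helps when every writer goes through the same JVM; it does nothing for other application instances or for server-side code touching the same document.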

Saad answered 24/11, 2018 at 13:23 Comment(4)
Well, a deadlock is a problem you'll have to overcome conceptually; tools will only help you so far. Creating a lock around the critical section is usually a simple approach.Lodmilla
Having a lock (in my Java application) is how I actually solved it. Still, I think having a deadlocked transaction retry a request for 120 seconds by default is kind of rude. Shouldn't an unresolvable deadlock rather throw an error? Someone might see this as a bug/feature request for marklogic-client-api.Saad
You can refer to #1102859.Bice
@secretsuperstar My question is not about a Java deadlock, but a deadlock in MarkLogic. But thank you for your comment nonetheless!Saad

Deadlocks are usually resolvable by retrying. Internally, the server runs an inner retry loop because deadlocks are usually transient and incidental, lasting a very short time. In your case you have constructed a scenario that will never succeed with any timeout that's equal for both threads. Deadlocks can be avoided at the application layer by avoiding multi-statement transactions when using the REST API (which is what the Java API uses). Multi-statement transactions over REST cannot be implemented 100% safely, because the client is responsible for managing the transaction ID and the server cannot detect client-side errors or client-side identity. Very subtle problems can and do occur unless you are aggressively proactive about handling errors and multithreading. If you 'push' the logic to the server (XQuery or JavaScript), the server is able to manage things much better.

As for whether it's 'good' or not for the Java API to implement retries for this case, that's debatable either way. The compromise for a seemingly easy-to-use interface is that many things that would otherwise be options are decided for you as a convention. There's generally no one-size-fits-all answer. In this case I presume the thinking was that a deadlock is more likely caused by independent code/logic by 'accident' than by identical code running in tandem -- a retry in that case would be a good choice. In your example it's not, but then an earlier error would still fail predictably until you change your code to 'not do that'.

If it doesn't already exist, a feature request for configurable timeout and retry behaviour does seem reasonable. I would recommend, however, trying to avoid any REST calls that result in an open transaction -- that is inherently problematic, particularly if you don't notice the problem upfront (then it's more likely to bite you in production). Unlike JDBC, which keeps a connection open so that the server can detect client disconnects, HTTP and the MarkLogic REST API do not -- which leads to a different programming model than traditional database coding in Java.
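
Until such a setting exists, one way to bound the wait at the application level can be sketched as follows. This is only an illustration under stated assumptions: `Deadline` and `call` are hypothetical names, the helper uses nothing but `java.util.concurrent`, and it has not been tested against MarkLogic. It runs a unit of work on a worker thread, gives up after a caller-chosen time limit, and invokes a caller-supplied cleanup action (for example, rolling back the open transaction) on timeout:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: fail fast instead of waiting out a long client-side retry loop. */
final class Deadline {
    private Deadline() {}

    /**
     * Runs the work on a separate thread and waits at most the given time.
     * On timeout, runs onTimeout (e.g. a rollback) and rethrows the TimeoutException.
     */
    static <T> T call(Callable<T> work, long timeout, TimeUnit unit, Runnable onTimeout) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(work);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                // Interrupts the worker; a blocked HTTP call may not stop immediately.
                future.cancel(true);
                onTimeout.run();
                throw e;
            } catch (ExecutionException e) {
                // Surface the original failure rather than the wrapper.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller might wrap the write and commit of one transaction in `Deadline.call(..., 5, TimeUnit.SECONDS, ...)` and pass the transaction's rollback as the cleanup action. Whether the cancelled request actually stops retrying in the background depends on how the client library reacts to thread interruption, so treat the timeout as a way to surface the problem early, not as a guarantee that the server-side locks are released.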

Bursarial answered 25/11, 2018 at 4:17 Comment(9)
Thanks for this detailed answer. To add a little more context on why we are using a lot of multi-statement transactions in our app: it is a Java Spring Framework application that uses the typical @Transactional model (so a client-side transaction model). We switched from SQL to MarkLogic, but obviously still have thousands of lines of client-side code that expect this type of model. So we have multi-statement transactions all over the place.Saad
SQL rolled a transaction back if there was a deadlock, which is not the case anymore. This makes transitioning and migrating from SQL to MarkLogic quite a bit harder :(. Apart from that, I really like your answer. It explains the reasoning behind the current behaviour, and it does make sense. You are right, this deadlocking code should be ported into MarkLogic for the best transaction handling.Saad
Don't you think there is a way for MarkLogic to tell the client that one of its multi-statement transactions is part of a deadlock? This way the client could throw an error for this transaction and roll it back accordingly (instead of retrying). It already tells the client that it cannot execute that request, so why not just add "because of a deadlock"?Saad
The retry for service unavailable in the Java API also handles High Availability scenarios such as forest failovers. You could file an RFE on the Java API in GitHub to support configuration of retry for the deadlock case.Cytogenetics
Link to the rfe: github.com/marklogic/java-client-api/issues/1038Saad
re: "Sql rolled a transaction back, if there was a deadlock, which is not the case anymore now" -- I did not interpret this from the post. An error in the REST API 'should' roll back any pending open transaction (a transaction that was not already open at the time of the request). I say 'should' because I am not 100% sure that it does and cannot think offhand how it could possibly not. Are you saying that new transactions are left open even if they deadlock? Or are you saying that on a 2nd+ API call an existing transaction is not rolled back? The latter makes sense.Bursarial
According to this: help.marklogic.com/Knowledgebase/Article/View/17/0/… I believe you are seeing a recoverable deadlock, in which case the REST API will not report an error to the client, but it will perform poorly. See developer.marklogic.com/blog/resolving-unresolvable-deadlocks for suggestions on fixing that. You can validate this assumption by enabling DEBUG level logging, as per the first link.Bursarial
Thank you for following up on this topic, DALDEI. Very much appreciated! But I am not sure we are on the same page here :) My problem is not that I have a deadlock and don't know how to resolve it. It's more that I have an application which might have deadlocks, and I fear the consequences. The consequence here is a transaction that blocks for 120 seconds. I updated my GitHub RFE and added better logging output. I hope this helps to better understand the issue. Also note that because I have already marked this question as resolved, I do not get notified about comments anymore! (I will check regularly.)Saad
Good luck. Be careful what you ask for. Many Deadlock and 'deadlock like' situations are difficult to distinguish from transient load -- until some time has elapsed. At which point premature termination can actually lead to a degenerate condition or unnecessary failures. You may find a better way of handling this is at a higher level in the application. Consider reactive design patterns. The low level code often does not have enough context to make a good choice between trying hard to fulfill its request vs risk of failing too soon.Bursarial
