Never ending periodic recovery of heuristic participants

For days our log has been full of these messages:

2018-06-15 12:19:23 WARN [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c has 1 heuristic participant(s)!
2018-06-15 12:19:23 WARN [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=46, bqual_length=36, tx_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c, node_name=acme_node, branch_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08d, subordinatenodename=null, eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6569a57c >
2018-06-15 12:19:23 WARN [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c restored heuristic participant XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=46, bqual_length=36, tx_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08c, node_name=acme_node, branch_uid=0:ffff0a983f1e:1f3aa2ff:5a09aa02:d1c08d, subordinatenodename=null, eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6569a57c >

It is always the same Xid. Is there a way to solve this? We are considering gracefully shutting down the application and deleting the files in data/tx-object-store. Is this a good idea?
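To confirm that it really is a single transaction id that keeps coming back, the tx_uid fields can be counted across the log. This is a rough sketch; the function name `count_tx_uids` and the log path in the usage comment are assumptions, not part of WildFly.

```shell
# Count how often each tx_uid appears in the recovery warnings.
# Reads log text on stdin and prints "<count> <tx_uid>", most frequent first.
count_tx_uids() {
  grep -o 'tx_uid=[^,]*' \
    | sed 's/^tx_uid=//' \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{ print $1, $2 }'
}

# Usage (the log location is an assumption, adjust to your installation):
# count_tx_uids < "$SERVER_HOME/standalone/log/server.log"
```

If only one line comes out, every warning refers to the same stuck record.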

That's with WildFly 11. We have XA transactions set up with Oracle 12c and IBM WebSphere MQ. We are doing XA transactions from a message-driven bean to JDBC.

Elishaelision answered 16/6, 2018 at 6:58 Comment(4)
Just a question: do you need the default recovery that WildFly does? - Lecythus
@Lecythus What do you mean by the default recovery? - Elishaelision
WildFly runs a periodic recovery every 2 minutes and 11 seconds, at least on our server. We had to disable the XA recovery by using <recovery no-recovery="true"> in our xa-datasource configuration. - Lecythus
I believe in general we need the recovery; as I understand it, the recovery is required for consistency across resources. In this specific case we do not need recovery. No essential information was lost. - Elishaelision

I found the answer to the problem in section 2.4.1, "Assumed complete", of the transaction guide:

If a failure occurs in the transaction environment after the transaction coordinator had told the XAResource to commit but before the transaction log has been updated to remove the participant, then recovery will attempt to replay the commit. In the case of a Serialized XAResource, the response from the XAResource will enable the participant to be removed from the log, which will eventually be deleted when all participants have been committed. However, if the XAResource is not recoverable then it is extremely unlikely that any XAResourceRecovery instance will be able to provide the recovery sub-system with a fresh XAResource to use in order to attempt recovery; in which case recovery will continually fail and the log entry will never be removed.

There are two possible solutions to this problem:

  1. Rely on the relevant ExpiryScanner to eventually move the log elsewhere. Manual intervention will then be needed to ensure the log can be safely deleted. If a log entry is moved, suitable warning messages will be output.

  2. Set the com.arjuna.ats.jta.xaAssumeRecoveryComplete to true. This option is checked whenever a new XAResource instance cannot be located from any registered XAResourceRecovery instance. If false (the default), recovery assumes that there is a transient problem with the XAResourceRecovery instances (e.g., not all have been registered with the sub-system) and will attempt recovery periodically. If true then recovery assumes that a previous commit attempt succeeded and this instance can be removed from the log with no further recovery attempts. This option is global, so needs to be used with care since if used incorrectly XAResource instances may remain in an uncommitted state.
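One way the second option might be applied is as a JVM system property. This is only a sketch under assumptions: that the property name quoted above is picked up when passed with -D, and that the line goes into bin/standalone.conf, which the start script sources. Neither is confirmed by the guide excerpt, so verify it against your WildFly and Narayana versions.

```shell
# Assumption: appended to bin/standalone.conf, which the WildFly start script
# sources before launching the JVM. The property name is the one quoted from
# the transaction guide; passing it as a -D system property is an assumption.
JAVA_OPTS="${JAVA_OPTS:-} -Dcom.arjuna.ats.jta.xaAssumeRecoveryComplete=true"
```

As the guide warns, the option is global, so it is probably best treated as a temporary measure and removed again once the stuck record is gone.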

Elishaelision answered 21/6, 2018 at 13:8 Comment(0)

There is a DB transaction that didn't complete and your server is trying to recover it. Check inside

SERVER_HOME/standalone/data/tx-object-store/ShadowNoFileLockStore/defaultStore/StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction/

There are transaction files there. Maybe first delete the transaction files (preferably on your local/dev environment) and tail your logs to identify the transaction(s) that didn't complete or commit. Fix the root of the problem and the warnings will go away. Alternatively, check the jndiName in the WARN message for an idea of which data source is creating those warnings.
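Before deleting anything it can help to look at what is actually in that directory. A small sketch, where the helper names `atomic_action_dir` and `list_tx_records` are made up for illustration and the server home is passed in as an argument:

```shell
# Print the AtomicAction directory for a given server home
# (the directory layout is the one quoted above).
atomic_action_dir() {
  printf '%s\n' "$1/standalone/data/tx-object-store/ShadowNoFileLockStore/defaultStore/StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction"
}

# List the transaction record files, oldest first, with timestamps.
list_tx_records() {
  ls -ltr "$(atomic_action_dir "$1")"
}

# Usage (the install path is an assumption):
# list_tx_records /opt/wildfly
```

Files whose timestamps are days old are likely the ones behind the repeating warnings.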

The warnings are "never ending" because you probably have a scheduled task that keeps attempting (at your specified interval) to talk to your DB, but the transactions never complete because of the underlying error, which you must fix first.

Hello answered 25/9, 2019 at 15:9 Comment(0)

As previous posters said, the messages are never ending. I inherited a system and found these messages have been filling our logs for 3 years. I got clues from:

https://knowledge.broadcom.com/external/article/129101/arjuna016037-could-not-find-new-xaresour.htm

and

https://docs.wildfly.org/13/Admin_Guide.html#Command_Line_Interface

My warning messages were very similar to those listed at the top of this page. To fix:

  1. Connect to jboss-cli.
  2. Delete the transaction causing the warning.
  3. Delete the files in the AtomicAction directory.

.\jboss-cli.ps1 --connect --controller=localhost

In jboss-cli, delete the transaction (I am using Windows and had to put a backslash in front of each colon):

/subsystem=transactions/log-store=log-store/transactions=0\:ffffac100086\:781344c7\:61b922df\:5a66f4\:eb:delete()

{
    "outcome" => "failed",
    "failure-description" => "WFLYCTL0030: No resource definition is registered for address [
    (\"subsystem\" => \"transactions\"),
    (\"log-store\" => \"log-store\"),
    (\"transactions\" => \"0:ffffac100086:781344c7:61b922df:5a66f4:eb\")
]",
    "rolled-back" => true
}

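If everything worked you should see "outcome" => "success". The response shown above comes from a failed attempt ("outcome" => "failed" with "rolled-back" => true), which means the delete was not applied in that run. Once the delete succeeds, delete the matching file in the AtomicAction directory. The file name should be the same as the transaction id but with _ instead of :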

0_ffffac100086_781344c7_61b922df_5a66f4
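The two renamings used above (backslash before each colon for the CLI address, underscore instead of colon for the file name) are easy to get wrong by hand, so here is a small sketch that derives both from a transaction id. The function names are hypothetical helpers, not part of jboss-cli.

```shell
# Turn a transaction id into the object-store file name (':' becomes '_').
tx_id_to_filename() {
  printf '%s\n' "$1" | tr ':' '_'
}

# Build the jboss-cli resource address, putting a backslash before each ':'.
tx_id_to_cli_address() {
  escaped=$(printf '%s' "$1" | sed 's/:/\\:/g')
  printf '/subsystem=transactions/log-store=log-store/transactions=%s\n' "$escaped"
}

# Usage:
# tx_id_to_filename '0:ffffac100086:781344c7:61b922df:5a66f4'
# tx_id_to_cli_address '0:ffffac100086:781344c7:61b922df:5a66f4'
```

Append :delete() to the address to get the full operation. As far as I know, the transactions only show up under log-store after running /subsystem=transactions/log-store=log-store:probe, so it may be worth running that first if the address is reported as unknown.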

Palladian answered 18/10, 2022 at 18:29 Comment(0)
