Cannot recover from org.omg.CORBA.TRANSIENT (becomes permanent)

I want to ensure my CORBA client is resilient to outages. The client is working, and I am testing resilience by disabling my network adapter in Windows. The CORBA connection obviously fails and the functionality becomes unavailable, but it does not recover when the adapter is enabled again. ORB.init is called again, but I continue to get the same errors.

It seems that once org.omg.CORBA.TRANSIENT has been thrown, some static state is kept that causes the client to report a network connection timeout even after the problem is fully resolved. Only restarting the process (a Dropwizard runnable JAR) allows the client to work again.

This is the code that starts up the ORB:

String[] orbInits = {"-ORBInitRef", orbInitRef};
Properties properties = new Properties() {
    {
        setProperty("org.omg.CORBA.ORBClass", orbClass);
        setProperty("org.omg.CORBA.ORBSingletonClass", orbSingletonClass);
        setProperty("jacorb.connection.client.connect_timeout", "" + connectionTimeout);
    }
};
return ORB.init(orbInits, properties);

The problem persists even if the app is calling ORB.init upon each attempt to perform the operation (i.e. with ORB pooling switched off).

Errors thrown by the client in the outage scenario include:

org.omg.CORBA.TIMEOUT: connection timeout of 2000 milliseconds expired
org.omg.CORBA.TRANSIENT: Retries exceeded, couldn't reconnect to <IP>:<PORT>

In at least one (possibly all) cases there was no org.omg.CORBA.TIMEOUT before org.omg.CORBA.TRANSIENT became permanent (i.e. TIMEOUT may be log noise).

Because the client is also a server, we'd obviously prefer not to have to restart it after every outage (and they do happen, especially in the dev environment).

The implementation is JacORB (org.jacorb.orb.ORB / org.jacorb.orb.ORBSingleton) version 2.2.4.

Foreskin answered 29/1, 2014 at 16:52 Comment(4)
The latest version of JacORB (3.3) does not have this issue... but we're not supposed to use it with the server. – Foreskin
Is it an issue of JacORB 2.2.4? Can you link it? – Arlinearlington
Google is not helpful in finding any specifics. – Foreskin
What would I do with it, once I'd caught it? – Foreskin
  1. Catch the CORBA exceptions and analyze them (timed out? unknown?).
  2. Retry later or never (but do not die).

Hint: The common consensus is that an application should catch CORBA exceptions, especially around any call that touches the network.

Windows removes the network interface when the network cable is unplugged, and you have to recover from that kind of outage yourself. Unplugging a cable is different from a general network outage such as a loss of connectivity on layer 3.
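
The catch-and-retry steps above could be sketched roughly as follows. This is only an illustration, not JacORB code: TransientFailure is a hypothetical stand-in for org.omg.CORBA.TRANSIENT so that the example compiles without CORBA classes on the classpath, and callWithRetry, maxAttempts and pauseMillis are names made up for this sketch. In a real client the catch clause would name the CORBA exception instead.

```java
import java.util.concurrent.Callable;

public class RetryingCall {

    /** Hypothetical stand-in for org.omg.CORBA.TRANSIENT (assumption, not a real CORBA class). */
    static class TransientFailure extends RuntimeException {
        TransientFailure(String message) {
            super(message);
        }
    }

    /**
     * Runs the call and retries it when it throws TransientFailure,
     * up to maxAttempts attempts with a fixed pause between them.
     * Any other exception propagates immediately.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientFailure last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (TransientFailure e) {
                last = e; // remember the failure, but do not die
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last; // every attempt failed; report the last failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated remote call: fails twice, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientFailure("couldn't reconnect");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

Note that retrying only helps if each attempt actually opens a new connection; it does not by itself explain why a fresh ORB.init would keep failing.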

Jaymejaymee answered 7/2, 2014 at 12:43 Comment(3)
I obviously catch CORBA exceptions and deal with them. The question is why ORB.init always returns something that throws a TRANSIENT even though I put the network cable back in (and called ORB.init again). – Foreskin
Your code example shows no connection, only ORB::init(), and your init refs are not shown. If orbInitRef is a corbaloc URL pointing to a remote host or to the network, this may fail. – Jaymejaymee
You mean I may have the wrong URL? ... but it works 99.9999% of the time. – Foreskin

It appears, somewhat embarrassingly, that this was really a problem with the pool, documented here: https://github.com/chrisvest/stormpot/issues/72

I thought I had experimentally verified that pooling was not the issue before posting the question, and I omitted to mention it for brevity. It would be interesting, though, to know why the question got 7 upvotes. Perhaps there is another cause?
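
The linked issue is not reproduced here. As a general illustration of how a pool can make a transient failure look permanent, the sketch below shows the opposite policy: a cached resource is thrown away as soon as a call on it fails, so the next caller gets a freshly created one. DiscardOnFailure and its methods are hypothetical names for this sketch and are not Stormpot's API. It catches RuntimeException, which would also cover CORBA system exceptions because org.omg.CORBA.SystemException extends RuntimeException.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical single-slot cache that never reuses a resource after a failed call. */
public class DiscardOnFailure<R> {
    private final Supplier<R> factory;
    private R cached;

    public DiscardOnFailure(Supplier<R> factory) {
        this.factory = factory;
    }

    /** Applies the action to the cached resource, creating the resource on first use. */
    public synchronized <T> T use(Function<R, T> action) {
        if (cached == null) {
            cached = factory.get();
        }
        try {
            return action.apply(cached);
        } catch (RuntimeException e) {
            cached = null; // drop the possibly broken instance so it is not handed out again
            throw e;
        }
    }

    public static void main(String[] args) {
        int[] created = {0};
        // The "resource" here is just a creation counter, standing in for an ORB or connection.
        DiscardOnFailure<Integer> holder = new DiscardOnFailure<>(() -> ++created[0]);
        Integer first = holder.use(r -> r);
        try {
            holder.use(r -> { throw new IllegalStateException("simulated outage"); });
        } catch (IllegalStateException expected) {
            // the failed instance has been discarded
        }
        Integer second = holder.use(r -> r);
        System.out.println(first + " then " + second); // prints "1 then 2"
    }
}
```

With a real ORB, the discarded instance would presumably also need to be shut down before it is dropped; that step is left out of this sketch.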

Foreskin answered 28/4, 2014 at 10:27 Comment(0)
