How to isolate user sessions in Java EE?

Asked 9/3, 2011 at 20:14 Answered 23/11, 2011 at 14:39

Solved java session jakarta-ee isolation

We are considering developing a mission-critical application in Java EE, and one thing that really struck me is the lack of session isolation in the platform. Let me explain the scenario.

We have a native Windows application (a complete ERP solution) that receives about 2k LoC and 50 bug fixes per month from scattered contributors. It also supports scripting, so customers can add their own logic, and we have no idea what that logic does. Instead of using a thread pool, each server node has a broker and a process pool. The broker receives a client request, enqueues it until a pooled instance is free, sends the request to that instance, delivers the response to the client, and releases the instance back to the process pool.
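The broker cycle described above (wait for a free instance, forward the request, return the response, release the instance) can be sketched in Java with a `BlockingQueue`. This is a minimal in-process sketch, not the poster's implementation: the `Instance` interface is hypothetical and stands in for a handle to a separate OS process.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Broker {
    /** Hypothetical handle to one pooled instance (a separate OS process in the real system). */
    interface Instance {
        String handle(String request);
    }

    // Free instances; a request blocks on take() until one is available.
    private final BlockingQueue<Instance> free;

    Broker(List<Instance> instances) {
        this.free = new ArrayBlockingQueue<>(instances.size());
        this.free.addAll(instances);
    }

    /** Enqueues until an instance is free, forwards the request, then releases the instance. */
    String submit(String request) throws InterruptedException {
        Instance instance = free.take();
        try {
            return instance.handle(request);   // the response goes back to the client
        } finally {
            free.put(instance);                // release the instance back to the pool
        }
    }

    int freeInstances() {
        return free.size();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Instance> instances = List.of(r -> "A handled " + r, r -> "B handled " + r);
        Broker broker = new Broker(instances);
        System.out.println(broker.submit("GET /orders/1"));
        System.out.println(broker.submit("GET /orders/2"));
    }
}
```

For brevity, this sketch returns an instance to the pool even if it threw; the broker in the question would kill and replace it instead.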

This architecture is robust because, with so many scattered contributions and so much custom scripting, it's not uncommon for a deployed version to have a serious bug such as an infinite loop, a pessimistic lock held for a long time, memory corruption or a memory leak. We implemented a memory limit, a request timeout, and a simple watchdog. Whenever a process fails to answer correctly and on time, the broker simply kills it, and the watchdog detects this and starts another instance. If a process crashes before it starts answering a request, the broker sends the same request to another pooled instance, and the user never learns of any failure on the server side (except through the admin logs). This is valuable because some instances are slowly trashed by bogus code as they work on requests. Because most session data is held on the client or (in rare cases) in shared storage, it seems to work perfectly.
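The failover rule above (drop a crashed instance, start a replacement, resend the request elsewhere) can be sketched as follows. This is a single-threaded illustration under stated assumptions: `Instance` and the `spawner` supplier are hypothetical stand-ins for worker processes and the watchdog, and resending is only safe because, as the question says, the instance crashed before it started answering.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class FailoverBroker {
    /** Hypothetical pooled instance; throwing models a crash before any response was sent. */
    interface Instance {
        String handle(String request) throws Exception;
    }

    private final Deque<Instance> pool = new ArrayDeque<>();
    private final Supplier<Instance> spawner; // stands in for the watchdog starting a new process
    private int replaced = 0;

    FailoverBroker(int size, Supplier<Instance> spawner) {
        this.spawner = spawner;
        for (int i = 0; i < size; i++) {
            pool.addLast(spawner.get());
        }
    }

    /** Tries the request on up to maxAttempts instances, replacing each one that crashes. */
    String submit(String request, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Instance instance = pool.removeFirst();
            try {
                String response = instance.handle(request);
                pool.addLast(instance);       // healthy: release it back to the pool
                return response;
            } catch (Exception crash) {
                // Drop ("kill") the broken instance, start a fresh one in its place,
                // and resend the same request on the next loop iteration.
                pool.addLast(spawner.get());
                replaced++;
            }
        }
        throw new IllegalStateException("request failed on " + maxAttempts + " instances");
    }

    int replacedCount() {
        return replaced;
    }

    public static void main(String[] args) {
        int[] spawned = {0};
        FailoverBroker broker = new FailoverBroker(2, () -> {
            int id = spawned[0]++;
            if (id == 0) {
                return request -> { throw new IllegalStateException("instance 0 crashed"); };
            }
            return request -> "instance " + id + " handled " + request;
        });
        System.out.println(broker.submit("POST /invoice", 3));
        System.out.println("instances replaced: " + broker.replacedCount());
    }
}
```

Running `main` prints `instance 1 handled POST /invoice` followed by `instances replaced: 1`; the client only sees the successful response.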

Now, considering a move to Java EE, I couldn't find anything similar in the spec or in popular application servers such as GlassFish and JBoss. Yes, I know that most cluster implementations do transparent failover with session replication, but we have small companies that run our system on a simple 2-node cluster (and we also have adventurers who run it on a single-node server). With a thread pool, as I understand it, a buggy thread can bring an entire node down, because the server cannot detect it and safely kill it. Bringing an entire node down is much worse than killing a single process: we have deployments where each node has about 100 pooled process instances.

I know that IBM and SAP are aware of this problem, based on … and …, respectively. But judging by recent JSRs, forums and open-source tools, there isn't much activity in the community.

Now come the questions!

If you have a similar scenario and use Java EE, how did you solve it?
Do you know of an upcoming open-source product or change in the Java EE spec that could address this issue?
Does .NET have the same problem? Can you explain or cite references?
Do you know of a modern, open platform that can address this issue and is suitable for implementing ERP business logic?

Please don't suggest more testing or any kind of QA investment, because we cannot force our customers to do this for their own scripts. We also have cases where urgent bug fixes must bypass QA, and while we can make the customer accept this, we cannot make them accept that a buggy software component can affect a range of unrelated features. This issue is about robust architecture, not the development process.

Thanks for your attention!

Impressible answered 9/3, 2011 at 20:14

UPDATE: It seems like this will be solved in Java 8: pcworld.com/businesscenter/article/237337/… – Impressible 9/8, 2011 at 1:15

What you have stumbled upon is a fundamental issue regarding the use of Java and "hostile" applications.

It's a fundamental issue not just at the Java EE level, but at the core JVM level. Typical JVMs have all sorts of problems with loading "unsafe code". With memory leaks, class loader leaks, resource exhaustion, and unclean thread kills, the typical JVM is simply not robust enough to handle badly behaving code well in a shared environment.

A simple example is exhaustion of the Java heap. As a basic rule, NOBODY (and by nobody, I specifically mean the core Java library and just about every other 3rd-party library out there) catches OutOfMemoryError. There are a rare few who do, but even they can do little about it. Typical code handles the exceptions it "expects" to handle and lets the others fall through. Unchecked throwables (OutOfMemoryError is an Error rather than a RuntimeException, but like runtime exceptions it is unchecked) will happily bubble up through the call stack all the way to the top, leaving behind a wreckage of unchecked critical-path code and leaving all sorts of things in an unknown state.
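A small demonstration of why a typical `catch (Exception e)` does not stop an OutOfMemoryError: `OutOfMemoryError` extends `Error`, not `Exception`. The `handleRequest` method is a hypothetical "defensive" handler, and the error is thrown explicitly to simulate heap exhaustion rather than actually exhausting the heap.

```java
public class OomBypassesCatch {
    // A typical "defensive" handler: it catches Exception, which does NOT
    // include java.lang.Error subclasses such as OutOfMemoryError.
    static String handleRequest(Runnable work) {
        try {
            work.run();
            return "ok";
        } catch (Exception e) {
            return "handled: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // An ordinary runtime exception is caught by catch (Exception).
        System.out.println(handleRequest(() -> { throw new IllegalStateException("bad input"); }));

        // A simulated OutOfMemoryError sails past catch (Exception);
        // only an outer catch of Throwable (or Error) sees it.
        try {
            handleRequest(() -> { throw new OutOfMemoryError("simulated"); });
        } catch (Throwable t) {
            System.out.println("escaped: " + t.getClass().getSimpleName()
                    + ", is Exception? " + (t instanceof Exception));
        }
    }
}
```

This prints `handled: IllegalStateException` and then `escaped: OutOfMemoryError, is Exception? false`.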

Constructors or static initializers that "can't fail" can leave behind uninitialized class members that are supposedly "never null". These damaged classes simply don't know they're damaged. Nobody knows they're damaged, and there's no way to clean them up. A heap that has hit OOM is an unsafe image and pretty much needs to be restarted (unless, of course, you wrote or audited ALL of the code yourself, which, naturally, you won't; who would?).
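To make the "damaged but doesn't know it" point concrete, here is a sketch of a lazily built lookup table that is interrupted by a (simulated) OutOfMemoryError part-way through. The `RateTable` class and its `failDuringInit` flag are hypothetical; in real code such a table is often a static field shared by every request, which is exactly why the damage outlives the request that caused it.

```java
import java.util.HashMap;
import java.util.Map;

public class RateTable {
    // Hypothetical lazily built table that callers assume is complete once non-null.
    private Map<String, Integer> rates;

    // Test hook standing in for a real allocation failure during initialization.
    boolean failDuringInit = true;

    int rateFor(String currency) {
        if (rates == null) {
            rates = new HashMap<>();            // published before it is fully built
            rates.put("USD", 100);
            if (failDuringInit) {
                throw new OutOfMemoryError("simulated");
            }
            rates.put("EUR", 110);
        }
        return rates.get(currency);             // unboxing throws NPE if the entry was never added
    }

    public static void main(String[] args) {
        RateTable table = new RateTable();
        try {
            table.rateFor("EUR");
        } catch (OutOfMemoryError e) {
            System.out.println("first request died: " + e.getMessage());
        }
        table.failDuringInit = false;           // memory is fine again...
        System.out.println("USD = " + table.rateFor("USD"));
        try {
            table.rateFor("EUR");               // ...but the table is still half-built
        } catch (NullPointerException e) {
            System.out.println("EUR lookup still broken: the table was never completed");
        }
    }
}
```

Nothing in the object records that initialization was cut short, so every later request keeps failing until the JVM (or this object) is thrown away.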

Now, there may well be vendor-specific JVMs that are better behaved and give you better control. The ones based on the Sun/Oracle JVM (i.e. most of them) do not.

So, it's not necessarily a Java EE issue, it's a JVM issue.

Hosting hostile code in the JVM is a bad idea. The only way it's practical is if you host a scripting language, and that scripting language implements some kind of resource control. That could be done, and you can start by tweaking the existing ones (JavaScript, Groovy, Jython, JRuby). The fact that these languages give users direct access to Java libraries makes them potentially dangerous, so you may have to restrict that as well, exposing only functionality wrapped by script handlers. At that point, though, the "why use Java at all" question floats up.
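What "a scripting language that implements some kind of resource control" means can be shown with a toy interpreter that enforces a step budget: the host, not the script, decides when execution stops, so an infinite loop in a script cannot pin a thread forever. The three-instruction language and the `BudgetExceeded` exception are hypothetical; a real engine would need equivalent hooks (and memory accounting) inside its own interpreter loop.

```java
import java.util.List;

public class BudgetedInterpreter {
    /** Thrown when a script uses up its step budget. */
    static final class BudgetExceeded extends RuntimeException {
        BudgetExceeded(long budget) {
            super("script exceeded its budget of " + budget + " steps");
        }
    }

    /**
     * Runs a program in a hypothetical toy script language:
     * "inc" adds 1 to an accumulator, "jmp N" jumps to instruction N, "halt" stops.
     */
    static long run(List<String> program, long maxSteps) {
        long acc = 0;
        long steps = 0;
        int pc = 0;
        while (pc < program.size()) {
            // The interpreter checks the budget before every instruction.
            if (++steps > maxSteps) {
                throw new BudgetExceeded(maxSteps);
            }
            String[] ins = program.get(pc).split(" ");
            switch (ins[0]) {
                case "inc":
                    acc++;
                    pc++;
                    break;
                case "jmp":
                    pc = Integer.parseInt(ins[1]);
                    break;
                case "halt":
                    pc = program.size();
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + ins[0]);
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("inc", "inc", "halt"), 1_000));   // well-behaved script
        try {
            run(List.of("inc", "jmp 0"), 1_000);                         // infinite loop
        } catch (BudgetExceeded e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This prints `2` and then `script exceeded its budget of 1000 steps`. The design choice is that the check lives in the interpreter loop, so it works no matter what the script does.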

You'll note that Google App Engine does none of these. It spins up a separate JVM for each application being run, but even then it greatly restricts what can be done within those JVMs, notably through the existing Java security model. The distinction here is that these instances tend to be "long-lived" so as not to incur the processing costs of startup and shutdown. I should say, they SHOULD be long-lived, and those that are not do incur those costs.

You can create several JVM instances yourself, give them a bit of infrastructure to handle requests for logic, give them custom class loader logic to try to protect against class loader leaks, and, at a minimum, be able to kill the instances off (they're simply processes) when you want. That can work, and will probably work "OK" depending on the granularity of the calls and the startup time for your logic. At a minimum, the startup time includes loading the classes for the logic on every run, and that alone may make this a bad idea. And it certainly WON'T be "Java EE". Java EE is not set up to do this kind of thing. But you're not clear about which Java EE features you're looking at, either.
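The "they're simply processes, so you can kill them" part can be sketched with the standard `ProcessBuilder`/`Process` API (Java 9+ for `Redirect.DISCARD`). This is an assumption-laden sketch: `java -version` stands in for a real worker JVM, and `runIsolated` and `javaCommand` are hypothetical helper names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ProcessBroker {
    enum Outcome { COMPLETED, KILLED_ON_TIMEOUT }

    /** Runs one unit of work in a fresh OS process and kills the process if it overruns. */
    static Outcome runIsolated(List<String> command, long timeoutMillis) throws Exception {
        Process worker = new ProcessBuilder(command)
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // so a full pipe can't block the child
                .start();
        if (worker.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return Outcome.COMPLETED;
        }
        worker.destroyForcibly(); // the OS reclaims the child's memory, threads and file handles
        worker.waitFor();         // reap the killed process
        return Outcome.KILLED_ON_TIMEOUT;
    }

    /** Builds a command that runs the same java binary as the current JVM. */
    static List<String> javaCommand(String... args) {
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(java);
        cmd.addAll(List.of(args));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // "java -version" stands in for a real worker; 1 ms is far too short for any JVM to start.
        System.out.println(runIsolated(javaCommand("-version"), 60_000));
        System.out.println(runIsolated(javaCommand("-version"), 1));
    }
}
```

Unlike killing a thread, destroying the process leaves no half-updated state behind in the broker's own JVM. The cost is the JVM startup time for each new worker, which is the trade-off the answer describes.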

Effectively, this is what Apache and "mod_php" do: several instances, as processes, individually handling requests, with badly behaving ones killed off as necessary. This is why PHP is common in the shared hosting business. In this structure, it's basically "safe".
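Apache's process model also lets you retire a child process after a fixed number of connections (the `MaxConnectionsPerChild` directive, formerly `MaxRequestsPerChild`), which limits how far a slowly leaking worker can degrade. A sketch of that recycling policy for a worker pool follows. `Worker` is a hypothetical stand-in for a process handle, and the sketch is single-threaded; a real broker would queue requests while no worker is idle.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RecyclingPool {
    /** Hypothetical stand-in for a pooled worker process. */
    static final class Worker {
        final int id;
        int requestsServed;

        Worker(int id) {
            this.id = id;
        }
    }

    private final Deque<Worker> idle = new ArrayDeque<>();
    private final int maxRequestsPerWorker;
    private int spawned = 0;

    RecyclingPool(int size, int maxRequestsPerWorker) {
        this.maxRequestsPerWorker = maxRequestsPerWorker;
        for (int i = 0; i < size; i++) {
            idle.addLast(spawn());
        }
    }

    private Worker spawn() {
        return new Worker(spawned++);   // real system: start a new process
    }

    /** Single-threaded sketch: throws if no worker is idle. */
    Worker acquire() {
        return idle.removeFirst();
    }

    /** Retires a worker that misbehaved or reached its request quota; otherwise reuses it. */
    void release(Worker w, boolean healthy) {
        w.requestsServed++;
        if (!healthy || w.requestsServed >= maxRequestsPerWorker) {
            idle.addLast(spawn());      // real system: kill w's process and start a fresh one
        } else {
            idle.addLast(w);
        }
    }

    int spawnedSoFar() {
        return spawned;
    }

    public static void main(String[] args) {
        RecyclingPool pool = new RecyclingPool(1, 2);
        for (int i = 0; i < 5; i++) {
            Worker w = pool.acquire();
            System.out.println("request " + i + " -> worker " + w.id);
            pool.release(w, true);
        }
    }
}
```

With one worker and a quota of two requests, the five requests are served by workers 0, 0, 1, 1 and 2.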

Sunbreak answered 9/3, 2011 at 20:56

The JVM is just a common environment for running generic applications, including ones that must be robust and scalable, but it also allows simpler, occasional applications. If you use other general-purpose languages such as C++ and Python, everything you mentioned about hostile code still applies. I think it's a problem of Java EE, which has a beautiful pitch about robustness, performance, and support for big applications and development teams, but is actually unable to deliver on it. Of course, Java EE can depend on the JVM for some tasks, but it could also implement this itself, as it does with servlets. – Impressible 9/3, 2011 at 23:39

As a rule, most C++ and Python programs are not long-running processes that allow loading arbitrary code. They are not "containers" per se in the sense that the JVM and JEE containers are. Most other languages are hosted by the OS and rely directly on its services via processes, which are individual execution environments, each isolated and with its own resource limits. The common JVMs, and the JVM spec, offer nothing similar to the concept of processes. There's nothing JEE can do to prevent a WAR from doing "new byte[1000000000]" until the heap is exhausted, destroying the JVM. – Sunbreak 10/3, 2011 at 0:11

I agree with you, except on one point. What makes the JVM a long-running process is not the JVM itself but JEE. JEE could start multiple child JVMs with limited memory just to run application code. This would not prevent a WAR from killing a single JVM, but it would prevent a WAR from killing the whole server (and that's what really matters). Of course, there are some issues, like shared data and JVM footprint, that must be solved for this to work, and perhaps that would require changes to the JVM and/or the JEE spec. – Impressible 10/3, 2011 at 1:3
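The "child JVMs with limited memory" idea can be tried out with plain `ProcessBuilder` and the standard `-Xmx` heap flag. This sketch assumes a full JDK 11 or newer, because it uses the single-file source launcher (`java Hog.java`) so that the child program can be generated on the fly; `Hog` is a hypothetical stand-in for misbehaving application code.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

public class HeapCappedChild {
    // Hypothetical misbehaving code: keeps allocating 1 MiB blocks until its heap is exhausted.
    static final String HOG_SOURCE =
            "public class Hog { public static void main(String[] a) {"
            + " java.util.List<byte[]> keep = new java.util.ArrayList<>();"
            + " while (true) keep.add(new byte[1 << 20]); } }";

    /** Runs the hog in a child JVM whose heap is capped at maxHeapMb; returns its exit code. */
    static int runHogInChild(int maxHeapMb) throws Exception {
        Path dir = Files.createTempDirectory("hog");
        Path src = dir.resolve("Hog.java");
        Files.writeString(src, HOG_SOURCE);
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        try {
            // Equivalent to: java -Xmx128m /tmp/hogNNN/Hog.java
            Process child = new ProcessBuilder(java, "-Xmx" + maxHeapMb + "m", src.toString())
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!child.waitFor(60, TimeUnit.SECONDS)) {
                child.destroyForcibly();
                throw new IllegalStateException("child JVM did not exit in time");
            }
            return child.exitValue();
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = runHogInChild(128);
        // The OutOfMemoryError happened in the child; this JVM's heap was never touched.
        System.out.println("child exit code: " + exitCode + "; parent still running");
    }
}
```

The child dies with an uncaught OutOfMemoryError and a non-zero exit code, while the parent only sees that exit code. This is the containment the comment asks for, at the price of one JVM startup per child.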

Of course, but in its current state, the JVM is pretty well known for its slow startup and heavy memory burden. It's actually a pretty lousy citizen on Unix systems when it comes to sharing resources. It also runs better once it has been running for some time. So, while what you suggest can certainly be done (it's what I mentioned toward the end), the longer it runs, the better. The environments built around Java have worked around these characteristics of the runtime, to the point where most systems work within their own long-running environment (JEE servers, OSGi servers, Spring, etc.) – Sunbreak 10/3, 2011 at 5:37

I believe your scenario is highly atypical, so it is unlikely that there is a ready-made framework/platform addressing this need. Java EE sort of assumes that the request-processing code is written by the same team as the rest of the app, so it need not be isolated, watched and reset that often, and bug fixes would be handled the same way in all parts of the system. This assumption greatly simplifies development, deployment, testing etc. for most projects, without forcing them to pay for something they don't need. And yes, it isn't suitable for everyone. If you want something fundamentally different, you probably need to implement a fair amount of failover logic yourself. Java EE does provide the fundamental building blocks for this, though.

I believe (although I have no concrete experience to prove it) that .NET and other platforms are built on basically similar assumptions.

Striking answered 9/3, 2011 at 20:23

We had a similar (though not so severe) situation when porting a really enormous Perl site to Java. On receiving an HTTP request, we instantiate a class and call its processRequest method, surrounded by a try-catch and time measurement. Adding a timer and a separate thread would be enough to be able to kill the thread. This is probably sufficient in real life.
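A sketch of "try-catch plus time measurement plus a timer thread" using the standard `ExecutorService`/`Future` API. The `RequestHandler` interface is hypothetical, modelled loosely on the `processRequest` method mentioned above. Note a limitation: `Future.cancel(true)` only *interrupts* the worker thread. Code that ignores interrupts (for example, a tight infinite loop) keeps running, and `Thread.stop()` is deprecated and, since JDK 20, throws `UnsupportedOperationException`. This is why the process-based approach in the question gives stronger guarantees.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedRequestRunner {
    /** Hypothetical per-request handler. */
    interface RequestHandler {
        String processRequest(String request) throws Exception;
    }

    // Daemon threads, so a runaway handler cannot keep the JVM alive on shutdown.
    private final ExecutorService workers = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "request-worker");
        t.setDaemon(true);
        return t;
    });

    String handle(RequestHandler handler, String request, long timeoutMillis) {
        long start = System.nanoTime();
        Future<String> result = workers.submit(() -> handler.processRequest(request));
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker; code that ignores interrupts keeps running
            return "timeout";
        } catch (ExecutionException e) {
            return "error: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(request + " took " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        TimedRequestRunner runner = new TimedRequestRunner();
        System.out.println(runner.handle(r -> "done: " + r, "fast", 1_000));
        System.out.println(runner.handle(r -> { throw new IllegalArgumentException("bad input"); }, "broken", 1_000));
        System.out.println(runner.handle(r -> { Thread.sleep(10_000); return "late"; }, "slow", 100));
    }
}
```

The sleeping handler above does respond to the interrupt, so it is cleaned up; a handler stuck in a CPU loop would not be.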

A Java EE server like GlassFish is built on an OSGi container, so you might have more means of isolation there.

You could also run an array of (web or local) applications to which you dispatch your requests via a central web application. Those applications are then isolated from one another.
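The routing half of such a central dispatcher might look like the sketch below: each functional area is deployed as its own application, and the front application maps a request path to the backend that owns it, so a crash in one area does not take down unrelated features. The prefixes and `localhost` ports are hypothetical, and actually forwarding the request (for example, with an HTTP client) is left out.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class PathRouter {
    // Hypothetical routing table: each functional area is deployed as its own application/JVM.
    private final Map<String, URI> routes = new LinkedHashMap<>();

    PathRouter(Map<String, String> prefixToBaseUrl) {
        prefixToBaseUrl.forEach((prefix, base) -> routes.put(prefix, URI.create(base)));
    }

    /** Picks the backend with the longest matching path prefix and builds the forward URI. */
    URI route(String path) {
        String best = null;
        for (String prefix : routes.keySet()) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no backend for " + path);
        }
        return routes.get(best).resolve(path);   // absolute path replaces the base path
    }

    public static void main(String[] args) {
        PathRouter router = new PathRouter(Map.of(
                "/billing/", "http://localhost:8081",
                "/stock/", "http://localhost:8082"));
        System.out.println(router.route("/billing/invoice/7"));
        System.out.println(router.route("/stock/items"));
    }
}
```

This prints `http://localhost:8081/billing/invoice/7` and `http://localhost:8082/stock/items`.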

Even more isolated are serialized sessions and operating-system processes that each start a new JVM.
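Serializing session state is what lets any worker (thread, application or process) pick up a user's next request, much like the question's "session data held at the client or in shared storage". A minimal round-trip sketch with standard Java serialization follows. `CartSession` is a hypothetical session class, and the bytes should only ever be read back from a trusted store, since deserializing untrusted input with `ObjectInputStream` is a well-known security risk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializedSession {
    /** Hypothetical session state; it must contain only serializable data. */
    static final class CartSession implements Serializable {
        private static final long serialVersionUID = 1L;
        final String userId;
        final List<String> items = new ArrayList<>();

        CartSession(String userId) {
            this.userId = userId;
        }
    }

    /** Turns the session into bytes that can live outside any one worker. */
    static byte[] save(CartSession session) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(session);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the session, possibly in a different worker process. */
    static CartSession load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CartSession) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CartSession session = new CartSession("user-42");
        session.items.add("SKU-1001");
        byte[] stored = save(session);        // e.g. written to shared storage
        CartSession restored = load(stored);  // e.g. read back by whichever worker gets the next request
        System.out.println(restored.userId + " " + restored.items);
    }
}
```

This prints `user-42 [SKU-1001]`. Because no worker owns the session, killing a misbehaving worker loses at most the request in flight.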

Connubial answered 23/11, 2011 at 14:39
