We have been profiling our application extensively to reduce latency as much as possible. Our application consists of 3 separate Java processes, all running on the same server, which pass messages to each other over TCP/IP sockets.
We have reduced processing time in the first component to 25 μs, but we see that the TCP/IP socket write (on localhost) to the next component invariably takes about 50 μs. We also see one anomalous behavior: the component that accepts the connection can write faster (i.e. < 50 μs). Right now, every component runs in under 100 μs, with the exception of the socket communications.
Not being a TCP/IP expert, I don't know what could be done to speed this up. Would Unix Domain Sockets be faster? MemoryMappedFiles? What other mechanisms could possibly be a faster way to pass the data from one Java process to another?
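For concreteness, here is a minimal sketch of the memory-mapped approach I have in mind, using java.nio's MappedByteBuffer over a shared file. The file path, region size, and the simple ready-flag protocol are placeholders of my own, not a robust design (a real implementation would need proper framing and memory-ordering guarantees between the processes):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// --- MappedWriter.java (run in process A) ---
public class MappedWriter {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile("/tmp/ipc-demo", "rw");
             FileChannel ch = f.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            byte[] msg = "hello".getBytes();
            buf.position(8);
            buf.put(msg);              // payload at offset 8
            buf.putInt(4, msg.length); // length at offset 4
            buf.putInt(0, 1);          // ready flag at offset 0, written last
        }
    }
}

// --- MappedReader.java (run in process B) ---
public class MappedReader {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile("/tmp/ipc-demo", "rw");
             FileChannel ch = f.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            while (buf.getInt(0) == 0) { } // busy-spin on the ready flag
            int len = buf.getInt(4);
            byte[] msg = new byte[len];
            buf.position(8);
            buf.get(msg);
            System.out.println(new String(msg));
        }
    }
}
```

Busy-spinning avoids any syscall on the receive path, which is presumably where the savings over sockets would come from, at the cost of burning a core.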
UPDATE 6/21/2011: We created two benchmark applications, one in Java and one in C++, to benchmark TCP/IP more tightly and to compare. The Java app used NIO (blocking mode), and the C++ app used the Boost ASIO TCP library. The results were more or less equivalent, with the C++ app about 4 μs faster than Java (though in one of the tests Java beat C++). Both versions also showed a lot of variability in the time per message.
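For reference, the Java benchmark was shaped roughly like the sketch below: a blocking-mode NIO ping-pong over loopback, timing round trips and halving for the one-way latency. The port, message size, and iteration count here are illustrative placeholders, not our exact test parameters:

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class TcpPingPong {
    static final int PORT = 9999, MSG_SIZE = 64, ITERS = 100_000;

    public static void main(String[] args) throws Exception {
        ServerSocketChannel ssc = ServerSocketChannel.open();
        ssc.bind(new InetSocketAddress(PORT));

        // Echo server: reads each message and writes it straight back.
        Thread server = new Thread(() -> {
            try (SocketChannel ch = ssc.accept()) {
                ch.socket().setTcpNoDelay(true);
                ByteBuffer buf = ByteBuffer.allocateDirect(MSG_SIZE);
                for (int i = 0; i < ITERS; i++) {
                    buf.clear();
                    while (buf.hasRemaining()) ch.read(buf);  // full message in
                    buf.flip();
                    while (buf.hasRemaining()) ch.write(buf); // echo it back
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        server.start();

        // Client: sends a message, waits for the echo, repeats.
        try (SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("localhost", PORT))) {
            ch.socket().setTcpNoDelay(true);
            ByteBuffer buf = ByteBuffer.allocateDirect(MSG_SIZE);
            long start = System.nanoTime();
            for (int i = 0; i < ITERS; i++) {
                buf.clear();
                while (buf.hasRemaining()) ch.write(buf);
                buf.clear();
                while (buf.hasRemaining()) ch.read(buf);
            }
            long elapsed = System.nanoTime() - start;
            // Half the average round trip approximates the one-way latency.
            System.out.printf("one-way latency: %.1f us%n",
                    elapsed / 2.0 / ITERS / 1000.0);
        }
        server.join();
        ssc.close();
    }
}
```

Note TCP_NODELAY is set on both ends; without it, Nagle's algorithm adds large, misleading delays for small messages like these.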
I think we are converging on the basic conclusion that a shared-memory implementation is going to be the fastest. (Although we would also like to evaluate the Informatica product, provided it fits the budget.)