What kind of "EventBus" to use in Spring? Built-in, Reactor, Akka?

We're going to start a new Spring 4 application in a few weeks. And we'd like to use some event-driven architecture. This year I read here and there about "Reactor" and while looking for it on the web, I stumbled upon "Akka".

So for now we have 3 choices:

  • Spring's built-in ApplicationEvent
  • Reactor
  • Akka

I couldn't find a real comparison of those.


For now we just need something like:

  • X registers to listen for Event E
  • Y registers to listen for Event E
  • Z sends an Event E

And then X and Y will receive and handle the event.

We will most likely use this in an async way, but there will certainly also be some synchronous scenarios. And we will most likely always send a class instance as the event. (The Reactor samples mostly use Strings and String patterns, but Reactor also supports Objects.)


As far as I understand, ApplicationEvent works synchronously by default and Reactor works asynchronously. Reactor also offers the await() method to make things more or less synchronous. Akka provides more or less the same as Reactor, but also supports remoting.

Concerning Reactor's await() method: Can it wait for multiple threads to complete? Or maybe even a partial set of those threads? If we take the example from above:

  • X registers to listen for Event E
  • Y registers to listen for Event E
  • Z sends an Event E

Is it possible to make it synchronous, by saying: Wait for X and Y to complete. And is it possible to make it wait just for X, but not for Y?
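
To make the two cases concrete, here is a minimal JDK-only sketch. It does not use Reactor's API; it assumes Java 8+ and `CompletableFuture`, and the class and method names (`SelectiveWait`, `dispatch`) are made up for illustration. Each handler is represented by its own future, so the sender can wait for both X and Y, or for X alone:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// JDK-only sketch (not Reactor's API): every handler runs on a pool and is
// represented by a CompletableFuture, so the sender decides what to wait for.
class SelectiveWait {

    /** Sends event E to handlers X and Y and returns a snapshot of the steps recorded so far. */
    static List<String> dispatch(boolean waitForY) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> log = new CopyOnWriteArrayList<>();
        try {
            CompletableFuture<Void> x = CompletableFuture.runAsync(() -> log.add("X handled E"), pool);
            CompletableFuture<Void> y = CompletableFuture.runAsync(() -> log.add("Y handled E"), pool);

            if (waitForY) {
                CompletableFuture.allOf(x, y).join(); // block until X and Y are both done
            } else {
                x.join();                             // block for X only; Y may still be running
            }
            log.add("Z continues");
            return new ArrayList<>(log);              // snapshot, so a late Y cannot change the result
        } finally {
            pool.shutdown();                          // already submitted handlers still finish
        }
    }
}
```

With `dispatch(true)` both handler entries are guaranteed to precede "Z continues". With `dispatch(false)` only the X entry is guaranteed to.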


Maybe there are also some alternatives? What about for example JMS?

Lots of questions, but hopefully you can provide some answers!

Thank you!


EDIT: Example use cases

  1. When a specific event gets fired, I'd like to create 10000 emails. Every email has to get generated with user specific content. So I'd create a lot of threads (max = system cpu cores) which create the mails and do not block the caller thread, 'cause this can take some minutes.

  2. When a specific event gets fired, I'd like to collect information from an unknown number of services. Each fetch takes about 100ms. Here I could imagine using Reactor's await, 'cause I need that information to continue my work in the main thread.

  3. When a specific event gets fired, I'd like to perform some operations based on application configuration. So the application must be able to dynamically (un)register consumers/event handlers. They'll do their own stuff with the Event and I don't care. So I would create a thread for each of those handlers and just continue doing my work in the main thread.

  4. Simple decoupling: I basically know all receivers, but I just don't want to call every receiver in my code. This should mostly get done synchronously.

Sounds like I need a ThreadPool or a RingBuffer. Do those frameworks have dynamic RingBuffers, which grow in size if needed?
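
For the plain thread-pool variant of use case 2, a rough sketch with only `java.util.concurrent` might look like this. The names `GatherFromServices` and `fetchAll` are hypothetical, and the pool size is left to the caller because the right value depends on whether the work is CPU-bound or I/O-bound:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: fan out an arbitrary number of fetches, then block until all are in.
class GatherFromServices {

    /** Runs the fetches on a fixed pool and returns their results in submission order. */
    static List<String> fetchAll(List<Callable<String>> fetches, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll returns only after every task has completed
            for (Future<String> fetched : pool.invokeAll(fetches)) {
                results.add(fetched.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for fetches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller thread blocks here, which fits the case where the collected information is needed before the main work can continue.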

Assassinate answered 18/12, 2013 at 16:56 Comment(1)
Which library did you end up using? - Ningsia

I'm not sure I can adequately answer your question in this small space. But I'll give it a shot! :)

Spring's ApplicationEvent system and Reactor are really quite distinct as far as functionality goes. ApplicationEvent routing is based on the type handled by the ApplicationListener. Anything more complicated than that and you'll have to implement the logic yourself (that's not necessarily a bad thing, though). Reactor, however, provides a comprehensive routing layer that is also very lightweight and completely extensible. Any similarity in function between the two ends at their ability to subscribe and publish events, which is really a feature of any event-driven system. Also don't forget the new spring-messaging module out with Spring 4. It's a subset of the tools available in Spring Integration and also provides abstractions for building around an event-driven architecture.
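
To illustrate what "routing based on type" amounts to, here is a small JDK-only sketch. It is not Spring's implementation and not Reactor's. `TypeRoutedBus` and its methods are invented for this example and assume Java 8+. Listeners are keyed by event class and invoked synchronously on the publisher's thread:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical minimal bus: the only routing rule is "is the event an instance
// of the type this listener registered for?". Delivery is synchronous.
class TypeRoutedBus {
    private final Map<Class<?>, List<Consumer<Object>>> listeners = new ConcurrentHashMap<>();

    /** Registers a handler for a type and returns a handle that can be used to unregister it. */
    <E> Consumer<Object> register(Class<E> type, Consumer<? super E> handler) {
        Consumer<Object> wrapped = event -> handler.accept(type.cast(event));
        listeners.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>()).add(wrapped);
        return wrapped;
    }

    void unregister(Class<?> type, Consumer<Object> handle) {
        List<Consumer<Object>> registered = listeners.get(type);
        if (registered != null) {
            registered.remove(handle);
        }
    }

    /** Calls every matching listener on the caller's thread and returns how many were called. */
    int publish(Object event) {
        int delivered = 0;
        for (Map.Entry<Class<?>, List<Consumer<Object>>> entry : listeners.entrySet()) {
            if (entry.getKey().isInstance(event)) {
                for (Consumer<Object> listener : entry.getValue()) {
                    listener.accept(event);
                    delivered++;
                }
            }
        }
        return delivered;
    }
}
```

Anything beyond this, such as matching on keys, patterns or placeholders, is the extra logic you would have to write yourself on top of a purely type-based scheme.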

Reactor will help you solve a couple key problems that you would otherwise have to manage yourself:

Selector matching: Reactor does Selector matching, which encompasses a range of matches--from a simple .equals(Object other) call, to a more complex URI templating match which allows for placeholder extraction. You can also extend the built-in selectors with your own custom logic so you can use rich objects as notification keys (like domain objects, for instance).

Stream and Promise APIs: You mentioned the Promise API already with reference to the .await() method, which is really meant for existing code that expects blocking behavior. When writing new code using Reactor, it can't be stressed highly enough to use compositions and callbacks to effectively utilize system resources by not blocking threads. Blocking the caller is almost never a good idea in an architecture that depends on a small number of threads to execute a large volume of tasks. Futures are simply not cloud-scalable, which is why modern applications leverage alternative solutions.

Your application could be architected with either Streams or Promises, though honestly, I think you'll find the Stream more flexible. The key benefit is the composability of the API, which allows you to wire actions together in a dependency chain without blocking. As a completely off-the-cuff example based on the email use case you describe:

@Autowired
Environment env;
@Autowired
SmtpClient client;

// Using a ThreadPoolDispatcher (THREAD_POOL refers to the Environment.THREAD_POOL constant)
Deferred<DomainObject, Stream<DomainObject>> input = Streams.defer(env, THREAD_POOL);

input.compose()
  .map(new Function<DomainObject, EmailTemplate>() {
    public EmailTemplate apply(DomainObject in) {
      // generate the email
      return new EmailTemplate(in);
    }
  })
  .consume(new Consumer<EmailTemplate>() {
    public void accept(EmailTemplate email) {
      // send the email
      client.send(email);
    }
  });

// Publish input into the Deferred; `reader` stands in for whatever supplies the domain objects
DomainObject obj = reader.readNext();
if (obj != null) {
  input.accept(obj);
}

Reactor also provides the Boundary which is basically a CountDownLatch for blocking on arbitrary consumers (so you don't have to construct a Promise if all you want to do is block for a Consumer completion). You could use a raw Reactor in that case and use the on() and notify() methods to trigger the service status checking.
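
The latch idea itself can be shown with the JDK alone. The following is only a sketch of the underlying pattern, not Reactor's Boundary class, and `LatchedConsumers` and `runAndAwait` are hypothetical names:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only sketch of blocking on a set of consumers with a CountDownLatch.
class LatchedConsumers {

    /** Runs the given number of consumers asynchronously; returns how many had finished when the wait ended. */
    static int runAndAwait(int consumers, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        CountDownLatch done = new CountDownLatch(consumers);
        AtomicInteger handled = new AtomicInteger();
        try {
            for (int i = 0; i < consumers; i++) {
                pool.execute(() -> {
                    try {
                        handled.incrementAndGet(); // stand-in for real event handling
                    } finally {
                        done.countDown();          // always release the waiting caller
                    }
                });
            }
            // Block the caller until every consumer has counted down, or the timeout expires.
            done.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return handled.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return handled.get();
        } finally {
            pool.shutdown();
        }
    }
}
```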

For some things, however, it seems like what you want is a Future returned from an ExecutorService, no? Why not just keep things simple? Reactor will only be of real benefit in situations where your throughput performance and overhead efficiency are important. If you're blocking the calling thread, then you're likely going to be wiping away the efficiency gains that Reactor will give you anyway, so you might be better off in that case using a more traditional toolset.
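
For reference, the "keep things simple" option is about this much code. This is a sketch with made-up names (`PlainFutureExample`, `fetchStatus`), and the ":UP" result is a stand-in for a real remote call:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the traditional toolset: submit work, block on the Future with a timeout.
class PlainFutureExample {

    /** Submits one unit of work and blocks the caller for its result, up to the timeout. */
    static String fetchStatus(String service, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for a real status check against the named service
            Future<String> status = executor.submit(() -> service + ":UP");
            return status.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return service + ":TIMEOUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return service + ":INTERRUPTED";
        } catch (ExecutionException e) {
            return service + ":FAILED";
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In real code the executor would normally be shared and long-lived rather than created per call. It is created inline here only to keep the sketch self-contained.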

The nice thing about the openness of Reactor is that there's nothing stopping the two from interacting. You can freely mix Futures with Consumers without friction. In that case, just keep in mind that you're only ever going to be as fast as your slowest component.

Antetype answered 18/12, 2013 at 21:16 Comment(4)
Thank you :) This is a pretty awesome explanation! Just a little question: Would you also use Reactor for the kind of simple events which can be sent/consumed using ApplicationEvent (sync and async)? ... In more detail: Imagine I'd like to have the exact same functionality that Spring's ApplicationEvent provides. Does it make sense to use Reactor for those tasks? Or is it even counterproductive? - Assassinate
Sending ApplicationEvents is really easy to do from service classes since you just inject the ApplicationContext and away you go. That said, it's equally easy to inject a shared Reactor and call notify(). I suppose it really comes down to your event handlers. What do you want them to respond to, and how do you want them to respond? - Antetype
Also: I've thought about creating a simple wrapper component that sends ApplicationEvents into a configured Reactor so, using that component, you'd be able to grab events from either source and handle them in a single location (the Reactor Consumer<Event<?>>). - Antetype
So basically there's only a difference in class and method names when it comes to such a simple case. Are there only performance drawbacks to using Reactor for this? Or maybe even benefits? - Assassinate

Let's ignore Spring's ApplicationEvent, as it really is not designed for what you're asking (it's more about bean lifecycle management).

What you need to figure out is whether you want to do it

  1. the object oriented way (ie actors, dynamic consumers, registered on the fly) OR
  2. the service way (static consumers, registered on startup).

Using your example, are X and Y:

  1. ephemeral instances (1), or
  2. long-lived singletons/service objects (2)?

If you need to register consumers on the fly, then Akka is a good choice (I'm not sure about Reactor as I have never used it). If you don't want to do your consuming in ephemeral objects, then you can use JMS or AMQP.

You also need to understand that these kinds of libraries are trying to solve two problems:

  1. Concurrency (ie doing things in parallel on the same machine)
  2. Distribution (ie doing things in parallel on multiple machines)

Reactor and Akka are mainly focused on #1. Akka just recently added cluster support and the actor abstraction makes it easier to do #2. Message Queues (JMS, AMQP) are focused on #2.

For my own work I take the service route and use a heavily modified Guava EventBus together with RabbitMQ. I use annotations similar to Guava's EventBus, and I also have annotations for the objects sent on the bus. However, you can just use Guava's EventBus in async mode as a proof of concept and then build your own like I did.

You might think that you need dynamic consumers (1), but most problems can be solved with simple pub/sub. Managing dynamic consumers can also be tricky (hence Akka is a good choice, because the actor model has all sorts of management for this).

Hesperidium answered 19/12, 2013 at 2:57 Comment(7)
Thank you :) The application will have lots of static consumers, which get registered on startup via some annotation (I mean: events get fired and consumed in default Spring @Service classes). But for some tasks we could imagine dynamic consumers... maybe this can be circumvented by using a static consumer, which then looks up the dynamic ones or something like this?! --- I already worked with Guava's EventBus and also with GWT's EventBus which is kinda the same, but these don't provide the extended functionality. So I think Reactor or Akka will fit best?! - Assassinate
Well, Reactor is more about async processing and composing async processing within the same JVM. Akka is similar but higher level. Personally, based on your "email problem", I would just use SEDA (i.e. message queues). That is, Reactor and Akka solve the concurrency problem but not the distribution problem (Akka does now with cluster support). Message queues fix the distribution problem but not the concurrency problem. There is a good slide deck that explains the different options here - Hesperidium
Thank you. I just looked at the slides, but I don't really get why you suggest SEDA for our email problem?! We don't have a distributed system. It's just ONE application that handles it all. It receives the order to generate 10000 emails, and afterwards will call an external SMTP server to send each email. (In our current test application, we've written our own queue. Because: If the server fails to send a mail or crashes, it should - after restart - send the remaining mails. Our custom queue writes all generated mails to the DB, and deletes each mail after successful sending.) - Assassinate
Because months later you may be sending 30000 or 100000. With Reactor, and to some extent Akka, you can't just boot up another machine like you can with a message queue. With a message queue you get redundancy and forced statelessness/isolation out of the box (messages are immutable). Also, you can still use the same machine to publish and consume. - Hesperidium
Furthermore, sending emails is not a short CPU-intensive operation but a long-running, IO-blocking one. Reactor and Akka are geared more toward leveraging concurrency on multicore machines for real-time request handling (IMHO), not toward batch processing, which is what you're doing. - Hesperidium
Yeah. The sending task is IO blocking, but the preceding generation of those emails is highly CPU intensive. So, first I need a lot of CPU power and afterwards I need all the network bandwidth possible. But the network stuff is a different topic; it doesn't matter if it takes 10 or 120 minutes to send the mails. It's more important that the progress is observable (to see what percentage is finished) and that it doesn't kill the server, so it can still deliver websites as fast as before. - Assassinate
I am curious what you ended up doing. I still think a persistent message queue is the right answer. I still find it extremely doubtful that you need massive parallel processing to construct the emails more than you need to send them reliably (which only a persistent queue will really do). - Hesperidium

Carefully define what you want from the framework. If a framework has more features than you need, it is not always good. More features means more bugs, more code to learn, and less performance.

Some features to consider are:

  • the nature of actors (threads or lightweight objects)
  • ability to work on a machine cluster (Akka)
  • persistent message queues (JMS)
  • specific features like signals (events without information), transitions (objects to combine messages from different ports into complex event, see Petri Nets) etc.

Be careful with synchronous features like await - it blocks the whole thread and is dangerous when actors are executed on a thread pool (thread starvation).

More frameworks to look at:

Fork-Join Pool - in some cases, allows await without thread starvation

Scientific workflow systems

Dataflow framework for Java - signals, transitions

ADD-ON: Two kinds of actors.

Generally, a parallel system can be represented as a graph in which active nodes send messages to each other. In Java, as in most other mainstream languages, active nodes (actors) can be implemented either as threads or as tasks (Runnable or Callable) executed by a thread pool. Normally, some actors are threads and some are tasks. Both approaches have their advantages and disadvantages, so it's vital to choose the most appropriate implementation for each actor in the system. Briefly, threads can block (and wait for events) but consume a lot of memory for their stacks. Tasks must not block, but they share the stacks of the threads in a pool.

If a task calls a blocking operation, it takes a pooled thread out of service. If many tasks block, they can take all threads out of service, causing a deadlock: the tasks that could unblock the blocked tasks cannot run. This kind of deadlock is called thread starvation. If, in an attempt to prevent thread starvation, we configure the thread pool as unbounded, we simply convert tasks into threads and lose the advantages of tasks.
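
Thread starvation is easy to reproduce with a small sketch. The class name `StarvationDemo` is made up, and a timeout is used so that the demonstration terminates instead of hanging. An outer task waits for an inner task submitted to the same pool:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Demonstration only: a pooled task that blocks on another task in the same pool.
class StarvationDemo {

    /** Returns true if the outer task gave up waiting because the inner task never got a thread. */
    static boolean innerTaskStarved(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                Future<String> inner = pool.submit(() -> "done");
                try {
                    // Blocks a pooled thread while waiting for work queued on the same pool.
                    inner.get(500, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true; // without the timeout this wait would never end
                }
            });
            return outer.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a pool of one thread the inner task can never start while the outer task is waiting. With two threads it runs immediately.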

To eliminate calls to blocking operations in tasks, the task should be split in two (or more): the first task initiates the operation and exits, and the rest is written as an asynchronous task that starts when the operation finishes. Of course, the blocking operation has to have an alternative asynchronous interface. So, for example, instead of reading a socket synchronously, the NIO or NIO2 libraries should be used.
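
As a sketch of that split, assuming Java 8+ and using `CompletableFuture` as a stand-in for whatever asynchronous interface the operation offers, the second half of the task can be attached as a continuation. `SplitTask` and `whenReplyArrives` are hypothetical names:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Sketch: the first half only starts the operation and hands back a future;
// the second half is registered as a continuation instead of waiting for it.
class SplitTask {

    /**
     * Second half of the split task. It is scheduled on the pool only once
     * `pendingReply` completes, so no pooled thread is parked while the
     * operation is still in flight.
     */
    static CompletableFuture<Integer> whenReplyArrives(CompletableFuture<String> pendingReply,
                                                       Executor pool) {
        return pendingReply.thenApplyAsync(String::length, pool);
    }
}
```

Here `pendingReply` would be completed by the asynchronous I/O layer (for example from an NIO completion callback) when the data arrives.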

Unfortunately, the standard Java library lacks asynchronous counterparts for popular synchronization facilities like queues and semaphores. Fortunately, they are easy to implement from scratch (see Dataflow framework for Java for examples).

So, making computations purely with non-blocking tasks is possible but increases the size of the code. The obvious advice is to use threads where possible and tasks only for simple, massive computations.

Pitchstone answered 18/12, 2013 at 18:18 Comment(7)
Thanks for your answer. I know about the (dis)advantages of await, but sometimes it's necessary. So this is a must-have feature. We work on a single machine, so no need for Akka's remotes. Persistent message queues will only be needed in special cases (which are independent of the event/message stuff). Conclusion: We don't need the special features of Akka and JMS. What I don't really get in your list is the nature of actors. How do the frameworks handle those? (And how is it possible to process an async event without having a separate thread?!) - Assassinate
Maybe you (or someone else) could also comment on the difference between Spring's ApplicationEvent and Reactor. From what I understand they basically do the same thing, but then why did the Spring Foundation create another project (Reactor)? EDIT: I also appended some use cases above. - Assassinate
Your add-on is a really nice explanation. Luckily I attended a 'parallel computing' course at my university. Otherwise it would've been really hard to follow your text. ;) As far as I know, Java's standard libs contain some rudimentary semaphores. But as far as I remember, my professor at university said that those Java Semaphores do not behave like real semaphores and provide functionality which kinda destroys the concept of semaphores... - Assassinate
Java Semaphores are good classical Semaphores - never heard they destroy anything. - Pitchstone
With Java Semaphores you can actually reset the counter and set values as you want to. That shouldn't be possible with a real semaphore, only: (try to) lock and release. - Assassinate
@Benjamin it depends on what the semaphore represents. Usually it stands for a resource counter. Some resource management policies allow only adding and removing resources (represented as lock and release), others allow setting arbitrary values. - Pitchstone
Yeah, maybe I mixed some things up. It's been over 2 years since I had that course at university. I just remember it was something with Java and concurrency. Maybe it was Java's under-the-hood implementation of Semaphore or even something else like synchronized... I can't remember :-D - Assassinate
