How to build a distributed Java application?

Asked 4/7, 2012 at 21:2 Answered 5/7, 2012 at 8:51

java multithreading web-services client-server distributed-computing

First of all, I have a conceptual question: does the word "distributed" only mean that the application runs on multiple machines? Or are there other ways in which an application can be considered distributed (for example, if many independent modules interact with each other on the same machine, is that distributed)?

Second, I want to build a system that executes four types of tasks. There will be multiple customers, and each one will have many tasks of each type to be run periodically. For example, customer1 will have task_type1 today, task_type2 in two days, and so on. There might also be a customer2 whose task_type1 has to be executed at the same time as customer1's task_type1, i.e. there is a need for concurrency. The configuration for executing the tasks will be stored in a DB, and the outcomes of the tasks will be stored in the DB as well. Customers will use the system from a web browser (HTML pages) to interact with it (basically, to configure tasks and see the outcomes). I thought about using a REST web service (using JAX-RS) that the HTML pages would communicate with, and using threads on the backend for concurrent execution. Questions:

1. This sounds simple, but am I going in the right direction? Or should I be using other technologies or concepts, like Java Beans for example?

2. If my approach is fine, do I need to use a scripting language like JSP, or can I submit HTML forms directly to the REST URLs and get the result (using JSON, for example)?

3. If I want to make the application distributed, is that possible with my idea? If not, what would I need to use?

Sorry for having so many questions, but I am really confused about this.

Jambalaya answered 4/7, 2012 at 21:2 Comment(14)

Do you really expect that task_type1, task_type2, etc. are going to be very CPU-intensive? Have you tested to confirm that? Are there really going to be a lot of users utilizing the application at the same time? It may be that a single server can handle the load just fine and that distributing the application would just complicate things with no added benefit. Second, don't assume you need threads because "tasks need to be executed at the same time". Unless you need to take advantage of multiple CPUs, a single thread pulling tasks off a work queue will probably work just fine. – Designedly 4/7, 2012 at 21:41

@AlexD The tasks themselves are not CPU-intensive, but if I get many customers with many tasks this might become a problem, so I am considering future scalability. I assume one server can handle the load for now, but I wanted to know how to make the application distributed in case I need to, i.e. I wanted to understand the concepts of distribution. – Jambalaya 4/7, 2012 at 21:56

@AlexD Regarding the threads, I need tasks to be executed at specific times. For example, there might be a task_type1 to be executed at 10:00 am for customer1 and another task_type1 for customer2, also at 10:00 am. I need some sort of concurrency in execution, i.e. processing the two tasks in parallel. – Jambalaya 4/7, 2012 at 21:57

the term "distributed" is generally used in the sense of "running on multiple machines" – Mcconaghy 4/7, 2012 at 22:34

I think the important question is "why do you need a distributed application?" – Mcconaghy 4/7, 2012 at 22:36

"should be using other technologies or concepts like Java Beans for example" – if you're not certain what "Java Beans" are for, you probably shouldn't be trying to write a distributed system. This sounds like you're just mashing technology buzzwords together randomly. – Idona 4/7, 2012 at 22:52

@Idona I want to know what it is and whether it's the solution. That's why I am asking the question. – Jambalaya 4/7, 2012 at 23:16

@MikhailKozhevnikov I would need a distributed application if too many users use the system (heavy load), so that the load will be distributed on several machines. – Jambalaya 4/7, 2012 at 23:18

@Sam No single technology, least of all one as low-level as the Java Beans spec, is "the solution". Especially to the problem of building distributed systems, which has many, many, many "solutions". Which is why it seems like you're just throwing buzzwords around instead of actually learning about the problem domain. – Idona 4/7, 2012 at 23:19

@Sam For the sake of completeness, Java Beans are components that are coded to follow a set of conventions that allow introspection of their characteristics (notably which properties they have) at runtime, and dynamic interaction with them. The spec has absolutely no intrinsic connection to building distributed systems. – Idona 4/7, 2012 at 23:23

@Sam, you have a misconception about threads, and it's a common one: that threads are needed to make a computer "do two things at the same time". The reality is that if you have only one CPU, the computer can only do one thing at a time. Threads just make it switch back and forth quickly between multiple tasks, but won't make the tasks actually finish more quickly (unless you have multiple CPUs). – Designedly 5/7, 2012 at 8:26

... If the tasks are fast (say each one takes 0.0005 seconds), it's better to just process them sequentially. If 2 tasks are both supposed to happen at "10:00am", one will finish at 10:00:00.0005, and the next will finish at 10:00:00.001. I hope you're not hoping to achieve higher precision than that, because delays from network latency are already far, far greater. – Designedly 5/7, 2012 at 8:27

@AlexD I know this is an old thread, when you say 'multiple CPUs', do you mean 'Cores'? – Chubby 21/3, 2019 at 11:1

@SyAu, yes, you could say that. – Designedly 21/3, 2019 at 12:6

I just want to add one point to the already posted answers. Please take my remarks with a grain of salt, since all the web applications I have ever built have run on one server only (aside from applications deployed to Heroku, which may "distribute" your application for you).

If you feel that you may need to distribute your application for scalability, the first thing you should think about is not web services and multithreading and message queues and Enterprise JavaBeans and...

The first thing to think about is your application domain itself and what the application will be doing. Where will the CPU-intensive parts be? What dependencies are there between those parts? Do the parts of the system naturally break down into parallel processes? If not, can you redesign the system to make it so? IMPORTANT: what data needs to be shared between threads/processes (whether they are running on the same or different machines)?

The ideal situation is where each parallel thread/process/server can get its own chunk of data and work on it without any need for sharing. Even better is if certain parts of the system can be made stateless -- stateless code is infinitely parallelizable (easily and naturally). The more frequent and fine-grained data sharing between parallel processes is, the less scalable the application will be. In extreme cases, you may not even get any performance increase from distributing the application. (You can see this with multithreaded code -- if your threads constantly contend for the same lock(s), your program may even be slower with multiple threads+CPUs than with one thread+CPU.)
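
As a rough sketch of the "own chunk of data, nothing shared" idea: the code below groups tasks by customer and hands each group to a worker that only ever sees its own chunk. The `Task` class, the `process` method and the customer names are made up for illustration and are not taken from the question. A thread pool stands in for separate processes or machines here, purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedTasks {

    // Hypothetical task: which customer owns it and how many units of work it holds.
    static final class Task {
        final String customer;
        final int units;

        Task(String customer, int units) {
            this.customer = customer;
            this.units = units;
        }
    }

    // Stateless: the result depends only on the argument, so no locks are needed
    // and the call could just as well run on another thread or another machine.
    static int process(List<Task> chunk) {
        int total = 0;
        for (Task t : chunk) {
            total += t.units;
        }
        return total;
    }

    static Map<String, Integer> runAll(List<Task> tasks, int workers) throws Exception {
        // Partition up front: one independent chunk per customer.
        Map<String, List<Task>> byCustomer = new TreeMap<>();
        for (Task t : tasks) {
            byCustomer.computeIfAbsent(t.customer, k -> new ArrayList<>()).add(t);
        }

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            Map<String, Future<Integer>> pending = new TreeMap<>();
            for (Map.Entry<String, List<Task>> e : byCustomer.entrySet()) {
                List<Task> chunk = e.getValue();
                Callable<Integer> job = () -> process(chunk);
                pending.put(e.getKey(), pool.submit(job));
            }
            // Results are combined only at the end, on the calling thread.
            Map<String, Integer> results = new TreeMap<>();
            for (Map.Entry<String, Future<Integer>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("customer1", 3));
        tasks.add(new Task("customer2", 5));
        tasks.add(new Task("customer1", 4));
        System.out.println(runAll(tasks, 2)); // prints {customer1=7, customer2=5}
    }
}
```

Because `process` touches nothing outside its argument, the workers never contend for a lock. If the chunks had to update a shared structure on every task, that shared point would become the bottleneck described above.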

The conceptual breakdown of the work to be done is more important than what tools or techniques you actually use to distribute the application. If your conceptual breakdown is good, it will be much easier to distribute the application later if you start with just one server.

Designedly answered 5/7, 2012 at 8:51 Comment(0)

The term "distributed application" means that parts of the application system will execute on different computational nodes (which may be different CPU/cores on different machines or among multiple CPU/cores on the same machine).

There are many different technological solutions to the question of how the system could be constructed. Since you were asking about Java technologies, you could, for example, build the web application using Google Web Toolkit (GWT), which will give you a rich browser-based client user experience. For the server-side parts of your system, you could start out using simple servlets running in a servlet container such as Tomcat. Your servlets will be called from the browser using HTTP-based remote procedure calls.

Later, if you run into scalability problems, you can start to migrate parts of the business logic to EJB3 components, which can ultimately be deployed on many computational nodes within the context of an application server such as Glassfish. I don't think you need to tackle this problem until you run into it. It is hard to say whether you will without knowing more about the nature of the tasks the customers will be performing.

Dusen answered 5/7, 2012 at 3:8 Comment(0)

To answer your first question - you could get the form to submit directly to the REST URLs. Obviously, it depends on your exact requirements.

As @AlexD mentioned in the comments above, you don't always need to distribute an application. However, if you wish to do so, you should probably consider looking at JMS, a messaging API that lets you run almost any number of worker machines, each reading messages from the message queue and processing them.
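
To give a feel for the worker pattern without setting up a broker, here is a small in-process sketch. It does not use the JMS API: a `java.util.concurrent.BlockingQueue` stands in for the message queue and threads stand in for the worker machines. The class name `QueueWorkers`, the `drain` method and the `STOP` marker are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueWorkers {

    // Hypothetical marker telling a worker to stop; real messages must not equal it.
    private static final String STOP = "__STOP__";

    static List<String> drain(List<String> messages, int workerCount) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ConcurrentLinkedQueue<String> processed = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            String name = "worker-" + i;
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        // take() blocks until a message is available; each message
                        // is handed to exactly one of the competing workers.
                        String msg = queue.take();
                        if (STOP.equals(msg)) {
                            return;
                        }
                        processed.add(name + " handled " + msg);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            t.start();
            workers.add(t);
        }

        for (String m : messages) {
            queue.put(m);
        }
        // One stop marker per worker so that every worker terminates.
        for (int i = 0; i < workerCount; i++) {
            queue.put(STOP);
        }
        for (Thread t : workers) {
            t.join();
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> handled = drain(Arrays.asList("task-1", "task-2", "task-3", "task-4"), 2);
        // Which worker handles which message varies from run to run.
        for (String line : handled) {
            System.out.println(line);
        }
    }
}
```

The point to take away is that the producer only knows about the queue, so the number of workers can change without touching the code that submits tasks. With a real JMS queue the workers would be separate processes or machines connected to a broker, and the same property holds.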

If you wanted to produce a dynamically distributed application, to run on say, multiple low-resourced VMs (such as Amazon EC2 Micro instances) or physical hardware, that can be added and removed at will to cope with demand, then you might wish to consider integrating it with Project Shoal, which is a Java framework that allows for clustering of application nodes, and having them appear/disappear at any time. Project Shoal uses JXTA and JGroups as the underlying communication protocol.

Another route could be to distribute your application using EJBs running on an application server.

Grannias answered 5/7, 2012 at 3:17 Comment(0)
