Best practices for deploying Java webapps with minimal downtime?

When deploying a large Java webapp (>100 MB .war) I currently use the following deployment process:

The application .war file is expanded locally on the development machine.
The expanded application is rsync:ed from the development machine to the live environment.
The app server in the live environment is restarted after the rsync. This step is not strictly needed, but I've found that restarting the application server on deployment avoids "java.lang.OutOfMemoryError: PermGen space" due to frequent class loading.

Good things about this approach:

The rsync minimizes the amount of data sent from the development machine to the live environment. Uploading the entire .war file takes over ten minutes, whereas an rsync takes a couple of seconds.

Bad things about this approach:

While the rsync is running the application context is restarted since the files are updated. Ideally the restart should happen after the rsync is complete, not when it is still running.
The app server restart causes roughly two minutes of downtime.

I'd like to find a deployment process with the following properties:

Minimal downtime during deployment process.
Minimal time spent uploading the data.
If the deployment process is app server specific, then the app server must be open-source.

Question:

Given the stated requirements, what is the optimal deployment process?

Dracula answered 28/10, 2009 at 21:46 Comment(5)

In my opinion this should be a "community wiki" – Ashworth 28/10, 2009 at 21:49

Nathan: Why? It is a technical problem that I need the answer to. Maybe I'm missing some of the rules surrounding "community wiki". – Dracula 1/11, 2009 at 16:8

Just to satisfy my curiosity: what is so heavy in your webapp? – Tensible 1/11, 2009 at 20:25

Pascal Thivent: Grails + static files (graphics) + some external dependencies quickly adds up to >100 MB. – Dracula 1/11, 2009 at 23:12

knorv, have you tried adjusting the memory/PermGen space settings on the server JVM? – Lynden 6/11, 2009 at 22:37

It has been noted that rsync does not work well when pushing changes to a WAR file. The reason for this is that WAR files are essentially ZIP files, and by default are created with compressed member files. Small changes to the member files (before compression) result in large scale differences in the ZIP file, rendering rsync's delta-transfer algorithm ineffective.

One possible solution is to use jar -0 ... to create the original WAR file. The -0 option tells the jar command to not compress the member files when creating the WAR file. Then, when rsync compares the old and new versions of the WAR file, the delta-transfer algorithm should be able to create small diffs. Then arrange that rsync sends the diffs (or original files) in compressed form; e.g. use rsync -z ... or a compressed data stream / transport underneath.

EDIT: Depending on how the WAR file is structured, it may also be necessary to use jar -0 ... to create component JAR files. This would apply to JAR files that are frequently subject to change (or that are simply rebuilt), rather than to stable 3rd party JAR files.

In theory, this procedure should give a significant improvement over sending regular WAR files. In practice I have not tried this, so I cannot promise that it will work.

The downside is that the deployed WAR file will be significantly bigger. This may result in longer webapp startup times, though I suspect that the effect would be marginal.
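A minimal sketch of the uncompressed-archive idea, assuming you build the WAR from an exploded directory with a script rather than with jar -0. The function name build_stored_war and the paths are hypothetical. Unlike the jar tool, this does not add a META-INF/MANIFEST.MF for you.

```python
import zipfile
from pathlib import Path


def build_stored_war(exploded_dir: str, war_path: str) -> None:
    """Pack an exploded webapp directory into a WAR without compression."""
    root = Path(exploded_dir)
    # ZIP_STORED writes each member uncompressed, comparable to `jar -0`.
    with zipfile.ZipFile(war_path, "w", compression=zipfile.ZIP_STORED) as war:
        # Sort the paths so the member order is stable between builds.
        for path in sorted(root.rglob("*")):
            if path.is_file():
                war.write(path, path.relative_to(root).as_posix())
```

Keeping the member order stable between builds should also help rsync, since unchanged members then tend to sit at similar offsets in the old and new archives.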

A different approach entirely would be to look at your WAR file to see if you can identify library JARs that are likely to (almost) never change. Take these JARs out of the WAR file, and deploy them separately into the Tomcat server's common/lib directory; e.g. using rsync.

Giamo answered 29/10, 2009 at 5:43 Comment(4)

One HUGE problem with moving libraries into a shared directory is if they hold references to objects within the web-app. If that's the case, then they will prevent the JVM from reclaiming the space used by the web-app, leading to permgen exhaustion. – Aggrieve 6/11, 2009 at 13:58

But if the shared library does not have statics that hold references to webapp objects, the second approach is OK, right? – Giamo 7/11, 2009 at 8:45

Of course. But how do you know? For example, the JDK's Introspector class caches class definitions, which means that if you use it from a web-app, you have to explicitly flush the cache on redeploy. But what if your shared marshalling library uses Introspector under the covers? – Aggrieve 7/11, 2009 at 13:32

"But how do you know?". By manually or automatically inspecting the code. (It would be feasible to write a utility that checked the classes in a JAR file for potentially troublesome statics.) – Giamo 7/11, 2009 at 23:28

Update:

Since this answer was first written, a better way to deploy WAR files to Tomcat with zero downtime has emerged. In recent versions of Tomcat you can include version numbers in your WAR filenames. For example, you can deploy the files ROOT##001.war and ROOT##002.war to the same context simultaneously. Everything after the ## is interpreted by Tomcat as a version number and not as part of the context path. Tomcat will keep all versions of your app running and serve new requests and sessions to the newest version that is fully up, while gracefully completing old requests and sessions on the version they started with. Version numbers can also be specified via the Tomcat manager and even the Catalina Ant tasks. More info here.
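As an illustration of the naming scheme only, here is a small sketch that picks the next versioned file name for a context. The helper name next_versioned_war and the three-digit padding are assumptions. As far as I know Tomcat orders versions by string comparison, which is why the sketch zero-pads the number.

```python
import re
from pathlib import Path

# Matches names such as ROOT##002.war and captures the context and version parts.
VERSIONED = re.compile(r"^(?P<ctx>.+)##(?P<ver>\d+)\.war$")


def next_versioned_war(webapps_dir: str, context: str = "ROOT", width: int = 3) -> str:
    """Return the file name for the next version of `context`."""
    highest = 0
    for entry in Path(webapps_dir).iterdir():
        match = VERSIONED.match(entry.name)
        if match and match.group("ctx") == context:
            highest = max(highest, int(match.group("ver")))
    # Zero-pad so that string ordering agrees with numeric ordering.
    return f"{context}##{highest + 1:0{width}d}.war"
```

You would then copy the new WAR into the webapps directory under the returned name and leave the older version in place until its sessions have finished.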

Original Answer:

Rsync tends to be ineffective on compressed files, since its delta-transfer algorithm looks for changes within files, and a small change in an uncompressed file can drastically alter the resulting compressed version. For this reason, it might make good sense to rsync an uncompressed war file rather than a compressed version, if network bandwidth proves to be a bottleneck.

What's wrong with using the Tomcat manager application to do your deployments? If you don't want to upload the entire war file directly to the Tomcat manager app from a remote location, you could rsync it (uncompressed for reasons mentioned above) to a placeholder location on the production box, repackage it to a war, and then hand it to the manager locally. There exists a nice ant task that ships with Tomcat allowing you to script deployments using the Tomcat manager app.

There is an additional flaw in your approach that you haven't mentioned: While your application is partially deployed (during an rsync operation), your application could be in an inconsistent state where changed interfaces may be out of sync, new/updated dependencies may be unavailable, etc. Also, depending on how long your rsync job takes, your application may actually restart multiple times. Are you aware that you can and should turn off the listening-for-changed-files-and-restarting behavior in Tomcat? It is actually not recommended for production systems. You can always do a manual or ant scripted restart of your application using the Tomcat manager app.

Your application will be unavailable to users during a restart, of course. But if you're so concerned about availability, you surely have redundant web servers behind a load balancer. When deploying an updated war file, you could temporarily have the load balancer send all requests to other web servers until the deployment is over. Rinse and repeat for your other web servers.
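The "rinse and repeat" procedure can be sketched as a loop. This is only an outline under assumptions: the four callbacks (drain, deploy, healthy, restore) are hypothetical hooks that you would implement against your own load balancer and app server, not part of any real API.

```python
from typing import Callable, Iterable, List


def rolling_deploy(
    servers: Iterable[str],
    drain: Callable[[str], None],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    restore: Callable[[str], None],
) -> List[str]:
    """Update servers one at a time; stop at the first failed health check."""
    updated = []
    for server in servers:
        drain(server)    # ask the load balancer to stop routing to this server
        deploy(server)   # install the new war and restart the app server
        if not healthy(server):
            # The broken server stays out of rotation; the others keep serving.
            raise RuntimeError(f"{server} failed its health check after deployment")
        restore(server)  # put the server back into rotation
        updated.append(server)
    return updated
```

Stopping at the first failure keeps the remaining servers on the old version, so users still have something to talk to while you investigate.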

Mchale answered 28/10, 2009 at 21:56 Comment(15)

It is my understanding that rsync:ing a zip representation of two similar directories won't give me the same speed benefits as rsync:ing the two directories. Please correct me if I'm mistaken. – Dracula 28/10, 2009 at 22:6

The thing is: a tiny local change in an uncompressed file can lead to very large differences in the compressed file, i.e. rsync will have to transfer more data - if the network bandwidth is the bottleneck, and there are usually small differences in many files, this could lead to an overall slower result. – Raucous 28/10, 2009 at 22:28

@knorv: You might actually be right about that. Although rsync uses a delta-transfer algorithm (samba.anu.edu.au/ftp/rsync/rsync.html), compression tends to alter the entire structure of the file which makes rsync's delta-transfer algorithm somewhat ineffective (zsync.moria.org.uk/paper200501/ch01s03.html). If you do choose to uncompress files before rsyncing, at least use the -z option which tells rsync to compress data before transferring. – Mchale 28/10, 2009 at 22:28

@Michael Borgwardt: I just researched it further and came to that conclusion too. See my comment to @knorv. – Mchale 28/10, 2009 at 22:29

Asaph: Sorry, but I still think you're wrong. As I've understood it the delta-transfer algorithm is known to have problems with compressed data. See for example this discussion on this exact topic lists.samba.org/archive/rsync/2003-August/007010.html – Dracula 28/10, 2009 at 22:30

My last comment was a response to "It seems to me that rsyncing a zip representation of 2 directories would be more efficient than the directories uncompressed". Had not seen the other comments. – Dracula 28/10, 2009 at 22:31

@knorv: Are you using the -z option with rsync? I think that will help you. – Mchale 28/10, 2009 at 22:35

@knorv: , @Michael Borgwardt: I've updated my answer to reflect the new things I just learned about rsync and compressed files :) – Mchale 28/10, 2009 at 22:40

Asaph: Please elaborate. I fail to see how -z would help me if I'm following your advice on rsync:ing the entire .war file. – Dracula 28/10, 2009 at 22:42

@knorv: I changed my answer. You and Michael Borgwardt convinced me that rsyncing the uncompressed war might be a good idea (if network bandwidth is a bottleneck in the deployment). So given that, you should use rsync -z on the uncompressed files so that rsync will send fewer bytes across the network. – Mchale 28/10, 2009 at 22:46

Asaph: I'm relying on the built-in compression in ssh (rsync over ssh), so the -z option is not needed. – Dracula 28/10, 2009 at 22:50

@knorv: I think that the way to minimize WAR file (re-)deployment times would be to use jar -0 ... to generate the WAR file. This tells it to use no compressions in the ZIP file, and that will allow rsync to produce smaller deltas. – Giamo 28/10, 2009 at 22:54

Stephen C: That's smart! Didn't think about that option. Consider posting that in a separate answer. – Dracula 28/10, 2009 at 22:57

+1 for solving the downtime by using the network. Yes, it means getting the new version to production will take longer, but it is the only real way to go if minimizing downtime is important. You can even start up the new version as a separate tomcat process on a different port on the same host - then flip the network traffic to go to that port instead, and shut down the old version once its connections are gone. Of course, that doesn't help you in case the process crashes or the box dies. – Foilsman 31/10, 2009 at 19:50

@Mchale Thanks for updated info. What is the best way to version maven build in the same way? – Mistrial 30/7, 2015 at 8:23

In any environment where downtime is a consideration, you are surely running some sort of cluster of servers to increase reliability via redundancy. I'd take a host out of the cluster, update it, and then throw it back into the cluster. If you have an update that cannot run in a mixed environment (incompatible schema change required on the db, for example), you are going to have to take the whole site down, at least for a moment. The trick is to bring up replacement processes before dropping the originals.

Using tomcat as an example - you can use CATALINA_BASE to define a directory where all of tomcat's working directories will be found, separate from the executable code. Every time I deploy software, I deploy to a new base directory so that I can have new code resident on disk next to old code. I can then start up another instance of tomcat which points to the new base directory, get everything started up and running, then swap the old process (port number) with the new one in the load balancer.
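A rough sketch of starting a second Tomcat instance against a fresh base directory, assuming a Unix install where bin/catalina.sh lives under CATALINA_HOME. The function name and the example paths are hypothetical. The new base directory needs its own conf/server.xml with ports that do not clash with the running instance.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple


def tomcat_start_command(catalina_home: str, new_base: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the command and environment for starting Tomcat on a new base dir."""
    env = dict(os.environ)
    env["CATALINA_HOME"] = catalina_home  # shared Tomcat binaries
    env["CATALINA_BASE"] = new_base       # per-release conf/, webapps/, logs/, work/
    command = [str(Path(catalina_home) / "bin" / "catalina.sh"), "start"]
    return command, env
```

To actually launch it you could pass the result to subprocess.run(command, env=env, check=True), for example with a base directory such as /srv/tomcat-bases/release-42 (a made-up path).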

If I am concerned about preserving session data across the switch, I can set up my system such that every host has a partner to which it replicates session data. I can drop one of those hosts, update it, bring it back up so that it picks the session data back up, and then switch the two hosts. If I've got multiple pairs in the cluster, I can drop half of all pairs, then do a mass switch, or I can do them a pair at a time, depending upon the requirements of the release, requirements of the enterprise, etc. Personally, however, I prefer to just allow end-users to suffer the very occasional loss of an active session rather than deal with trying to upgrade with sessions intact.

It's all a tradeoff between IT infrastructure, release process complexity, and developer effort. If your cluster is big enough and your desire strong enough, it is easy enough to design a system that can be swapped out with no downtime at all for most updates. Large schema changes often force actual downtime, since updated software usually cannot accommodate the old schema, and you probably cannot get away with copying the data to a new db instance, doing the schema update, and then switching the servers to the new db, since you will have missed any data written to the old after the new db was cloned from it. Of course, if you have resources, you can task developers with modifying the new app to use new table names for all tables that are updated, and you can put triggers in place on the live db which will correctly update the new tables with data as it is written to the old tables by the prior version (or maybe use views to emulate one schema from the other). Bring up your new app servers and swap them into the cluster. There are a ton of games you can play in order to minimize downtime if you have the development resources to build them.

Perhaps the most useful mechanism for reducing downtime during software upgrades is to make sure that your app can function in a read-only mode. That will deliver some necessary functionality to your users but leave you with the ability to make system-wide changes that require database modifications and such. Place your app into read-only mode, then clone the data, update schema, bring up new app servers against new db, then switch the load balancer to use the new app servers. Your only downtime is the time required to switch into read-only mode and the time required to modify the config of your load balancer (most of which can handle it without any downtime whatsoever).

Carver answered 5/11, 2009 at 19:30 Comment(1)

To add some updating information to this answer ... Tomcat can persist sessions in a database. Also, using the load balancing technique to hot-swap to the new version is sometimes called Blue Green Deployment. – Bandanna 11/12, 2017 at 17:36

My advice is to use rsync with exploded versions but deploy a war file.

Create a temporary folder in the live environment to hold the exploded version of the webapp.
Rsync the exploded version to it.
After a successful rsync, create a war file in the temporary folder on the live machine.
Replace the old war in the server's deploy directory with the new one from the temporary folder.

Replacing the old war with the new one is the recommended approach in the JBoss container (which is based on Tomcat) because it is an atomic and fast operation, and it guarantees that when the deployer starts, the entire application is already in place.
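The last two steps might look roughly like this. It is a sketch, not a tested deployment tool: publish_war is a hypothetical name, and it assumes the deploy directory and its parent are on the same filesystem, since the rename is only atomic in that case.

```python
import os
import shutil
import tempfile


def publish_war(exploded_dir: str, deploy_dir: str, war_name: str) -> str:
    """Zip `exploded_dir` into a war and move it into `deploy_dir` in one rename."""
    # Stage next to the deploy directory so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(dir=os.path.dirname(os.path.abspath(deploy_dir)))
    try:
        archive = shutil.make_archive(
            os.path.join(staging, "build"), "zip", root_dir=exploded_dir
        )
        target = os.path.join(deploy_dir, war_name)
        os.replace(archive, target)  # atomic when both paths share a filesystem
        return target
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```

Because the war is fully written before the rename, the deployer never sees a half-copied archive.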

Radioscopy answered 1/11, 2009 at 20:13 Comment(2)

This should avoid what would be my biggest concern with the OP's practice, which is a non-atomic update. – Aggrieve 2/11, 2009 at 14:44

Yeah, exploded versions and hot deployment is good for development mode, but in production it's better to use wars. – Radioscopy 3/11, 2009 at 9:39

Can't you make a local copy of the current web application on the web server, rsync to that directory and then perhaps even using symbolic links, in one "go", point Tomcat to a new deployment without much downtime?

Sells answered 28/10, 2009 at 23:1 Comment(0)

Hot Deploy a Java EAR to Minimize or Eliminate Downtime of an Application on a Server or How to “hot” deploy war dependency in Jboss using Jboss Tools Eclipse plugin might have some options for you.

Deploying to a cluster with no downtime is interesting too.

JavaRebel has hot-code deployment too.

Mensural answered 2/11, 2009 at 14:38 Comment(2)

JavaRebel is now called JRebel – Hazelhazelnut 6/11, 2009 at 15:17

For production-grade updates built on JRebel technology, there is a tool called LiveRebel. – Stith 10/4, 2011 at 11:45

Your approach of rsyncing the extracted war is pretty good, and so is the restart, since I believe that a production server should not have hot deployment enabled. So the only downside is the downtime when you need to restart the server, right?

I assume all state of your application is held in the database, so you have no problem with some users working on one app server instance while other users are on another app server instance. If so,

Run two app servers: Start up the second app server (which listens on other TCP ports) and deploy your application there. After deployment, update the Apache httpd configuration (mod_jk or mod_proxy) to point to the second app server, then gracefully restart the Apache httpd process. This way you will have no downtime, and new users and requests are automatically redirected to the new app server.

If you can make use of the app server's clustering and session replication support, it will be even smoother for users who are currently logged in, as the second app server will resync as soon as it starts. Then, when there are no more accesses to the first server, shut it down.
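Before pointing httpd at the second app server you probably want to know that it is actually accepting connections. A minimal sketch, with wait_until_listening as a hypothetical helper name; note that an open port only shows the server is listening, so a real check would also request a page of the application.

```python
import socket
import time


def wait_until_listening(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Closing the socket right away is fine; we only test reachability.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Only switch the proxy configuration if this returns True; otherwise keep serving from the first app server.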

Leastways answered 4/11, 2009 at 13:46 Comment(0)

This is dependent on your application architecture.

One of my applications sits behind a load-balancing proxy, where I perform a staggered deployment - effectively eradicating downtime.

Acnode answered 6/11, 2009 at 13:41 Comment(1)

+1. This is the solution we use. With a little bit of intelligence, you can ensure that the cluster of servers running a mix of version N and version N-1 will function correctly. Then just take one of your servers offline, upgrade it, and bring it back online. Run for a while to ensure there's no problem then do the same for each of half the other servers. Run like that for a couple of days so you have a backout position, then convert the rest. – Discontinuance 7/11, 2009 at 5:18

If static files are a big part of your big WAR (100 MB is pretty big), then putting them outside the WAR and deploying them on a web server (e.g. Apache) in front of your application server might speed things up. On top of that, Apache usually does a better job of serving static files than a servlet engine does (even if most of them have made significant progress in that area).

So, instead of producing a big fat WAR, put it on a diet and produce:

a big fat ZIP with static files for Apache
a less fat WAR for the servlet engine.

Optionally, go further in the process of making the WAR thinner: if possible, deploy Grails and other JARs that don't change frequently (which is likely the case of most of them) at the application server level.

If you succeed in producing a lighter WAR, I wouldn't bother rsyncing directories rather than archives.

Strengths of this approach:

The static files can be hot "deployed" on Apache (e.g. use a symbolic link pointing to the current directory, unzip the new files, update the symlink and voilà).
The WAR will be thinner and it will take less time to deploy it.

Weakness of this approach:

There is one more server (the web server), so this adds (a bit) more complexity.
You'll need to change the build scripts (not a big deal IMO).
You'll need to change the rsync logic.

Tensible answered 4/11, 2009 at 21:45 Comment(0)

I'm not sure if this answers your question, but I'll share the deployment process I have used or encountered in the few projects I've done.

Similar to you, I don't recall ever doing a full war redeployment or update. Most of the time, my updates are restricted to a few JSP files, maybe a library, and some class files. I can determine which artifacts are affected, and usually we package those updates in a zip file along with an update script. I then run the update script, which does the following:

Backup the files that will be overwritten, maybe to a folder with today's date and time.
Unpack my files
Stop the application server
Move the files over
Start the application server
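The backup and copy steps of such an update script could be sketched as follows. The function name backup_then_apply and the directory layout are assumptions, and stopping and starting the application server is left out because it depends on your setup.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List


def backup_then_apply(update_dir: str, webapp_dir: str, backup_root: str) -> List[str]:
    """Back up each file the update would overwrite, then copy the update in."""
    update, webapp = Path(update_dir), Path(webapp_dir)
    # One backup folder per run, named after the current date and time.
    backup = Path(backup_root) / datetime.now().strftime("%Y%m%d-%H%M%S")
    changed = []
    for src in sorted(p for p in update.rglob("*") if p.is_file()):
        rel = src.relative_to(update)
        dest = webapp / rel
        if dest.exists():
            # Keep the old copy so the update can be rolled back by hand.
            (backup / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(dest, backup / rel)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        changed.append(rel.as_posix())
    return changed
```

Files that are new in the update have nothing to back up, so only overwritten files end up in the backup folder.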

If downtime is a concern, and it usually is, my projects are usually HA, even if the servers do not share state and instead sit behind a router that provides sticky session routing.

Another thing I'm curious about: why the need to rsync? You should be able to know what the required changes are by determining them in your staging/development environment, not by performing delta checks against live. In most cases you would have to tune your rsync to ignore some files anyway, such as property files that define the resources a production server uses (database connection, SMTP server, etc.).

I hope this is helpful.

Molasses answered 31/10, 2009 at 15:11 Comment(0)

What is your PermGen space set to? I would expect to see it grow as well, but shouldn't it go down after the old classes are collected? (Or does the ClassLoader still sit around?)

Thinking out loud, you could rsync to a separate version- or date-named directory. If the container supports symbolic links, could you SIGSTOP the root process, switch over the context's filesystem root via symbolic link, and then SIGCONT?

Thief answered 5/11, 2009 at 19:36 Comment(0)

As for the early context restarts: all containers have configuration options to disable auto-redeploy on class file or static resource changes. You probably can't disable auto-redeploy on web.xml changes, so this file should be the last one to update. So if you disable auto-redeploy and update web.xml last, you'll see the context restart only after the whole update is complete.
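If you copy the files with your own script, the ordering can be as simple as this sketch. The helper name web_xml_last is made up, and the paths are assumed to be relative to the webapp root.

```python
from typing import Iterable, List


def web_xml_last(relative_paths: Iterable[str]) -> List[str]:
    """Order paths so that WEB-INF/web.xml is copied after every other file."""
    # sorted() is stable: the key is False for ordinary files and True for web.xml,
    # so all other files keep their original order and web.xml moves to the end.
    return sorted(relative_paths, key=lambda p: p.replace("\\", "/") == "WEB-INF/web.xml")
```

With plain rsync you could get a similar effect by running it twice, first excluding web.xml and then transferring only that file.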

Stith answered 6/11, 2009 at 8:56 Comment(0)

We upload the new version of the webapp to a separate directory, then either move to swap it out with the running one, or use symlinks. For example, we have a symlink in the tomcat webapps directory named "myapp", which points to the current webapp named "myapp-1.23". We upload the new webapp to "myapp-1.24". When all is ready, stop the server, remove the symlink and make a new one pointing to the new version, then start the server again.
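The symlink switch itself can be done without a moment where the link is missing. A sketch for POSIX systems, with switch_symlink as a hypothetical helper; whether the container notices the change without a restart is not something this assumes.

```python
import os


def switch_symlink(link_path: str, new_target: str) -> None:
    """Repoint `link_path` at `new_target` without removing the link first."""
    tmp_link = f"{link_path}.tmp-{os.getpid()}"
    os.symlink(new_target, tmp_link)  # create the new link under a temporary name
    os.replace(tmp_link, link_path)   # the rename swaps it over the old link
```

For example, switch_symlink("webapps/myapp", "myapp-1.24") would repoint the "myapp" link from the example above.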

We disable auto-reload on production servers for performance, but even so, having files within the webapp changing in a non-atomic manner can cause issues, as static files or even JSP pages could change in ways that cause broken links or worse.

In practice, the webapps are actually located on a shared storage device, so clustered, load-balanced, and failover servers all have the same code available.

The main drawback for your situation is that the upload will take longer, since your method allows rsync to only transfer modified or added files. You could copy the old webapp folder to the new one first, and rsync to that, if it makes a significant difference, and if it's really an issue.

Shorten answered 30/4, 2010 at 12:16 Comment(0)

Tomcat 7 has a nice feature called "parallel deployment" that is designed for this use case.

The gist is that you expand the .war into a directory, either directly under webapps/ or symlinked. Successive versions of the application are in directories named app##version, for example myapp##001 and myapp##002. Tomcat will handle existing sessions going to the old version, and new sessions going to the new version.

The catch is that you have to be very careful with PermGen leaks. This is especially true with Grails that uses a lot of PermGen. VisualVM is your friend.

Trimester answered 2/11, 2013 at 16:58 Comment(0)

Just use two or more Tomcat servers with a proxy in front of them. That proxy can be Apache, nginx, or HAProxy.

In each proxy server, "in" and "out" URLs with ports are configured.

First copy your war into Tomcat without stopping the service. Once the war is deployed, it is automatically opened by the Tomcat engine.

Note: double-check that unpackWARs="true" and autoDeploy="true" are set on the "Host" node inside server.xml.

It looks like this:

  <Host name="localhost"  appBase="webapps"
        unpackWARs="true" autoDeploy="true"
        xmlValidation="false" xmlNamespaceAware="false">

Now check the Tomcat logs. If there are no errors, the app is up successfully.

Now hit all APIs for testing.

Now go to your proxy server.

Simply change the backend URL mapping to the new war's name. Since registering the change with a proxy server like Apache, nginx, or HAProxy takes very little time, you will see minimal downtime.

Refer -- https://developers.google.com/speed/pagespeed/module/domains for mapping urls

Baryon answered 19/4, 2014 at 16:24 Comment(0)

You're using Resin, and Resin has built-in support for web app versioning.

http://www.caucho.com/resin-4.0/admin/deploy.xtp#VersioningandGracefulUpgrades

Update: Its watchdog process can help with PermGen space issues too.

Neuroglia answered 1/8, 2014 at 0:35 Comment(0)

Not a "best practice" but something I just thought of.

How about deploying the webapp through a DVCS such as git?

This way you can let git figure out which files to transfer to the server. You also have a nice way to back out of it if it turns out to be busted, just do a revert!

Judijudicable answered 21/5, 2010 at 14:42 Comment(0)

I wrote a bash script that takes a few parameters and rsyncs the file between servers. Speeds up rsync transfer a lot for larger archives:

https://gist.github.com/3985742

Chronological answered 31/10, 2012 at 8:11 Comment(0)
