Blue/Green Deployments with Azure Service Fabric

I'm currently building an application using the Reliable Actors framework on Azure Service Fabric. As we scale up, I'm looking at doing blue/green deployments. I can see how to do this with a stateless system. Is there a way to do this with stateful actors?

Quinacrine answered 8/3, 2016 at 16:35

Service Fabric is all about rolling upgrades rather than deployment swaps like a VIP swap. Stateless and stateful services are upgraded the same way, but there are a few additional nuances for stateful services that I'll mention later.

By rolling upgrades, I mean that upgrades to an application are done in place, one upgrade domain at a time, so that there is no downtime and no sudden switch. A rolling upgrade in Service Fabric can be done in a safe "monitored" mode, where the platform performs health checks before moving on to the next upgrade domain and automatically rolls back if the health checks fail.
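
To make the monitored rolling upgrade more concrete, here is a toy Python simulation of the idea. It is not Service Fabric code: `Cluster`, `monitored_rolling_upgrade`, and the upgrade-domain names `UD0`..`UD2` are hypothetical, the health check is just a caller-supplied function, and the rollback is simplified to reverting the domains that were already upgraded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Cluster:
    """Toy model (hypothetical, not a Service Fabric API)."""
    domains: Dict[str, str]          # upgrade-domain name -> running version
    log: List[str] = field(default_factory=list)


def monitored_rolling_upgrade(cluster: Cluster, target: str,
                              is_healthy: Callable[[str, str], bool]) -> bool:
    """Upgrade one upgrade domain at a time and health-check after each step.

    On the first failed check, every domain upgraded so far is reverted to its
    previous version and False is returned. Returns True if all domains pass.
    """
    previous = dict(cluster.domains)  # snapshot used for the rollback
    done: List[str] = []
    for ud in sorted(cluster.domains):
        cluster.domains[ud] = target
        done.append(ud)
        cluster.log.append(f"upgraded {ud} to {target}")
        if not is_healthy(ud, target):
            cluster.log.append(f"health check failed in {ud}; rolling back")
            for d in reversed(done):
                cluster.domains[d] = previous[d]
            return False
    return True


bad = Cluster({"UD0": "1.0", "UD1": "1.0", "UD2": "1.0"})
# Pretend the new code fails its health check once it reaches UD1.
print(monitored_rolling_upgrade(bad, "2.0", lambda ud, v: ud != "UD1"))  # False
print(bad.domains)  # {'UD0': '1.0', 'UD1': '1.0', 'UD2': '1.0'}
```

Note that UD2 is never touched: the failure is caught while only part of the application has been upgraded, which is what makes the rolling approach safe.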

OK, that all sounds nice. But how do you do blue/green deployments when upgrades are always rolling upgrades?

This is where application types and versions come in. Instead of having two "environments" that can each hold a running application, Service Fabric has the concept of versioned application types from which application instances can be created. Here's an example of how this works:

Let's say I want to make an application called Foo. My Foo application is defined as an application type; call it FooType. This is similar to defining a class in C#. And like a class in C#, I can create instances of my type. Each instance has a unique name, similar to how each object instance of a class has a unique variable name. But unlike classes in C#, my FooType has a version number. I can then "register" the application type and version in my cluster:

FooType 1.0

With that registered, I can create an instance of that application:

"fabric:/FooApp" of FooType 1.0
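
One rough way to picture the type/version/instance relationship is a registry that only lets you create instances from registered (type, version) pairs and requires each instance name to be unique. The Python below is a hypothetical model, not an SDK; `ClusterRegistry` and its methods are made up for illustration. In a real cluster these steps are performed with tooling such as the `Register-ServiceFabricApplicationType` and `New-ServiceFabricApplication` PowerShell cmdlets.

```python
from typing import Dict, Set, Tuple


class ClusterRegistry:
    """Toy model (hypothetical, not a Service Fabric API) of application
    types, versions, and named instances."""

    def __init__(self) -> None:
        self.registered: Set[Tuple[str, str]] = set()    # (type name, version)
        self.instances: Dict[str, Tuple[str, str]] = {}  # app name -> (type, version)

    def register(self, type_name: str, version: str) -> None:
        self.registered.add((type_name, version))

    def create_instance(self, app_name: str, type_name: str, version: str) -> None:
        # Like instantiating a class: the "class" (type + version) must exist,
        # and each instance gets its own unique name.
        if (type_name, version) not in self.registered:
            raise ValueError(f"{type_name} {version} is not registered")
        if app_name in self.instances:
            raise ValueError(f"{app_name} already exists")
        self.instances[app_name] = (type_name, version)


cluster = ClusterRegistry()
cluster.register("FooType", "1.0")
cluster.create_instance("fabric:/FooApp", "FooType", "1.0")
print(cluster.instances)  # {'fabric:/FooApp': ('FooType', '1.0')}
```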

Now, let's say I develop version 2.0 of my application. So I register version 2.0 of my FooType in the cluster:

FooType 1.0
FooType 2.0

Now I have both versions of FooType registered, and I still have an instance of 1.0 running:

"fabric:/FooApp" of FooType 1.0

Here's where it gets fun. I can do some interesting things:

I can take "fabric:/FooApp" - an instance of FooType 1.0 - and upgrade it to FooType 2.0. This will be a rolling upgrade of that running application.

Or... I can leave "fabric:/FooApp" alone and create a new instance of my version 2.0 application:

"fabric:/FooApp" of FooType 1.0
"fabric:/FooAppv2Test" of FooType 2.0

Now I have two applications, running side-by-side, in the same cluster. One is an instance of 1.0, and the other is an instance of 2.0. With some configuring of ports and application endpoints, I can ensure users are still going to the 1.0 instance while I test out the 2.0 instance.
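
The "configuring of ports and application endpoints" step can be sketched as a small routing table. This is a hypothetical illustration only: the port numbers 8080/8081, `check_no_port_conflicts`, and `route` are made up, and in a real deployment the endpoints come from your service manifests and whatever front end or load balancer you use.

```python
from typing import Dict, Tuple

# Hypothetical layout: each side-by-side instance gets its own port so the
# two versions do not collide on the same nodes.
instances: Dict[str, Tuple[str, int]] = {
    "fabric:/FooApp": ("1.0", 8080),        # serves real users
    "fabric:/FooAppv2Test": ("2.0", 8081),  # reached only by testers
}


def check_no_port_conflicts(apps: Dict[str, Tuple[str, int]]) -> None:
    """Raise if two application instances claim the same port."""
    ports = [port for _, port in apps.values()]
    if len(ports) != len(set(ports)):
        raise ValueError("two application instances claim the same port")


def route(audience: str) -> str:
    """Send testers to the 2.0 instance; everyone else stays on 1.0."""
    return "fabric:/FooAppv2Test" if audience == "tester" else "fabric:/FooApp"


check_no_port_conflicts(instances)
print(route("customer"), instances[route("customer")][0])  # fabric:/FooApp 1.0
```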

Great, all my tests pass against the 2.0 instance, so now I can safely take the 1.0 instance and upgrade it to version 2.0 of FooType. Again, this is a rolling upgrade of that instance (fabric:/FooApp); it's not migrating users to the new instance (fabric:/FooAppv2Test). Later I'll delete fabric:/FooAppv2Test because it was just for testing.

One of the benefits of blue/green, though, is being able to swap back to the other deployment if the new one fails. Well, you still have both 1.0 and 2.0 of FooType registered. So if your application starts misbehaving after the upgrade from 1.0 to 2.0, you can just "upgrade" it back to 1.0! In fact, you can "upgrade" an application instance between as many versions of its application type as you want. And you don't need to keep instances of all your application versions running like you do in a swapping environment; you just keep the different versions registered and a single application instance that can "upgrade" between them.
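
Rolling back is just another upgrade whose target happens to be an older registered version. The toy Python below (hypothetical names, not an SDK) shows that rule: the only requirement on the target is that it is registered, not that it is newer. In a real cluster the upgrade itself would be started with something like the `Start-ServiceFabricApplicationUpgrade` PowerShell cmdlet.

```python
from typing import Dict, Set

registered: Set[str] = {"1.0", "2.0"}             # FooType versions in the cluster
apps: Dict[str, str] = {"fabric:/FooApp": "1.0"}  # instance name -> current version


def upgrade(app: str, target: str) -> None:
    """'Upgrade' an existing instance to any registered version, newer or older.

    The instance keeps its name; only the version it runs changes.
    """
    if target not in registered:
        raise ValueError(f"FooType {target} is not registered")
    apps[app] = target


upgrade("fabric:/FooApp", "2.0")  # roll forward
upgrade("fabric:/FooApp", "1.0")  # 2.0 misbehaves: "upgrade" back to 1.0
print(apps)  # {'fabric:/FooApp': '1.0'}
```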

I mentioned caveats with stateful services. The big thing to remember with stateful services is that the application state - your users' data - is contained in the application instance (fabric:/FooApp), so for your users to see their data you need to keep them on that instance. That's why we do rolling upgrades instead of deployment swaps.
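
A tiny model shows why a swap loses data for stateful services while an in-place upgrade does not. `AppInstance` is a hypothetical stand-in for an application instance whose state lives inside it; it is not how Reliable Collections or actor state are actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AppInstance:
    """Toy stateful application instance: its data lives inside the instance."""
    name: str
    version: str
    state: Dict[str, int] = field(default_factory=dict)


foo = AppInstance("fabric:/FooApp", "1.0")
foo.state["user-42"] = 7  # users' data accumulates in fabric:/FooApp

# Swap style: create a fresh 2.0 instance and point users at it.
swapped = AppInstance("fabric:/FooAppv2", "2.0")
print(swapped.state.get("user-42"))  # None: the data stayed in the old instance

# Rolling-upgrade style: same instance, new code version, state preserved.
foo.version = "2.0"
print(foo.state.get("user-42"))  # 7
```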

This is just the basic idea. There are other ways you can play around with application types, versions, and instances depending on what your goals are and how your application works, but that's for another time.

Ferguson answered 9/3, 2016 at 1:20
That was helpful, thank you Vaclav. One of the benefits, in my mind, of blue/green-style deployments is that I can trickle traffic through to the second service to make sure it's working before I do the whole swap, usually by directing it through the load balancer. Is there a way to do a similar validation on SF? - Quinacrine
For stateless services, sure: you can set this up by creating an instance of v1.0 and an instance of v2.0, and through some routing of your own you can funnel traffic to v2.0 while gradually scaling it up and scaling v1.0 down. For stateful services it's a little trickier because the state itself is inside an application instance, so if you send users to a new instance of the application, their data won't be there. This is simply the nature of stateful things in general, and that's why Service Fabric has very robust, first-class rolling upgrades. - Ferguson
I would also argue that a rolling upgrade is in a sense "trickling through", because your entire application isn't upgraded all at once. It's upgraded one slice at a time, and if at any point a health check fails (you can write custom health checks), it will roll back. So you can think of it as an automated trickle-through deployment, and we do love us some automation! - Ferguson
Hi, this was a helpful explanation; however, I would like to pause on "upgrades to an application are done in place, one upgrade domain at a time, so that there is no downtime". Is it really no downtime, as in 0 seconds of downtime? In my tests, while an upgrade domain is being upgraded, the services do not respond until that upgrade domain is finished. - Cucullate
I suppose a more accurate way to state it would be: rolling upgrades enable no-downtime upgrades. You can certainly cause downtime yourself in any number of ways, e.g., a cluster with only one upgrade domain, a stateful service with only one replica, or service code that doesn't resolve service addresses correctly. Other than that, yes, your application shouldn't experience any downtime during an upgrade. - Ferguson
What about two versions of web apps running on the same cluster? I am primarily interested in port conflicts, since the ports would be the same for both versions. - Ropable
Do you happen to know how to answer this one as well? #78063041 - Leukas
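
Ferguson's point that stateless services can trickle traffic from a v1.0 instance to a v2.0 instance "through some routing of your own" can be sketched as weighted routing. This is a hypothetical illustration, not a Service Fabric feature: `pick_instance`, the instance names, and the ramp schedule are assumptions, and in practice the weighting would live in your gateway or load balancer. It only makes sense for stateless services, for the reasons given in the answer.

```python
import random
from typing import Dict


def pick_instance(weights: Dict[str, float], rng: random.Random) -> str:
    """Choose a backend instance in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical ramp: shift traffic from the 1.0 instance to the 2.0 instance
# in steps, checking the new version's health between steps.
schedule = [0.0, 0.1, 0.5, 1.0]
rng = random.Random(0)
for share in schedule:
    weights = {"fabric:/FooApp": 1.0 - share, "fabric:/FooAppv2": share}
    sample = [pick_instance(weights, rng) for _ in range(1000)]
    print(share, sample.count("fabric:/FooAppv2") / len(sample))
```

The printed fractions are approximate (they come from random sampling), except at the endpoints, where a zero weight means that instance is never chosen.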
