Engineering Improvement Runbook | Continuous Deployment
Patrick Leal
July 28th, 2020
Mitigate risk with rolling deployments
Deploying a new feature to production is a momentous occasion. It's important to ensure that everything goes properly at this stage, as deployments tend to be error-prone when not handled correctly.
To examine why this is and how you can avoid it, let's take a look at the different types of deployments available and where some of them fall short.
Common Software Version Deployment Options
- Take down the application and swap it out with the new version. This path will require downtime for your organization, which is not ideal and can amount to wasted time and resources.
- Another route is the blue/green deployment, which will bring up your new instance at the same time as the original application, and simply divert traffic to the new instance when it's ready to deploy. The problem with this route, however, is that creating an additional instance can cost a significant amount of both time and resources.
- Then there is a canary deployment, in which you roll out your features to a subset of users as an initial beta test to ensure that everything is functioning optimally before executing a full release. While this is a highly recommended route, coupling it with a full blue/green deployment can add significant risk and cost.
- Then there is a rolling deployment, which slowly and continuously replaces currently running instances of your application with newer ones over time.
We believe that rolling deployments are the best option as they reduce the potential for wasted resources and mitigate risk significantly.
Furthermore, rolling deployments are effective for systems composed of multiple nodes that you can deploy independently, whether they be dedicated servers or Docker containers for microservices. In this post, we'll explain how rolling deployments work, why we prefer them, and the potential downside.
How do rolling deployments work?
In essence, a rolling deployment works like a blue/green deployment on a smaller scale. Instead of rolling the entire instance out at once, nodes are gradually replaced one by one through continuous integration, so performance isn't sacrificed by having two full instances running simultaneously.
Instead of your DevOps team running both entire instances in tandem, you make rolling updates that begin with a single additional running node that contains the new version; you then reconfigure your load balancer so that the new node replaces one of the previous nodes.
For this method to mitigate risk, it is critical that each new node passes your health checks before it replaces the previous node.
However, when you retire the previous node is entirely up to you. You can do so immediately, or you can keep it on standby while you monitor the new node for a period of time, similar to a canary deployment.
This process continues on the backend until every old node in the production environment is replaced with a new node. The desired result is that your users never notice a difference in performance on the front end.
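The replacement loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the `Node` class, the `rolling_deploy` function, and the `is_healthy` callback are all hypothetical names, and the list of nodes stands in for whatever your load balancer actually tracks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a server or container behind the load balancer."""
    name: str
    version: str


def rolling_deploy(
    nodes: List[Node],
    new_version: str,
    is_healthy: Callable[[Node], bool],
) -> Tuple[List[Node], bool]:
    """Replace nodes one at a time; stop at the first failed health check.

    Returns the resulting list of active nodes and whether the rollout finished.
    """
    active = list(nodes)  # copy, so the caller's list is left untouched
    for index, old in enumerate(nodes):
        # Bring up a single additional node running the new version.
        candidate = Node(name=f"{old.name}-next", version=new_version)
        # Only swap it in if it passes the health check.
        if not is_healthy(candidate):
            return active, False  # nodes not yet replaced keep serving traffic
        active[index] = candidate  # stands in for reconfiguring the load balancer
    return active, True
```

In this sketch the old node is retired as soon as its replacement passes the check. Keeping it on standby for a monitoring period would mean tracking the old node alongside the new one instead of overwriting it.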
Why use rolling deployments instead of blue/green or canary deployments?
They Lower Costs
When conducting software updates in a blue/green environment, you need to run both the previous version and the newer version concurrently. As your application scales over time, this process will get slower and more expensive. A rolling deployment eliminates this problem because it requires only a single additional node to be active at a time. As you become more adept at this process, there are tools you can use to replace multiple nodes at once and accelerate the process. Additionally, you can customize your rolling deployment strategy to be more dynamic and adapt to changing circumstances, such as a vulnerability that causes you to switch to a blue/green deployment while conducting a hotfix release.
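To make the cost difference concrete, here is a rough back-of-the-envelope sketch. It assumes every node is the same size, that blue/green duplicates the whole fleet, and that a rolling deployment retires each old node as soon as its replacement is live. The function name `peak_nodes` and its parameters are hypothetical.

```python
def peak_nodes(strategy: str, nodes: int, batch_size: int = 1) -> int:
    """Return the largest number of nodes running at once during a deployment."""
    if strategy == "blue_green":
        # The old and new environments run side by side.
        return 2 * nodes
    if strategy == "rolling":
        # Only the nodes currently being replaced are duplicated.
        return nodes + min(batch_size, nodes)
    raise ValueError(f"unknown strategy: {strategy}")


fleet = 20
blue_green_peak = peak_nodes("blue_green", fleet)         # 40
rolling_peak = peak_nodes("rolling", fleet)               # 21
rolling_batch_peak = peak_nodes("rolling", fleet, 5)      # 25
```

If you keep old nodes on standby while monitoring their replacements, the rolling figure rises accordingly, so treat these numbers as a lower bound.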
They Reduce Vulnerability
Another reason why we prefer rolling software deployments is their ability to mitigate the risk of faults in the environment and potential bugs in the system. As with a canary deployment, you can monitor a new node for a period of time before taking the previous node offline to ensure that it is functioning properly. However, unlike a canary deployment, a rolling deployment has only one new node that can potentially run into issues instead of an entire set of new pods, which would increase both the resources needed and the potential for damage in the event of a bug. With rolling deployments, the impact of instability in a new node is limited to a small percentage of users as opposed to your entire user base.
They Can Be Rolled Back
Speaking of impact on your user base, should those bugs or vulnerabilities occur, you have the ability to roll back the new node quickly and replace it with the previous node until the issue is remedied. This is why, as we mentioned earlier, health checks are invaluable.
Hopefully, you will have done this and ensured that the new node met your predetermined acceptance criteria before rolling off the old node.
Having the peace of mind knowing that you can perform a rollback at the first sign of trouble is more than enough reason to opt for rolling deployments.
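One way to picture a rollback, assuming you kept the previous node on standby while monitoring its replacement, is a simple swap in the list of active nodes. The `roll_back` function and the node names below are hypothetical and only illustrate the idea.

```python
from typing import Dict, List


def roll_back(active: List[str], standby: Dict[str, str]) -> List[str]:
    """Swap each new node that has a standby predecessor back to that predecessor.

    `standby` maps a new node's name to the old node it replaced. Nodes
    without an entry are left in place.
    """
    return [standby.get(node, node) for node in active]


active = ["web-1-v2", "web-2-v1", "web-3-v1"]
standby = {"web-1-v2": "web-1-v1"}  # old node kept on standby during monitoring
restored = roll_back(active, standby)
```

Because only one new node is live at this point in the rollout, the rollback touches a single entry and the rest of the fleet is unaffected.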
Is there a downside to using rolling deployments?
We wouldn't be doing our job if we didn't explain the potential cons of using rolling deployments. However, we stand by our advice, as we feel the pros far outweigh the cons. That being said, let's look at the potential downsides of implementing rolling deployments.
1. Complexity
As rolling deployments involve planning, monitoring, and the creation of a rollback plan, there is a fair bit of complexity involved.
DevOps tools like Sleuth can help you to track and monitor your team’s deployment activities and reduce the amount of difficulty involved by instantly alerting relevant personnel when something goes wrong due to a botched deployment.
2. Compatibility
During rolling deployment, your new nodes must maintain compatibility with the ones they're replacing to ensure the system remains stable and optimally functioning. Reverse engineering this process can be tricky.
Again, this is something Sleuth can help you with, by offering anomaly detection across all of your SLIs (uptime, errors, Datadog stats, etc.) and instant notifications so your development team can respond quickly.
3. Notification
Because rolling software deployments can be done with zero downtime, they may occur while user sessions are in progress, and your users may experience an abrupt shift in functionality without warning. They may not even notice the change. On the other hand, they may be alarmed or experience issues like data loss or decreased application performance.
Sleuth has a solution for that as well, providing the ability to notify your teams when deployments are happening so they can be prepared and save their work ahead of time.
Conclusion
While there is some complexity and potential for complications with rolling deployments, we feel that the amount of risk involved is far lower than with other popular deployment methods.
Additionally, we feel that rolling software deployments can help you reduce cost and significantly reduce the probability of reactive damage control.
Furthermore, rolling deployments are appropriate for all instance shapes and sizes, whether they are composed of large applications or independent nodes, large dedicated servers, or small Docker containers.
And with help from tracking tools like Sleuth, your entire organization will have granular visibility into the process through our extensive tracking, notification, and in-app features.
Ready to learn more about Sleuth and what it can do to improve your rollout process and make your life easier? Check out our live demo and give Sleuth a try free for 30 days!