How can I make the template debugging process faster, or am I stuck forever noticing my mistakes half an hour after I make them?
Here are a few best-practice suggestions, focusing specifically on improving the iteration speed of complex CloudFormation-template development:
Use CloudFormation tools to validate templates and stack updates
AWS has already outlined these in its own Best Practices document, so I won't repeat them here.
The point of this step is to catch obvious syntax or logical errors before actually performing a Stack creation/update.
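As a rough illustration of this kind of pre-flight check, the Python sketch below parses a JSON template and flags a few obvious mistakes locally, before anything is sent to AWS. The names `lint_template` and `validate_with_aws` and the specific checks are assumptions made for illustration, not part of any AWS tool. The optional service-side helper assumes boto3 is installed and credentials are configured; it relies on the CloudFormation client's `validate_template` call.

```python
import json


def lint_template(template_text):
    """Return a list of obvious problems found in a JSON CloudFormation template."""
    try:
        template = json.loads(template_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]

    if not isinstance(template, dict):
        return ["template must be a JSON object"]
    resources = template.get("Resources")
    if not isinstance(resources, dict) or not resources:
        return ["template has no Resources section"]

    problems = []
    # Names that a Ref may legitimately point at.
    known = set(resources) | set(template.get("Parameters", {}))
    for name, body in resources.items():
        if not isinstance(body, dict):
            problems.append(f"{name}: resource must be an object")
            continue
        if "Type" not in body:
            problems.append(f"{name}: missing Type")
        for target in _find_refs(body):
            # Pseudo parameters such as AWS::Region are always defined.
            if target not in known and not target.startswith("AWS::"):
                problems.append(f"{name}: Ref to undefined '{target}'")
    return problems


def _find_refs(node):
    """Yield the target of every {"Ref": ...} found anywhere inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                yield value
            else:
                yield from _find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from _find_refs(item)


def validate_with_aws(template_text):
    """Service-side validation; raises a botocore ClientError if the template is rejected."""
    import boto3  # imported lazily so the local checks work without boto3

    return boto3.client("cloudformation").validate_template(TemplateBody=template_text)
```

Running `lint_template` first costs nothing and fails in milliseconds; `validate_with_aws` is only worth calling once the local list comes back empty. Neither check guarantees that the stack operation itself will succeed.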
Test Resources in isolation
Before using any individual CloudFormation Resource in a complex Stack, make sure you thoroughly understand the full extent of that Resource's creation/update/delete behavior, including any limits on usage and typical startup/teardown times. The easiest way to do this is to test its behavior in smaller, standalone Stacks first.
- If you are developing or using any third-party Custom Resources, write unit tests using appropriate libraries for the language platform, to make sure the application logic behaves as expected across all use-cases.
- Be aware that the amount of time for an individual Resource to create/update/delete can vary widely between Resource Types, depending on the behavior of the underlying API calls. For example, a complex AWS::CloudFront::Distribution resource can sometimes take 30-60 minutes to create/update/delete, while an AWS::EC2::SecurityGroup updates in seconds.
- Individual Resources may have bugs/issues/limitations in their implementation, which are much easier to debug and develop workarounds for when tested in isolation, rather than within a much larger Stack. Keep in mind limitations such as AWS Service Limits depending on your individual AWS Account settings, or Region Availability of services depending on the Region within which you create your Stack.
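To make the unit-testing suggestion concrete, here is a minimal sketch in Python using the standard `unittest` module. The handler `handle_event` and its naming rule are hypothetical. Only the event fields (`RequestType`, `ResourceProperties`, `PhysicalResourceId`) and the response fields (`Status`, `Reason`, `PhysicalResourceId`) follow the Custom Resource request/response format. A real handler would also have to send its response to the event's `ResponseURL`, which this sketch leaves out.

```python
import unittest


def handle_event(event):
    """Hypothetical Custom Resource logic: returns the core fields of a response body."""
    request_type = event["RequestType"]
    props = event.get("ResourceProperties", {})
    if request_type == "Create":
        if "Name" not in props:
            return {"Status": "FAILED", "Reason": "Name is required"}
        return {"Status": "SUCCESS", "PhysicalResourceId": "demo-" + props["Name"]}
    if request_type in ("Update", "Delete"):
        # Keep the existing physical ID so an Update is not treated as a replacement.
        return {"Status": "SUCCESS", "PhysicalResourceId": event["PhysicalResourceId"]}
    return {"Status": "FAILED", "Reason": "unknown RequestType " + request_type}


class HandleEventTest(unittest.TestCase):
    def test_create_builds_physical_id(self):
        event = {"RequestType": "Create", "ResourceProperties": {"Name": "x"}}
        self.assertEqual(handle_event(event)["PhysicalResourceId"], "demo-x")

    def test_create_without_name_fails(self):
        response = handle_event({"RequestType": "Create", "ResourceProperties": {}})
        self.assertEqual(response["Status"], "FAILED")

    def test_update_and_delete_keep_physical_id(self):
        for request_type in ("Update", "Delete"):
            event = {"RequestType": request_type, "PhysicalResourceId": "demo-x"}
            self.assertEqual(handle_event(event)["PhysicalResourceId"], "demo-x")


def run_tests():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HandleEventTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these run in well under a second, so the Create/Update/Delete branches can be exercised many times before the handler is ever attached to a Stack.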
Build complicated stacks in small increments
When performing a Stack creation/update, a failure in any single Resource will cause the Stack to roll back the entire set of Resource changes. This can unnecessarily destroy other successfully-created Resources, and it can take a very long time when building a complicated stack with a long dependency graph of associated Resources.
The solution to this is to build your Stack incrementally in smaller Update batches, adding Resources one (or a few) at a time. This way, if/when a failure occurs in a resource creation/update, the rollback doesn't cause your entire Stack's resources to be destroyed, just the set of Resources changed in the latest Update.
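One way to mechanize the incremental approach is sketched below in Python, under the assumption that the template is available as a parsed dictionary. The function name `incremental_templates` is made up for this example. It orders Resources so that dependencies come first (following `Ref`, `Fn::GetAtt` and `DependsOn` only; references hidden inside `Fn::Sub` strings are not detected) and yields a growing series of templates, each adding `batch_size` more Resources than the last.

```python
import copy
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def _dependencies(body, names):
    """Collect logical IDs that a resource refers to via Ref, Fn::GetAtt or DependsOn."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "Ref" and isinstance(value, str):
                    found.add(value)
                elif key == "Fn::GetAtt" and isinstance(value, list) and value:
                    found.add(value[0])  # ["LogicalId", "Attribute"]
                elif key == "Fn::GetAtt" and isinstance(value, str):
                    found.add(value.split(".")[0])  # "LogicalId.Attribute"
                elif key == "DependsOn":
                    found.update([value] if isinstance(value, str) else value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(body)
    return found & names  # ignore parameters and pseudo parameters


def incremental_templates(template, batch_size=1):
    """Yield cumulative templates, each adding batch_size resources in dependency order."""
    resources = template["Resources"]
    names = set(resources)
    graph = {name: _dependencies(body, names) - {name} for name, body in resources.items()}
    order = list(TopologicalSorter(graph).static_order())
    for end in range(batch_size, len(order) + batch_size, batch_size):
        step = copy.deepcopy(template)
        step["Resources"] = {name: copy.deepcopy(resources[name]) for name in order[:end]}
        if end < len(order):
            # Outputs may point at resources that are not included yet.
            step.pop("Outputs", None)
        yield step
```

Each yielded template would then be applied as its own Stack update, so a failure only rolls back the Resources introduced in that step. Dropping `Outputs` until the final step is a simplification; a more careful version would keep the Outputs whose references are already satisfied.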
Monitor the progress of stack updates
Be sure to monitor the progress of your stack update by viewing the stack's events while a creation/update is performed. This will be the starting point for debugging further issues with individual resources.
--disable-rollback and cfn-signal? – Dominate