Effective CloudWatch alarming

7 min read · May 9, 2020

This post is not about how to create a CloudWatch alarm; tutorials for that are easy to find on Google. Instead, I am going to talk about how to set up alarms that let you sleep well at night.

This is the story of how I set up alarms for my company’s website. It may not be a brilliant solution, but I think it is simple enough to give you some insight without getting into complex architectures.

Background

My company had a web server that went down whenever there was a traffic spike. The website was not important enough to justify a dedicated support team or the cost of a revamp, but it still needed to stay up for people to browse.

Instead of waiting for customers to report problems, we needed to find out that something had gone wrong before they did. That’s where the journey started.

Preparation

To track the state of the web server, I use the CloudWatch agent to ship the Apache access log to CloudWatch Logs, then use a metric filter to count HTTP 5XX errors. I am not going to cover that part, because I want to focus on the alarm strategy. If you are interested, here are the tutorials from AWS:

Installing the CloudWatch Agent Using the Command Line (docs.aws.amazon.com)

Example: Count HTTP 4xx Codes (docs.aws.amazon.com)
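If you would rather script the metric filter than click through the console, the idea can be sketched with boto3 roughly as follows. This is only an illustration: it assumes the access log uses Apache's common log format (seven space-delimited fields), and the filter name, namespace, and metric name are hypothetical placeholders.

```python
def build_5xx_metric_filter(log_group_name: str) -> dict:
    """Build keyword arguments for the CloudWatch Logs put_metric_filter call."""
    return {
        "logGroupName": log_group_name,
        "filterName": "http-5xx-count",  # hypothetical name
        # Space-delimited pattern, assuming Apache common log format.
        # status_code=5* matches any status code that starts with 5.
        "filterPattern": "[ip, id, user, timestamp, request, status_code=5*, size]",
        "metricTransformations": [
            {
                "metricName": "Http5xxCount",  # hypothetical name
                "metricNamespace": "Website",  # hypothetical namespace
                "metricValue": "1",  # each matching log line adds 1
                "defaultValue": 0.0,  # reported for log lines that do not match
            }
        ],
    }


def create_5xx_metric_filter(log_group_name: str) -> None:
    """Create the filter in AWS (needs boto3 and valid credentials)."""
    import boto3  # imported here so the builder above works without boto3

    logs = boto3.client("logs")
    logs.put_metric_filter(**build_5xx_metric_filter(log_group_name))
```

Keeping the parameters in a plain function makes them easy to review before anything is sent to AWS. If your log format has more fields (for example the combined format), the pattern needs to be adjusted.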

Stage 1: Setting up email notification

For starters, the easiest way to set up an alarm is to send an email notification when things go wrong. AWS knows that too: when you create an alarm, the configuration page has a box for your email address, and it creates the SNS topic and subscription automatically.

The default action of a CloudWatch alarm is to send an email notification.
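For reference, what the console does here can be approximated in code. The sketch below is a guess at the equivalent steps, not what the console literally runs: it creates an SNS topic and subscribes an email address to it. The function name and the topic name in the usage note are made up for illustration.

```python
def create_email_topic(sns_client, topic_name: str, email: str) -> str:
    """Create (or reuse) an SNS topic and subscribe an email address to it.

    sns_client is expected to be a boto3 SNS client, e.g. boto3.client("sns").
    Returns the topic ARN so that an alarm can use it as an action.
    """
    # create_topic returns the existing topic's ARN if the name is already taken.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The recipient must confirm the subscription from the email AWS sends
    # before any notifications are delivered.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn
```

You would call it as `create_email_topic(boto3.client("sns"), "website-alarms", "me@example.com")`. Passing the client in as an argument keeps the function easy to try out with a stub before pointing it at a real account.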

Here is my first setup. Whenever the server generates 10 or more errors within 15 minutes, the alarm fires and sends me an email. Then it is my turn to figure out what is happening and fix it.
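Expressed as code, this first setup might look like the sketch below, which builds the parameters for boto3's `put_metric_alarm`. The alarm name, namespace, and metric name are hypothetical, and treating missing data as "not breaching" is my assumption about a sensible default, not something the setup above specifies.

```python
def build_5xx_alarm(topic_arn: str) -> dict:
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": "website-5xx-errors",  # hypothetical name
        "Namespace": "Website",  # hypothetical, must match the metric filter
        "MetricName": "Http5xxCount",  # hypothetical, must match the metric filter
        "Statistic": "Sum",  # total number of errors in the period
        "Period": 900,  # 15 minutes, in seconds
        "EvaluationPeriods": 1,  # a single breaching period triggers the alarm
        "Threshold": 10,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # "10+ errors"
        "TreatMissingData": "notBreaching",  # assumption: no data means no errors
        "AlarmActions": [topic_arn],  # SNS topic that sends the email
    }


def create_5xx_alarm(topic_arn: str) -> None:
    """Create the alarm in AWS (needs boto3 and valid credentials)."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**build_5xx_alarm(topic_arn))
```

With `Statistic` set to `Sum` and a single 900-second period, the alarm compares the total error count of each 15-minute window against the threshold of 10.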