Amazon CloudWatch — Monitoring & Alarms

20 min · Lesson 3 of 5

Learning Objectives

  • Understand CloudWatch metrics and namespaces
  • Create alarms for EC2 and other services
  • Configure CloudWatch Logs for centralized logging
  • Build CloudWatch dashboards

What is CloudWatch?

Amazon CloudWatch is AWS's monitoring and observability service. It collects metrics, logs, and events from AWS resources and applications.

CloudWatch Components

Component    Purpose
Metrics      Numerical data points over time
Alarms       Trigger actions based on thresholds
Logs         Centralized log collection
Dashboards   Visual monitoring displays
Events       React to state changes

EC2 Default Metrics

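CloudWatch automatically collects these EC2 metrics. With basic monitoring, data points arrive at 5-minute intervals:

Metric                     Description
CPUUtilization             CPU usage percentage
NetworkIn/NetworkOut       Network bytes received/sent
DiskReadOps/DiskWriteOps   Disk I/O operations (instance store volumes only)
StatusCheckFailed          Instance health checks (1 if a status check failed, otherwise 0)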

Enabling Detailed Monitoring

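# Enable 1-minute metrics (additional cost)
aws ec2 monitor-instances --instance-ids i-0123456789abcdef0

# Verify that detailed monitoring is on (the state moves from "pending" to "enabled")
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].Monitoring.State'

# List the metrics CloudWatch holds for this instance
aws cloudwatch list-metrics \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0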

CloudWatch Alarms

Creating a CPU Alarm

aws cloudwatch put-metric-alarm \
  --alarm-name "HighCPU-nextgen-web" \
  --alarm-description "Average CPU above 80% for 10 minutes (2 periods of 300 s)" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:nextgen-alerts
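
The alarm above evaluates two consecutive 300-second periods, so the breach has to persist for about 10 minutes before the state changes. The following is a minimal sketch of that arithmetic and of the state decision, assuming every evaluated period must breach. The function names `seconds_until_alarm` and `alarm_state` are hypothetical, and the model ignores missing-data handling and "M out of N" settings.

```python
from typing import Sequence


def seconds_until_alarm(period_seconds: int, evaluation_periods: int) -> int:
    """Minimum time the metric must stay in breach before the alarm fires."""
    return period_seconds * evaluation_periods


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int) -> str:
    """Simplified GreaterThanThreshold check over the most recent periods.

    `datapoints` holds one aggregated value per period, oldest first.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Values taken from the alarm above: --period 300, --evaluation-periods 2, --threshold 80
print(seconds_until_alarm(300, 2) // 60)   # minutes of sustained breach required
print(alarm_state([45.0, 91.0], 80, 2))    # only the latest period breaches
print(alarm_state([85.5, 91.0], 80, 2))    # both periods breach
```

A single CPU spike therefore does not trigger the alarm; lowering `--evaluation-periods` to 1 would make it react after one 5-minute period at the cost of more noise.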

Common Alarm Configurations

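# Disk space alarm (requires CloudWatch Agent).
# The dimensions must exactly match a metric the agent publishes; disk metrics
# usually also carry path, device, and fstype. Check what exists first:
#   aws cloudwatch list-metrics --namespace CWAgent --metric-name disk_used_percent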
aws cloudwatch put-metric-alarm \
  --alarm-name "LowDisk-nextgen-web" \
  --metric-name disk_used_percent \
  --namespace CWAgent \
  --statistic Average \
  --period 300 \
  --threshold 85 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0
 
# Status check alarm (auto-recover); the recover action responds to system status check failures
aws cloudwatch put-metric-alarm \
  --alarm-name "StatusCheck-nextgen-web" \
  --metric-name StatusCheckFailed_System \
  --namespace AWS/EC2 \
  --statistic Maximum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 3 \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --alarm-actions arn:aws:automate:us-east-1:ec2:recover
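
When the same alarm has to exist for several instances, generating the commands avoids copy-paste drift. The sketch below only builds and prints the `put-metric-alarm` command lines using the flags shown in this lesson; it does not call AWS. The helper name `alarm_command`, the second instance ID, and the `nextgen-web-1`/`nextgen-web-2` names are hypothetical.

```python
import shlex
from typing import List, Sequence


def alarm_command(alarm_name: str, metric_name: str, namespace: str,
                  statistic: str, period: int, threshold: float,
                  comparison: str, evaluation_periods: int,
                  instance_id: str, actions: Sequence[str] = ()) -> List[str]:
    """Build the argument list for `aws cloudwatch put-metric-alarm`."""
    command = [
        "aws", "cloudwatch", "put-metric-alarm",
        "--alarm-name", alarm_name,
        "--metric-name", metric_name,
        "--namespace", namespace,
        "--statistic", statistic,
        "--period", str(period),
        "--threshold", str(threshold),
        "--comparison-operator", comparison,
        "--evaluation-periods", str(evaluation_periods),
        "--dimensions", f"Name=InstanceId,Value={instance_id}",
    ]
    if actions:
        # --alarm-actions takes one or more ARNs as separate arguments
        command += ["--alarm-actions", *actions]
    return command


# Hypothetical fleet: friendly name -> instance ID
instances = {
    "nextgen-web-1": "i-0123456789abcdef0",
    "nextgen-web-2": "i-0fedcba9876543210",
}
topic = "arn:aws:sns:us-east-1:123456789012:nextgen-alerts"

for name, instance_id in instances.items():
    command = alarm_command(f"HighCPU-{name}", "CPUUtilization", "AWS/EC2",
                            "Average", 300, 80, "GreaterThanThreshold", 2,
                            instance_id, [topic])
    print(shlex.join(command))
```

The returned list can also be passed directly to `subprocess.run` once the printed commands look right.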

CloudWatch Agent

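The CloudWatch Agent collects system-level metrics (memory, disk) and custom application logs. EC2 does not report these by default because memory and filesystem usage are visible only from inside the operating system.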

Installation

# Install from the Amazon Linux package repository
sudo yum install -y amazon-cloudwatch-agent

# Or download the RPM directly and install it
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm

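Configuration

Save the following as /opt/aws/amazon-cloudwatch-agent/etc/config.json, the path passed to the agent with -c when it starts: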

{
  "metrics": {
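    "namespace": "CWAgent",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "aggregation_dimensions": [["InstanceId"]],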
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["disk_used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["/", "/data"]
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/messages",
            "log_group_name": "nextgen-system-logs",
            "log_stream_name": "{instance_id}"
          },
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "nextgen-nginx-access",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}

# Start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json \
  -s
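
A typo in the configuration file is easy to miss until metrics or logs fail to appear. The sketch below lints a config dictionary against the pattern used in this lesson: every metrics plugin lists at least one measurement, and every log entry names a `file_path` and a `log_group_name`. These checks are this example's own conventions, not the agent's official validation rules, and `check_agent_config` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def check_agent_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []

    collected = config.get("metrics", {}).get("metrics_collected", {})
    for plugin, settings in collected.items():
        if not settings.get("measurement"):
            problems.append(f"metrics.{plugin}: no measurement listed")

    files = (config.get("logs", {}).get("logs_collected", {})
             .get("files", {}).get("collect_list", []))
    for index, entry in enumerate(files):
        for key in ("file_path", "log_group_name"):
            if key not in entry:
                problems.append(f"logs.collect_list[{index}]: missing {key}")
    return problems


# Deliberately broken sample: empty disk measurement, second log entry lacks a group
sample = json.loads("""
{
  "metrics": {"metrics_collected": {
      "mem": {"measurement": ["mem_used_percent"]},
      "disk": {"measurement": []}}},
  "logs": {"logs_collected": {"files": {"collect_list": [
      {"file_path": "/var/log/messages", "log_group_name": "nextgen-system-logs"},
      {"file_path": "/var/log/nginx/access.log"}
  ]}}}
}
""")

for problem in check_agent_config(sample):
    print(problem)
```

To lint a real file, load it with `json.load(open(path))` and pass the result to the same function; `json.load` also catches plain syntax errors such as a trailing comma.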

CloudWatch Logs

Querying Logs with Insights

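# Run each query separately. The time range (for example, the last hour)
# is chosen in the query window, not in the query text.

# Find the 50 most recent ERROR messages
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

# Count requests by status code
# (assumes the log events expose a status field, e.g. JSON logs or a prior parse step)
fields @timestamp, status
| stats count(*) as request_count by status
| sort request_count desc

# Average and maximum response time in 5-minute buckets
# (assumes a numeric response_time field)
fields @timestamp, response_time
| stats avg(response_time) as avg_time, max(response_time) as max_time by bin(5m)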
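
To make the "count requests by status code" query concrete, here is an offline sketch of the same aggregation in Python. It assumes nginx's default "combined" access-log format and pulls the status code out with a regular expression, which is roughly the work a parse step would do before `stats count(*) by status`. The sample lines, the `STATUS_PATTERN` regex, and `count_by_status` are illustrative, not part of CloudWatch.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request followed by the 3-digit status, e.g.
#   "GET /path HTTP/1.1" 200 612
STATUS_PATTERN = re.compile(r'"\S+ \S+ \S+" (?P<status>\d{3}) ')


def count_by_status(lines: Iterable[str]) -> Counter:
    """Count log lines per HTTP status code, skipping lines that do not match."""
    counts: Counter = Counter()
    for line in lines:
        match = STATUS_PATTERN.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Hypothetical access-log lines (documentation-range IP addresses)
sample_lines = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
    '203.0.113.9 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"',
    '203.0.113.5 - - [10/Oct/2024:13:55:38 +0000] "POST /api HTTP/1.1" 200 87 "-" "curl/8.0"',
]

# most_common() orders by count, like `sort request_count desc`
for status, total in count_by_status(sample_lines).most_common():
    print(status, total)
```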

CloudWatch Dashboards

aws cloudwatch put-dashboard \
  --dashboard-name "NextGen-Overview" \
  --dashboard-body '{
    "widgets": [
      {
        "type": "metric",
        "properties": {
          "metrics": [
            ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]
          ],
          "region": "us-east-1",
          "period": 300,
          "stat": "Average",
          "title": "CPU Utilization"
        }
      }
    ]
  }'
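
Hand-writing the dashboard JSON becomes error-prone once there is more than one widget. The sketch below generates a body with one CPU widget per instance, stacked vertically. The helper name `cpu_dashboard_body`, the second instance ID, and the layout values are assumptions for illustration; the region is assumed to be us-east-1 to match the ARNs used in this lesson.

```python
import json
from typing import List


def cpu_dashboard_body(instance_ids: List[str], region: str) -> str:
    """Return a dashboard body with one CPU widget per instance."""
    widgets = []
    for row, instance_id in enumerate(instance_ids):
        widgets.append({
            "type": "metric",
            # Stack widgets vertically: each is 12 columns wide and 6 rows tall
            "x": 0, "y": row * 6, "width": 12, "height": 6,
            "properties": {
                "metrics": [
                    ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]
                ],
                "region": region,
                "period": 300,
                "stat": "Average",
                "title": f"CPU Utilization ({instance_id})",
            },
        })
    return json.dumps({"widgets": widgets}, indent=2)


body = cpu_dashboard_body(
    ["i-0123456789abcdef0", "i-0fedcba9876543210"],  # second ID is hypothetical
    "us-east-1",
)
print(body)
```

Writing the output to a file such as dashboard.json and passing `--dashboard-body file://dashboard.json` to `aws cloudwatch put-dashboard` avoids shell-quoting problems with inline JSON.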

SNS Notifications

# Create an SNS topic for alerts
aws sns create-topic --name nextgen-alerts
 
# Subscribe email
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:nextgen-alerts \
  --protocol email \
  --notification-endpoint team@nextgenplayground.org
 
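# The recipient must confirm the subscription from the email SNS sends
# before any notifications are delivered.
# Pass the topic ARN to --alarm-actions when creating alarms.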

Summary

You've learned:

  • CloudWatch metrics, alarms, and dashboards
  • Monitoring EC2 with default and custom metrics
  • Installing and configuring the CloudWatch Agent
  • Centralized logging with CloudWatch Logs
  • SNS notifications for alert delivery

Next Steps

You now have a solid AWS compute foundation. Combine these skills with your Terraform knowledge to provision and monitor infrastructure as code.