Bash + AWS CLI: Infrastructure Automation

📋 Context

Bash + AWS CLI is a powerful combination for automating infrastructure. You don't need Terraform or CloudFormation for everything: many tasks are faster and simpler with a good Bash script. In this post I share patterns and scripts that I use constantly.

🎯 Why Bash + AWS CLI

Advantages

Quick to write: a working script in minutes
Few dependencies: the AWS CLI is the only hard requirement (some scripts below also use jq and bc)
Simple debugging: set -x shows exactly what gets executed
Portable: runs on any Linux/Mac (the date -d calls below assume GNU date)
Ideal for ad-hoc tasks: migrations, cleanups, reports
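
As a small illustration of the set -x point, the sketch below traces a single function instead of the whole script. The function name trace_demo and the sample region variable are made up for this example; PS4 is the standard bash variable that controls the trace prefix.

```shell
#!/bin/bash
# Sketch: scoped tracing with set -x.
# trace_demo and the region value are illustrative, not part of the scripts in this post.

trace_demo() {
    # PS4 is the prefix bash prints before each traced command
    local PS4='+ line ${LINENO}: '
    local region="us-east-1"

    set -x                       # start tracing (trace output goes to stderr)
    echo "Scanning region: $region"
    { set +x; } 2>/dev/null      # stop tracing without logging the "set +x" itself
}

trace_demo
```

Running a whole script with bash -x script.sh gives the same trace without editing the file.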

When to Use Bash vs Terraform

Bash: operational tasks, maintenance scripts, reports, migrations
Terraform: declarative infrastructure, state management, multiple environments

🛠️ Pattern 1: Resource Inventory

Script: List EC2 Instances with Details

#!/bin/bash
# ec2-inventory.sh - Lists all EC2 instances with useful details

set -euo pipefail

readonly REGIONS=(
    us-east-1
    us-west-2
    eu-west-1
)

log() {
    echo "[$(date +'%H:%M:%S')] $*"
}

get_instances_in_region() {
    local region=$1
    
    log "Scanning region: $region"
    
    aws ec2 describe-instances \
        --region "$region" \
        --query 'Reservations[].Instances[].[
            InstanceId,
            InstanceType,
            State.Name,
            Tags[?Key==`Name`].Value | [0],
            Tags[?Key==`Environment`].Value | [0],
            PrivateIpAddress,
            PublicIpAddress,
            LaunchTime
        ]' \
        --output text | \
    while IFS=$'\t' read -r id type state name env private_ip public_ip launch_time; do
        # Skip if no instances found
        [ -z "$id" ] && continue
        
        # With --output text, missing values arrive as the literal string "None"
        [ "$name" = "None" ] && name=""
        [ "$env" = "None" ] && env=""
        
        # Calculate age (date -d requires GNU date)
        local age_days=$(( ($(date +%s) - $(date -d "$launch_time" +%s)) / 86400 ))
        
        printf "%-20s %-15s %-10s %-25s %-12s %s\n" \
            "$id" \
            "$type" \
            "$state" \
            "${name:-N/A}" \
            "${env:-N/A}" \
            "${age_days}d old"
    done
}

main() {
    echo "================================================================"
    echo "  EC2 Inventory Report - $(date +'%Y-%m-%d')"
    echo "================================================================"
    echo ""
    
    printf "%-20s %-15s %-10s %-25s %-12s %s\n" \
        "INSTANCE_ID" "TYPE" "STATE" "NAME" "ENVIRONMENT" "AGE"
    echo "----------------------------------------------------------------"
    
    for region in "${REGIONS[@]}"; do
        get_instances_in_region "$region"
    done
    
    echo ""
    log "✓ Inventory complete"
}

main "$@"
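
The age calculation above relies on date -d, which only GNU date supports. For macOS, a helper along these lines could be swapped in. This is a sketch under two assumptions: the timestamp is a full ISO 8601 value in UTC (which is how the AWS CLI normally prints LaunchTime), and iso_to_epoch and age_in_days are hypothetical names, not part of the original script.

```shell
#!/bin/bash
# Sketch: timestamp parsing that works with GNU date (Linux) and BSD date (macOS).
# iso_to_epoch and age_in_days are hypothetical helper names.

iso_to_epoch() {
    # Keep only "YYYY-MM-DDTHH:MM:SS": drop fractional seconds and the UTC suffix
    local ts=${1%%[.+Z]*}
    ts=${ts/T/ }

    if date --version >/dev/null 2>&1; then
        date -u -d "$ts" +%s                          # GNU date
    else
        date -j -u -f '%Y-%m-%d %H:%M:%S' "$ts" +%s   # BSD date
    fi
}

age_in_days() {
    local now
    now=$(date +%s)
    echo $(( (now - $(iso_to_epoch "$1")) / 86400 ))
}

# Example: age in days of a timestamp in the format the AWS CLI prints
age_in_days "2024-01-01T00:00:00+00:00"
```

Inside the loop, the age line would then become local age_days=$(age_in_days "$launch_time").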

🛠️ Pattern 2: Bulk Operations

Script: Automatically Stop Non-Production Instances

#!/bin/bash
# stop-non-prod.sh - Stops dev/staging instances outside business hours

set -euo pipefail

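readonly BUSINESS_HOURS_START=8
readonly BUSINESS_HOURS_END=18

# log is used throughout this script, so it has to be defined here
log() {
    echo "[$(date +'%H:%M:%S')] $*"
}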

is_business_hours() {
    local current_hour=$(date +%H)
    
    [ "$current_hour" -ge "$BUSINESS_HOURS_START" ] && \
    [ "$current_hour" -lt "$BUSINESS_HOURS_END" ]
}

get_stoppable_instances() {
    aws ec2 describe-instances \
        --filters \
            "Name=tag:Environment,Values=dev,staging" \
            "Name=instance-state-name,Values=running" \
            "Name=tag:AutoStop,Values=enabled" \
        --query 'Reservations[].Instances[].[InstanceId,Tags[?Key==`Name`].Value|[0]]' \
        --output text
}

stop_instances() {
    local instances=("$@")
    
    if [ ${#instances[@]} -eq 0 ]; then
        log "No instances to stop"
        return 0
    fi
    
    log "Stopping ${#instances[@]} instances..."
    
    aws ec2 stop-instances --instance-ids "${instances[@]}" --output json | \
        jq -r '.StoppingInstances[] | "\(.InstanceId): \(.PreviousState.Name) → \(.CurrentState.Name)"'
}

main() {
    log "Checking if we're outside business hours..."
    
    if is_business_hours; then
        log "It's business hours (8 AM - 6 PM), skipping auto-stop"
        exit 0
    fi
    
    log "Outside business hours, proceeding with auto-stop"
    
    local instances=()
    while IFS=$'\t' read -r instance_id name; do
        [ -z "$instance_id" ] && continue
        
        log "Found: $instance_id ($name)"
        instances+=("$instance_id")
    done < <(get_stoppable_instances)
    
    if [ ${#instances[@]} -gt 0 ]; then
        stop_instances "${instances[@]}"
        log "✓ Stopped ${#instances[@]} instances"
    else
        log "✓ No instances to stop"
    fi
}

main "$@"

💡 Tip: Run this script from cron at 6 PM (cron uses the server's local time zone):
0 18 * * * /path/to/stop-non-prod.sh
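
Because cron fires at exactly 18:00, the script depends on hour 18 counting as outside the window. A variant of is_business_hours that accepts the hour as an optional argument makes that boundary easy to check by hand. This is a sketch, not the original function: the optional argument is an addition, and 10# is used so that values such as "08" are always read as decimal.

```shell
#!/bin/bash
# Sketch: testable variant of is_business_hours (the optional hour argument is an addition).

BUSINESS_HOURS_START=8
BUSINESS_HOURS_END=18

is_business_hours() {
    # Use the given hour (00-23) or fall back to the current hour
    local hour=${1:-$(date +%H)}

    # 10# forces base 10, so "08" and "09" are never treated as octal
    (( 10#$hour >= BUSINESS_HOURS_START && 10#$hour < BUSINESS_HOURS_END ))
}

for h in 07 08 17 18; do
    if is_business_hours "$h"; then
        echo "$h:00 -> business hours"
    else
        echo "$h:00 -> outside business hours"
    fi
done
```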

🛠️ Pattern 3: Validation and Compliance

Script: Audit Open Security Groups

#!/bin/bash
# audit-security-groups.sh - Detects security groups with insecure rules

set -euo pipefail

readonly DANGEROUS_PORTS=(22 3389 3306 5432 6379 27017)

find_open_security_groups() {
    # any() emits each group at most once, even when several of its rules are open
    aws ec2 describe-security-groups \
        --query 'SecurityGroups[].[GroupId,GroupName,IpPermissions]' \
        --output json | \
    jq -r '.[] | 
        select(any(.[2][]?; any(.IpRanges[]?; .CidrIp == "0.0.0.0/0"))) | 
        "\(.[0])\t\(.[1])"'
}

check_security_group() {
    local group_id=$1
    local group_name=$2
    
    # Declare and assign separately so a failed aws/jq call is not masked by "local"
    local violations
    violations=$(aws ec2 describe-security-groups \
        --group-ids "$group_id" \
        --query 'SecurityGroups[0].IpPermissions[]' \
        --output json | \
    jq -r '.[] | 
        select(any(.IpRanges[]?; .CidrIp == "0.0.0.0/0")) | 
        "  Port \(.FromPort // "ALL") → 0.0.0.0/0 (PUBLIC)"')
    
    if [ -n "$violations" ]; then
        echo ""
        echo "🚨 $group_name ($group_id):"
        echo "$violations"
        return 1
    fi
    
    return 0
}

main() {
    echo "================================================================"
    echo "  Security Groups Audit - $(date +'%Y-%m-%d')"
    echo "================================================================"
    
    local violations_found=0
    
    while IFS=$'\t' read -r group_id group_name; do
        [ -z "$group_id" ] && continue
        
        if ! check_security_group "$group_id" "$group_name"; then
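            # ((violations_found++)) returns status 1 when the old value is 0,
            # which would abort the script under set -e
            violations_found=$((violations_found + 1))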
        fi
    done < <(find_open_security_groups)
    
    echo ""
    echo "================================================================"
    if [ $violations_found -eq 0 ]; then
        echo "✅ No violations found"
    else
        echo "❌ Found $violations_found security group(s) with violations"
        exit 1
    fi
}

main "$@"
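
The DANGEROUS_PORTS array is declared in the script above but never consulted. One way to put it to use, sketched here as an assumption about the intent, is a pure-bash helper that reports which sensitive ports fall inside a rule's port range. The name exposed_dangerous_ports is hypothetical; a caller would pass FromPort and ToPort from each rule, using 0 and 65535 for rules that allow all traffic.

```shell
#!/bin/bash
# Sketch: check a rule's port range against the list of sensitive ports.
# exposed_dangerous_ports is a hypothetical helper name.

DANGEROUS_PORTS=(22 3389 3306 5432 6379 27017)

# Prints the sensitive ports that fall inside the range [from, to], space separated
exposed_dangerous_ports() {
    local from=$1 to=$2
    local port hits=()
    local IFS=' '   # join the result with spaces regardless of the caller's IFS

    for port in "${DANGEROUS_PORTS[@]}"; do
        if (( port >= from && port <= to )); then
            hits+=("$port")
        fi
    done

    echo "${hits[*]:-}"
}

exposed_dangerous_ports 20 25      # a rule opening ports 20-25
exposed_dangerous_ports 0 65535    # a rule opening every port
```

An empty result means the rule is public but does not touch any port on the list, which could be reported at a lower severity.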

🛠️ Pattern 4: Cost Optimization

Script: Identify Unused Resources

#!/bin/bash
# find-unused-resources.sh - Detects resources that generate unnecessary costs

set -euo pipefail

find_unattached_ebs_volumes() {
    echo "📦 Unattached EBS Volumes:"
    echo ""
    
    aws ec2 describe-volumes \
        --filters "Name=status,Values=available" \
        --query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime,Tags[?Key==`Name`].Value|[0]]' \
        --output text | \
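    while IFS=$'\t' read -r vol_id size type created name; do
        # --output text prints a missing Name tag as the literal string "None"
        [ "$name" = "None" ] && name=""
        local age_days=$(( ($(date +%s) - $(date -d "$created" +%s)) / 86400 ))
        local monthly_cost=$(echo "$size * 0.10" | bc)  # ~$0.10/GB-month (gp2 rate; gp3 is ~$0.08 in us-east-1)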
        
        printf "  %s (%s) - %dGB %s - Created %dd ago - ~\$%.2f/month\n" \
            "$vol_id" \
            "${name:-No Name}" \
            "$size" \
            "$type" \
            "$age_days" \
            "$monthly_cost"
    done
}

find_unused_elastic_ips() {
    echo ""
    echo "🌐 Unassociated Elastic IPs:"
    echo ""
    
    aws ec2 describe-addresses \
        --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
        --output text | \
    while IFS=$'\t' read -r ip allocation_id; do
        echo "  $ip ($allocation_id) - \$3.60/month wasted"
    done
}

find_old_snapshots() {
    echo ""
    echo "💾 Old Snapshots (>90 days):"
    echo ""
    
    local cutoff_date=$(date -d '90 days ago' +%Y-%m-%d)
    
    aws ec2 describe-snapshots \
        --owner-ids self \
        --query "Snapshots[?StartTime<'$cutoff_date'].[SnapshotId,VolumeSize,StartTime,Description]" \
        --output text | \
    while IFS=$'\t' read -r snap_id size start_time desc; do
        local age_days=$(( ($(date +%s) - $(date -d "$start_time" +%s)) / 86400 ))
        local monthly_cost=$(echo "$size * 0.05" | bc)  # ~$0.05/GB for snapshots
        
        printf "  %s - %dGB - %dd old - ~\$%.2f/month\n" \
            "$snap_id" \
            "$size" \
            "$age_days" \
            "$monthly_cost"
    done
}

find_stopped_instances() {
    echo ""
    echo "🛑 Stopped Instances (still incurring EBS costs):"
    echo ""
    
    aws ec2 describe-instances \
        --filters "Name=instance-state-name,Values=stopped" \
        --query 'Reservations[].Instances[].[InstanceId,InstanceType,StateTransitionReason,Tags[?Key==`Name`].Value|[0]]' \
        --output text | \
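    while IFS=$'\t' read -r id type reason name; do
        # --output text prints a missing Name tag as the literal string "None"
        [ "$name" = "None" ] && name=""

        # Extract stop date from reason, e.g. "User initiated (2024-01-15 10:30:00 GMT)"
        # (-oE with bracket expressions also works with BSD grep, unlike -oP)
        local stopped_date=$(echo "$reason" | grep -oE '\([0-9]{4}-[0-9]{2}-[0-9]{2}' | tr -d '(')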
        
        if [ -n "$stopped_date" ]; then
            local days_stopped=$(( ($(date +%s) - $(date -d "$stopped_date" +%s)) / 86400 ))
            printf "  %s (%s) - %s - Stopped %dd ago\n" \
                "$id" \
                "${name:-No Name}" \
                "$type" \
                "$days_stopped"
        fi
    done
}

main() {
    echo "================================================================"
    echo "  Unused Resources Report - $(date +'%Y-%m-%d')"
    echo "================================================================"
    echo ""
    
    find_unattached_ebs_volumes
    find_unused_elastic_ips
    find_old_snapshots
    find_stopped_instances
    
    echo ""
    echo "================================================================"
    echo "💡 Consider cleaning up these resources to reduce costs"
    echo "================================================================"
}

main "$@"
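
The cost estimates above shell out to bc, which is not installed everywhere and prints values such as .80 without a leading zero. As a sketch, the same arithmetic can be done with awk, which is available on practically every Linux and macOS system. The helper name monthly_cost is hypothetical, and the rates passed in are rough assumptions that vary by region and volume type.

```shell
#!/bin/bash
# Sketch: monthly cost estimate without bc.
# monthly_cost is a hypothetical helper; the rates are rough assumptions.

# monthly_cost SIZE_GB RATE_PER_GB_MONTH
monthly_cost() {
    # LC_ALL=C keeps "." as the decimal separator regardless of the user's locale
    LC_ALL=C awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", size * rate }'
}

monthly_cost 100 0.08   # 100 GB volume at an assumed $0.08/GB-month
monthly_cost 500 0.05   # 500 GB snapshot at an assumed $0.05/GB-month
```

Because the helper already formats to two decimals, its result can be printed with %s instead of %.2f.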

🛠️ Pattern 5: Deployment Automation

Script: Simple Blue/Green Deployment

#!/bin/bash
# blue-green-deploy.sh - Simple blue/green deployment for EC2 + ALB

set -euo pipefail

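readonly ALB_NAME="my-app-alb"
readonly ASG_BLUE="my-app-asg-blue"
readonly ASG_GREEN="my-app-asg-green"

# log and error are used below, so they have to be defined in this script
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }
error() { echo "[ERROR] $*" >&2; }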

get_active_asg() {
    # Determine which ASG is currently receiving traffic
    local blue_capacity=$(aws autoscaling describe-auto-scaling-groups \
        --auto-scaling-group-names "$ASG_BLUE" \
        --query 'AutoScalingGroups[0].DesiredCapacity' \
        --output text)
    
    [ "$blue_capacity" -gt 0 ] && echo "$ASG_BLUE" || echo "$ASG_GREEN"
}

get_inactive_asg() {
    local active=$(get_active_asg)
    [ "$active" = "$ASG_BLUE" ] && echo "$ASG_GREEN" || echo "$ASG_BLUE"
}

deploy_to_asg() {
    local asg=$1
    local ami_id=$2
    
    log "Updating launch template with new AMI: $ami_id"
    
    # Update launch template
    aws ec2 create-launch-template-version \
        --launch-template-name "$asg" \
        --source-version '$Latest' \
        --launch-template-data "{\"ImageId\":\"$ami_id\"}"
    
    log "Scaling up $asg..."
    
    # Scale up the inactive ASG
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name "$asg" \
        --desired-capacity 2 \
        --min-size 2
    
    log "Waiting for instances to be healthy..."
    
    # Wait for instances to be InService
    local max_wait=600  # 10 minutes
    local elapsed=0
    
    while [ $elapsed -lt $max_wait ]; do
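        # Count instances that are both Healthy and InService
        # (new instances are reported as Healthy while they are still Pending)
        local healthy=$(aws autoscaling describe-auto-scaling-groups \
            --auto-scaling-group-names "$asg" \
            --query 'AutoScalingGroups[0].Instances[?HealthStatus==`Healthy` && LifecycleState==`InService`] | length(@)' \
            --output text)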
        
        if [ "$healthy" -ge 2 ]; then
            log "✓ $asg has $healthy healthy instances"
            return 0
        fi
        
        log "Waiting... ($healthy/2 healthy)"
        sleep 30
        elapsed=$((elapsed + 30))
    done
    
    error "Timeout waiting for healthy instances"
    return 1
}

switch_traffic() {
    local new_asg=$1
    local old_asg=$2
    
    log "Switching traffic from $old_asg to $new_asg..."
    
    # In a real scenario, you'd update ALB target group
    # For this example, we just scale down the old ASG
    
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name "$old_asg" \
        --desired-capacity 0 \
        --min-size 0
    
    log "✓ Traffic switched successfully"
}

main() {
    local new_ami_id=${1:?AMI ID required}
    
    log "Starting blue/green deployment..."
    
    local active_asg=$(get_active_asg)
    local inactive_asg=$(get_inactive_asg)
    
    log "Active ASG: $active_asg"
    log "Deploying to: $inactive_asg"
    
    if deploy_to_asg "$inactive_asg" "$new_ami_id"; then
        switch_traffic "$inactive_asg" "$active_asg"
        log "✅ Deployment complete"
    else
        error "❌ Deployment failed"
        exit 1
    fi
}

main "$@"
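
The switch_traffic function above only scales down the old ASG and leaves the ALB step as a comment. One possible shape for that step, assuming each ASG is registered with its own target group, is to point the listener's default action at the new target group with aws elbv2 modify-listener. The function name switch_listener and both ARNs are placeholders, and the sketch defaults to dry-run so that nothing is changed unless DRY_RUN=0 is set explicitly.

```shell
#!/bin/bash
# Sketch: move an ALB listener to the new target group.
# switch_listener and the ARNs below are placeholders, not values from the original script.

execute() {
    # Defaults to dry-run; set DRY_RUN=0 to really run the command
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "[DRY-RUN] Would execute: $*"
    else
        "$@"
    fi
}

# Point the listener's default action at the given target group
switch_listener() {
    local listener_arn=$1
    local new_target_group_arn=$2

    execute aws elbv2 modify-listener \
        --listener-arn "$listener_arn" \
        --default-actions "Type=forward,TargetGroupArn=$new_target_group_arn"
}

switch_listener \
    "arn:aws:elasticloadbalancing:REGION:ACCOUNT:listener/app/EXAMPLE" \
    "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/green/EXAMPLE"
```

With this in place, the old ASG would be scaled down only after the listener change succeeds, which keeps rollback as simple as pointing the listener back.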

💡 Best Practices

1. Always Use Error Handling

# Mandatory header
set -euo pipefail
IFS=$'\n\t'
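
To make the effect of each flag in that header concrete, the sketch below runs three tiny child shells, one per flag, and reports what happened. The function name demo_strict_mode and the variable name used for the unset-variable case are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: what each part of "set -euo pipefail" changes.
# demo_strict_mode is an illustrative name; each case runs in its own child shell.

demo_strict_mode() {
    # errexit (-e): the shell exits at the first failing command
    bash -c 'set -e; false; echo "not reached"' \
        || echo "errexit: stopped at the failing command"

    # nounset (-u): expanding an unset variable is an error
    bash -c 'set -u; echo "$SOME_UNSET_VARIABLE_FOR_DEMO"' 2>/dev/null \
        || echo "nounset: unset variable rejected"

    # pipefail: a pipeline fails if any stage fails, not only the last one
    bash -c 'set -o pipefail; false | true' \
        || echo "pipefail: failure inside the pipeline detected"
}

demo_strict_mode
```

One side effect worth knowing: under set -e, an arithmetic command such as ((count++)) returns a failing status when the expression evaluates to 0, so count=$((count + 1)) is the safer form for counters.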

2. Consistent Logging

log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }
error() { echo "[ERROR] $*" >&2; }
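
One caveat with a log function that writes to stdout: if it is called inside a function whose output is captured with $(...), the log line ends up in the captured value. A possible variant, shown here as a sketch with hypothetical names (_log, log_info, log_error, get_region), sends every log line to stderr and adds a level tag.

```shell
#!/bin/bash
# Sketch: leveled logging on stderr so captured output stays clean.
# _log, log_info, log_error and get_region are hypothetical names.

_log() {
    local level=$1
    shift
    printf '[%s] [%s] %s\n' "$(date +'%Y-%m-%d %H:%M:%S')" "$level" "$*" >&2
}

log_info()  { _log INFO  "$@"; }
log_error() { _log ERROR "$@"; }

get_region() {
    log_info "Resolving region"   # goes to stderr, not into the captured value
    echo "us-east-1"
}

region=$(get_region)
echo "region=$region"
```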

3. Dry-Run Support

readonly DRY_RUN=${DRY_RUN:-0}

execute() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "[DRY-RUN] Would execute: $*"
    else
        "$@"
    fi
}

# Usage
execute aws ec2 stop-instances --instance-ids i-xxx

4. Prerequisite Validation

check_prerequisites() {
    command -v aws >/dev/null || { error "AWS CLI not found"; exit 1; }
    command -v jq >/dev/null || { error "jq not found"; exit 1; }
    aws sts get-caller-identity &>/dev/null || { error "Not authenticated"; exit 1; }
}

✅ Checklist for Production-Ready Scripts:
• Error handling (set -euo pipefail)
• Logging with timestamps
• Input validation
• Dry-run mode
• Help message (--help)
• Appropriate exit codes
• Comments on complex sections
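
Two checklist items, input validation and the help message, have no example elsewhere in this post. The sketch below shows one way they could look for a script that takes an AMI ID. The script name deploy.sh, the function names, and the option set are assumptions for illustration; the AMI pattern reflects the two ID lengths AWS uses (8 or 17 hexadecimal characters).

```shell
#!/bin/bash
# Sketch: --help, --dry-run and input validation for a script that takes an AMI ID.
# deploy.sh, usage, is_valid_ami_id and parse_args are illustrative names.

usage() {
    cat <<'EOF'
Usage: deploy.sh [--dry-run] <ami-id>

Options:
  --dry-run   Print the commands without executing them
  -h, --help  Show this help and exit
EOF
}

# AMI IDs are "ami-" followed by 8 or 17 hexadecimal characters
is_valid_ami_id() {
    [[ $1 =~ ^ami-([0-9a-f]{8}|[0-9a-f]{17})$ ]]
}

# Sets the globals DRY_RUN and AMI_ID; returns 1 on invalid input
parse_args() {
    DRY_RUN=0
    AMI_ID=""

    while [ $# -gt 0 ]; do
        case $1 in
            -h|--help) usage; exit 0 ;;
            --dry-run) DRY_RUN=1 ;;
            -*)        echo "Unknown option: $1" >&2; return 1 ;;
            *)         AMI_ID=$1 ;;
        esac
        shift
    done

    if ! is_valid_ami_id "$AMI_ID"; then
        echo "Invalid or missing AMI ID: '$AMI_ID'" >&2
        return 1
    fi
}

parse_args --dry-run ami-0123456789abcdef0
echo "AMI_ID=$AMI_ID DRY_RUN=$DRY_RUN"
```

In a real script the call would be parse_args "$@" || { usage >&2; exit 1; }, so bad input produces both the error and the usage text along with a non-zero exit code.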

💭 Conclusion

Bash + AWS CLI is a powerful combination for fast, effective automation. Many tasks don't need a complex framework. With good patterns, robust error handling, and validation, you can build reliable scripts that save hours of manual work.

The scripts in this post are real examples that I use (with adaptations). Use them as a starting point for your own automations.