
AWS Monitoring, Audit and Performance

ayleeee 2024. 4. 14. 11:33

Amazon CloudWatch Metrics

  • A metric is a variable to monitor
    • Metrics belong to namespaces
    • A dimension is an attribute of a metric
      • e.g. instance ID, environment
      • Up to 30 dimensions per metric
    • Metrics have timestamps
  • Can create CloudWatch dashboards of metrics
  • Can create CloudWatch custom metrics
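To make the namespace / dimension / timestamp vocabulary concrete, here is a minimal Python sketch that builds one custom-metric data point in the shape boto3's `put_metric_data` expects for its `MetricData` list. The metric name, the `MyApp` namespace and the dimension values are hypothetical, and the sketch only builds the request; it does not call AWS.

```python
from datetime import datetime, timezone

MAX_DIMENSIONS = 30  # per-metric dimension limit from the notes above


def build_metric_datum(name, value, dimensions, unit="Count",
                       high_resolution=False, timestamp=None):
    """Build one entry for the MetricData list of a PutMetricData call."""
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError(
            f"{len(dimensions)} dimensions exceeds the limit of {MAX_DIMENSIONS}")
    return {
        "MetricName": name,
        # Each dimension is a name/value attribute, e.g. InstanceId or Environment
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        # Every data point carries a timestamp; default to "now" in UTC
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": value,
        "Unit": unit,
        # 1 = high-resolution (per-second) storage, 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }


# Hypothetical metric name and dimension values, for illustration only
datum = build_metric_datum(
    "ActiveSessions", 42,
    {"InstanceId": "i-0123456789abcdef0", "Environment": "dev"},
)
# With credentials configured, the datum could then be published under a
# (hypothetical) custom namespace:
#   boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])
```

Keeping the request-building separate from the boto3 call makes the dimension limit easy to check locally before anything is sent.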

Amazon CloudWatch Metric Streams

  • Continually stream CloudWatch metrics to a destination of your choice, with near-real-time delivery and low latency
  • Option to filter metrics to only stream a subset of them

Amazon CloudWatch Logs

  • Log groups
    • arbitrary name
  • Log stream
    • instances within application / log files / containers
  • Can define log expiration policies
  • CloudWatch Logs can send logs to:
    • Amazon S3, Kinesis Data Streams, Kinesis Data Firehose, AWS Lambda, OpenSearch
    • S3 Export
      • Log data can take up to 12 hours to become available for export
      • The API call is CreateExportTask
      • Not near-real-time or real-time
        • For that, use Logs Subscriptions instead
          • Get real-time log events from CloudWatch Logs for processing and analysis
          • Send to Kinesis Data Streams, Kinesis Data Firehose, or Lambda
          • Subscription Filter - filter which log events are delivered to your destination
          • Cross-Account Subscription - send log events to resources in a different AWS account 
  • Logs are encrypted by default
  • Can setup KMS-based encryption with your own keys
  • Logs for EC2
    • By default, no logs from your EC2 machine will go to CloudWatch
    • You need to run a CloudWatch agent on EC2 to push the log files you want
    • Make sure IAM permissions are correct
    • The CloudWatch log agent can be set up on-premises too
  • Logs Agent & Unified Agent
    • For virtual servers
    • CloudWatch Logs Agent
      • Old version of the agent
      • Can only send to CloudWatch Logs
    • CloudWatch Unified Agent
      • Collect additional system-level metrics such as RAM, processes, etc.
        • Collected directly on your Linux server / EC2 instance
      • Collect logs to send to CloudWatch Logs
      • Centralized configuration using SSM Parameter Store
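For the Logs Subscriptions path with a Lambda destination, the function receives the log events base64-encoded and gzip-compressed under `event["awslogs"]["data"]`. A minimal sketch of a handler that unpacks them is below; the "keep only ERROR lines" step is a hypothetical stand-in for real processing.

```python
import base64
import gzip
import json


def decode_subscription_payload(event):
    """Decode what a CloudWatch Logs subscription delivers to Lambda.

    The log data arrives base64-encoded and gzip-compressed under
    event["awslogs"]["data"]; decompressed, it is a JSON document.
    """
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def lambda_handler(event, context):
    payload = decode_subscription_payload(event)
    # CONTROL_MESSAGE payloads only check that the destination is reachable
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    # Hypothetical processing step: keep only the messages mentioning ERROR
    return [e["message"] for e in payload["logEvents"] if "ERROR" in e["message"]]
```

To try it locally, gzip-compress and base64-encode a JSON document containing `messageType` and a `logEvents` list, and pass it as `{"awslogs": {"data": ...}}`.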

Amazon CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric
  • Various options (sampling, % ...)
  • Alarm States:
    • OK
    • INSUFFICIENT_DATA
    • ALARM
  • Period:
    • Length of time in seconds to evaluate the metric
    • High resolution custom metrics
      • 10 sec, 30 sec or multiples of 60 sec
  • Targets
    • Stop, Terminate, Reboot, or Recover an EC2 instance
    • Trigger Auto Scaling Action
    • Send notification to SNS
  • Composite Alarms
    • CloudWatch alarms are on a single metric
    • Composite alarms monitor the states of multiple other alarms
    • AND and OR conditions
    • Helpful to reduce "alarm noise" by creating complex composite alarms
  • EC2 Instance Recovery
    • Status Check
      • Instance status = check the EC2 VM
      • System status = check the underlying hardware
    • Recovery
      • The recovered instance keeps the same private, public and Elastic IP addresses, metadata, and placement group
  • Alarms can be created based on CloudWatch Logs metric filters
  • To test alarms and notifications, set the alarm state to ALARM using the CLI (aws cloudwatch set-alarm-state)
  • Container Insights
    • Collect, aggregate, summarize metrics and logs from containers
    • Available for containers on:
      • Amazon ECS
      • Amazon EKS
      • Kubernetes platforms on EC2
      • Fargate (both for ECS and EKS)
    • In Amazon EKS and Kubernetes, Container Insights uses a containerized version of the CloudWatch Agent to discover containers
  • Lambda Insights
    • Monitoring and troubleshooting solution for serverless applications running on AWS Lambda
    • Collects, aggregates, and summarizes system-level metrics including CPU time, memory, disk and network
    • Collects, aggregates, and summarizes diagnostic information such as cold starts and Lambda worker shutdowns
    • Lambda Insights is provided as a Lambda Layer
  • Application Insights
    • Provides automated dashboards that show potential problems with monitored applications, to help isolate ongoing issues
    • Enhanced visibility into your application health to reduce the time it will take you to troubleshoot and repair your applications
    • Findings and alerts are sent to Amazon EventBridge and SSM OpsCenter
  • Insights and Operational Visibility
    • CloudWatch Container Insights
    • CloudWatch Lambda Insights
    • CloudWatch Contributor Insights
    • CloudWatch Application Insights
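The period rule and the composite AND/OR logic from the alarm notes can be sketched locally in Python. The rule string follows the `ALARM("name")` syntax accepted by the `AlarmRule` parameter of boto3's `put_composite_alarm`; the alarm names are hypothetical, and `composite_state` is a simplified simulation, not how CloudWatch evaluates alarms internally. For the "test with the CLI" tip, boto3's equivalent is `set_alarm_state(AlarmName=..., StateValue="ALARM", StateReason=...)`.

```python
def is_valid_period(period_seconds, high_resolution):
    """Alarm period rule from the notes: multiples of 60 s, plus 10 s or 30 s
    when the alarm watches a high-resolution custom metric."""
    if period_seconds <= 0:
        return False
    if period_seconds % 60 == 0:
        return True
    return high_resolution and period_seconds in (10, 30)


def composite_rule(operator, alarm_names):
    """Build an AlarmRule string such as: ALARM("a") AND ALARM("b")."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


def composite_state(operator, child_states):
    """Local, simplified simulation of an AND/OR rule built from ALARM(...) terms.

    ALARM("x") is true only while child alarm x is in the ALARM state, so
    OK and INSUFFICIENT_DATA children both count as "not firing" here.
    """
    firing = [state == "ALARM" for state in child_states.values()]
    result = all(firing) if operator == "AND" else any(firing)
    return "ALARM" if result else "OK"


# Hypothetical child alarms: only page when CPU *and* latency are both bad
rule = composite_rule("AND", ["high-cpu", "high-latency"])
# rule == 'ALARM("high-cpu") AND ALARM("high-latency")'
```

Requiring both children to fire with AND is what reduces "alarm noise": a CPU spike alone no longer pages anyone.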

Amazon EventBridge

  • Schedule: cron jobs (scheduled scripts)
  • Event pattern: event rules to react to a service doing something
  • Trigger Lambda functions, send SQS/SNS messages
  • Event buses can be accessed by other AWS accounts using resource-based policies
  • You can archive events sent to an event bus
  • Ability to replay archived events
  • EventBridge can analyze the events in your bus and infer the schema
    • The Schema Registry allows you to generate code for your application, that will know in advance how data is structured in the event bus
    • Schema can be versioned
  • Resource-based Policy
    • Manage permissions for a specific Event Bus
    • Example: allow/deny events from another AWS account or AWS Region
    • Use case: aggregate all events from your AWS Organization in a single AWS account or AWS Region
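An event pattern is itself JSON that mirrors the shape of the events it should match. The Python sketch below implements only the basic case (lists of exact values and nested objects), which is a simplified local approximation; EventBridge also supports prefix, numeric, anything-but and other content filters that this sketch ignores. For authoritative results, boto3's `events` client exposes `test_event_pattern(EventPattern=..., Event=...)`.

```python
def matches(pattern, event):
    """Simplified EventBridge matching: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field named in the pattern must be present
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object in the pattern: match recursively
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # The pattern lists allowed values; if the event field is itself an
            # array, it matches when at least one element is allowed
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


# Rule: react when an EC2 instance is stopped or terminated
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}
print(matches(pattern, event))  # True
```

Fields present in the event but absent from the pattern (such as `instance-id` here) are simply ignored, which is why patterns stay short.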

AWS CloudTrail

  • Provides governance, compliance and audit for your AWS Account
  • CloudTrail is enabled by default
  • Get a history of events / API calls made within your AWS account by:
    • Console, SDK, CLI, AWS services
  • Can put logs from CloudTrail into CloudWatch Logs or S3
  • A trail can be applied to all Regions (default) or a single Region
  • If a resource is deleted in AWS, investigate CloudTrail first!
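When investigating a deletion, the questions are usually "which call" and "who made it". The Python sketch below scans one CloudTrail log file (a dict with a `Records` list, as found in the JSON files a trail delivers to S3 after decompression) using the standard record fields `eventName`, `eventSource`, `eventTime` and `userIdentity.arn`. Treating names starting with `Delete` or `Terminate` as destructive is an assumption for illustration. Within the event history, boto3's `cloudtrail.lookup_events(LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": ...}])` answers the same question without downloading logs.

```python
DESTRUCTIVE_PREFIXES = ("Delete", "Terminate")  # assumption: the common removal verbs


def find_deletions(trail_log, event_source=None):
    """List (eventTime, caller ARN, eventName) for destructive API calls.

    trail_log is one parsed CloudTrail log file: a dict with a "Records" list.
    Optionally restrict to one service, e.g. event_source="s3.amazonaws.com".
    """
    hits = []
    for record in trail_log.get("Records", []):
        name = record.get("eventName", "")
        if not name.startswith(DESTRUCTIVE_PREFIXES):
            continue
        if event_source and record.get("eventSource") != event_source:
            continue
        caller = record.get("userIdentity", {}).get("arn", "unknown")
        hits.append((record.get("eventTime", ""), caller, name))
    # ISO-8601 UTC timestamps sort chronologically as plain strings
    return sorted(hits)
```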

CloudTrail Events

  • Management Events:
    • Operations that are performed on resources in your AWS account
    • By default, trails are configured to log management events
    • Can separate Read Events from Write Events
  • Data Events:
    • By default, data events are not logged (because they are high-volume operations)
    • Amazon S3 object-level activity: can separate read and write events
    • AWS Lambda function execution activity
  • CloudTrail Insights
    • Enable CloudTrail insights to detect unusual activity in your account:
      • inaccurate resource provisioning
      • hitting service limits
      • bursts of AWS IAM actions
      • gaps in periodic maintenance activity
    • CloudTrail Insights analyzes normal management events to create a baseline
    • And then continuously analyzes write events to detect unusual patterns
      • Anomalies appear in the CloudTrail console
      • Event is sent to S3
      • An EventBridge event is generated
  • CloudTrail Events Retention
    • Events are stored for 90 days in CloudTrail
    • To keep events beyond this period, log them to S3 and use Athena
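The management/data and read/write split, plus the 90-day history window, can be sketched over individual CloudTrail records in Python. The sketch relies on the record fields `eventCategory`, `readOnly` and `eventTime`; defaulting a missing `eventCategory` to `Management` and treating the 90-day boundary as inclusive are assumptions for illustration, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

EVENT_HISTORY_DAYS = 90  # retention of the CloudTrail event history (notes above)


def classify(record):
    """Label a CloudTrail record, e.g. "management-write" or "data-read"."""
    # Assumption: records without eventCategory are treated as management events
    category = record.get("eventCategory", "Management")
    read_only = record.get("readOnly")
    access = "unknown" if read_only is None else ("read" if read_only else "write")
    return f"{category.lower()}-{access}"


def in_event_history(record, now=None):
    """True if the event is still inside the 90-day event history window."""
    now = now or datetime.now(timezone.utc)
    event_time = datetime.fromisoformat(record["eventTime"].replace("Z", "+00:00"))
    return now - event_time <= timedelta(days=EVENT_HISTORY_DAYS)
```

Anything for which `in_event_history` returns False is only recoverable if the trail was delivering to S3, which is where the Athena approach comes in.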

AWS Config

  • Helps with auditing and recording compliance of your AWS resources
  • Helps record configurations and changes over time
  • Can receive alerts for any changes
  • AWS Config is a per-region service
  • Can be aggregated across regions and accounts
  • Possibility of storing the configuration data into S3 (analyzed by Athena)

Config Rules

  • Can use AWS managed config rules
  • Can make custom Config rules (defined with AWS Lambda functions or with AWS CloudFormation Guard policies)
  • Rules can be evaluated / triggered :
    • For each config change
    • And / or : at regular time intervals
  • AWS Config rules do not prevent actions from happening (no deny)
  • Remediation (restoring / correcting non-compliant resources)
    • Automate remediation of non-compliant resources using SSM Automation Documents
    • Use AWS-Managed Automation Documents or create custom Automation Documents
    • You can set Remediation Retries if the resource is still non-compliant after auto-remediation
  • Notifications
    • Use EventBridge to trigger notifications when AWS resources are non-compliant
    • Ability to send configuration changes and compliance state notifications to SNS
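A custom rule defined in Lambda receives the changed resource as a JSON string in `event["invokingEvent"]` and reports back with `put_evaluations`, passing the `resultToken` from the event. The Python sketch below is a simplified, change-triggered rule; the "allowed EC2 instance types" policy and its values are hypothetical, and the config client is passed in (in Lambda it would be `boto3.client("config")`) so the logic can be exercised locally. Note that it only flags resources; as stated above, it cannot block the change.

```python
import json

ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}  # hypothetical rule parameter


def evaluate(configuration_item, allowed=ALLOWED_INSTANCE_TYPES):
    """Return a Config ComplianceType for one configuration item."""
    if configuration_item["resourceType"] != "AWS::EC2::Instance":
        return "NOT_APPLICABLE"
    if configuration_item.get("configurationItemStatus") == "ResourceDeleted":
        return "NOT_APPLICABLE"
    instance_type = configuration_item["configuration"]["instanceType"]
    return "COMPLIANT" if instance_type in allowed else "NON_COMPLIANT"


def handle(event, config_client):
    """Lambda-style entry point for a change-triggered custom rule (simplified)."""
    item = json.loads(event["invokingEvent"])["configurationItem"]
    evaluation = {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate(item),
        "OrderingTimestamp": item["configurationItemCaptureTime"],
    }
    # In Lambda, config_client would be boto3.client("config")
    config_client.put_evaluations(Evaluations=[evaluation],
                                  ResultToken=event["resultToken"])
    return evaluation
```

A NON_COMPLIANT result is what the remediation (SSM Automation) and EventBridge notification options above would then react to.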

CloudWatch vs CloudTrail vs Config

  • CloudWatch
    • Performance monitoring & dashboards
    • Event & Alerting
    • Log Aggregation & Analysis
  • CloudTrail
    • Record API calls made within your Account by everyone
    • Can define trails for specific resources
    • Global Service
  • Config
    • Record configuration changes
    • Evaluate resources against compliance rules
    • Get timeline of changes and compliance

For an Elastic Load Balancer

  • CloudWatch
    • Monitor the incoming connections metric
    • Visualize error codes as % over time
    • Make a dashboard to get an idea of your load balancer performance
  • Config
    • Track security group rules for the Load Balancer
    • Track configuration changes for the Load Balancer
    • Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
  • CloudTrail
    • Track who made any changes to the Load Balancer with API calls
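For "visualize error codes as % over time", CloudWatch metric math can divide one metric by another. The Python sketch below builds `MetricDataQueries` for boto3's `get_metric_data`, assuming an Application Load Balancer (namespace `AWS/ApplicationELB`, metrics `HTTPCode_ELB_5XX_Count` and `RequestCount`); Classic Load Balancers use a different namespace and metric names. The load balancer value and the 300-second period are hypothetical choices.

```python
def elb_error_rate_queries(load_balancer, period=300):
    """MetricDataQueries for GetMetricData: ELB 5XX responses as a % of requests."""
    def metric(query_id, name):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": name,
                    "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only; the expression below is returned
        }

    return [
        metric("errors", "HTTPCode_ELB_5XX_Count"),
        metric("requests", "RequestCount"),
        {"Id": "error_pct", "Expression": "100 * errors / requests", "Label": "ELB 5XX %"},
    ]


def error_pct(errors, requests):
    """The same arithmetic done locally for a single period."""
    return 0.0 if requests == 0 else 100.0 * errors / requests
```

For example, 5 errors out of 200 requests in one period gives `error_pct(5, 200) == 2.5`, the value the dashboard line would show for that period.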
