Building an AWS CloudWatch Logging Strategy feat. NestJS

When designing a logging strategy for AWS CloudWatch in a NestJS application—or any web service—there are several categories of information to consider logging. These logs serve multiple purposes: troubleshooting errors, monitoring performance, auditing changes, and even performing analytics. Below are the main categories and details you’ll likely want to capture.


1. Request-Level Information

a. Basic Request Data

b. Request IDs / Correlation IDs

Generate a unique requestId or correlationId for each incoming request, then log it in every subsequent message within that request’s context. This makes it much easier to trace the flow of a particular request across distributed services.
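As a rough illustration of the idea, the sketch below reuses an incoming ID when the caller already sent one and otherwise generates a new one, then binds it to every log line. The header name `x-request-id` and the helper names `resolveRequestId` and `bindRequestId` are assumptions made for this example, not part of NestJS or AWS.

```typescript
// Assumed header name; any header your services agree on works.
const REQUEST_ID_HEADER = "x-request-id";

type Headers = Record<string, string | string[] | undefined>;

// Reuse the caller's ID when present so one trace spans several services;
// otherwise create a new ID with the supplied generator.
function resolveRequestId(headers: Headers, generate: () => string): string {
  const incoming = headers[REQUEST_ID_HEADER];
  const value = Array.isArray(incoming) ? incoming[0] : incoming;
  return value && value.trim() !== "" ? value.trim() : generate();
}

// Returns a log function that stamps every record with the request ID.
function bindRequestId(requestId: string, sink: (line: string) => void) {
  return (message: string, fields: Record<string, unknown> = {}): void => {
    sink(JSON.stringify({ requestId, message, ...fields }));
  };
}

// Usage: the sink collects lines in memory here; a real service would
// write them to stdout and pass a UUID generator as `generate`.
const lines: string[] = [];
const requestId = resolveRequestId({ "x-request-id": "abc123xyz" }, () => "generated-id");
const log = bindRequestId(requestId, (line) => lines.push(line));
log("User lookup started");
log("User lookup finished", { userId: 12345 });
```

In a NestJS application this logic would typically live in middleware or an interceptor, so that handlers never have to pass the ID around by hand.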

c. Response Time

Record the latency of each request to monitor performance trends and set up alerts in CloudWatch if latencies exceed thresholds.
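A minimal sketch of latency measurement follows. The `startTimer` and `buildAccessLog` helpers and the 500 ms threshold are hypothetical; the clock is injected only so the example can be checked deterministically.

```typescript
// Returns a function that reports elapsed milliseconds since the timer started.
// The clock is injectable so the timing logic can be tested without waiting.
function startTimer(now: () => number = Date.now): () => number {
  const startedAt = now();
  return () => now() - startedAt;
}

interface AccessLog {
  method: string;
  path: string;
  statusCode: number;
  durationMs: number;
  slow: boolean;
}

// Hypothetical threshold; tune it to your own latency targets.
const SLOW_THRESHOLD_MS = 500;

function buildAccessLog(
  method: string,
  path: string,
  statusCode: number,
  durationMs: number,
): AccessLog {
  return { method, path, statusCode, durationMs, slow: durationMs > SLOW_THRESHOLD_MS };
}

// Usage with a fake clock that advances 750 ms between the two reads.
const ticks = [1000, 1750];
const elapsed = startTimer(() => ticks.shift() as number);
const accessLog = buildAccessLog("GET", "/users/42", 200, elapsed());
```

Emitting `durationMs` as a numeric field keeps it usable later, since a numeric JSON field can be extracted by a CloudWatch metric filter and alarmed on.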


2. Application-Level Events

Logging events is a good practice. Beyond logging, though, it would be even better to use EventBridge so that other services can also receive and make use of the event information.
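One way to picture this is a single domain event that is turned into both a log line and an EventBridge entry. The sketch below only builds the two payloads. The `DomainEvent` type, the source name `my-nest-app`, and the helper names are assumptions; the entry fields (`Source`, `DetailType`, `Detail`, `EventBusName`, `Time`) follow the EventBridge PutEvents request shape, and actually sending would go through `PutEventsCommand` from `@aws-sdk/client-eventbridge`.

```typescript
// Hypothetical in-app representation of a business event.
interface DomainEvent {
  type: string;
  occurredAt: Date;
  payload: Record<string, unknown>;
}

// Subset of the EventBridge PutEvents entry fields used in this sketch.
interface EventBridgeEntry {
  Source: string;
  DetailType: string;
  Detail: string; // EventBridge expects the detail as a JSON string
  EventBusName: string;
  Time: Date;
}

function toEventBridgeEntry(
  event: DomainEvent,
  source: string,
  eventBusName: string,
): EventBridgeEntry {
  return {
    Source: source,
    DetailType: event.type,
    Detail: JSON.stringify(event.payload),
    EventBusName: eventBusName,
    Time: event.occurredAt,
  };
}

// The same event rendered as a structured log line.
function toLogLine(event: DomainEvent): string {
  return JSON.stringify({
    timestamp: event.occurredAt.toISOString(),
    level: "INFO",
    message: event.type,
    ...event.payload,
  });
}

const userCreated: DomainEvent = {
  type: "UserCreated",
  occurredAt: new Date("2025-02-25T13:45:00.000Z"),
  payload: { userId: 12345 },
};
const entry = toEventBridgeEntry(userCreated, "my-nest-app", "default");
const logLine = toLogLine(userCreated);
```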

a. Business Logic Events

b. Success vs. Error Details

Distinguish between normal “happy path” logs and warnings/errors. This might include:

c. Configuration Changes and Deployments

Whenever you deploy new microservices, change environment variables, or update configuration, you can log these events to correlate performance or error spikes with configuration changes.


3. Error and Exception Details

Using logging to track exceptions is very important. Use an interceptor to send known exceptions to CloudWatch.

a. Stack Traces

When an error happens in NestJS, capture the full stack trace (but be careful not to expose sensitive data). This can be extremely helpful for debugging.
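As a sketch of what capturing the stack trace might look like, the helper below converts whatever was thrown into a structured record. `toErrorLog` and the `ErrorLog` shape are hypothetical names for this example; in a NestJS application the function would be called from an exception filter or interceptor.

```typescript
interface ErrorLog {
  level: "ERROR";
  message: string;
  errorName: string;
  stack?: string;
  requestId?: string;
}

// Normalizes any thrown value into a structured record.
// Only the message, name, and stack are copied, so request bodies or other
// potentially sensitive context never leak into the log by accident.
function toErrorLog(err: unknown, requestId?: string): ErrorLog {
  if (err instanceof Error) {
    return {
      level: "ERROR",
      message: err.message,
      errorName: err.name,
      stack: err.stack,
      requestId,
    };
  }
  // JavaScript allows throwing non-Error values, which carry no stack.
  return { level: "ERROR", message: String(err), errorName: "NonErrorThrown", requestId };
}

// Usage
let errorRecord: ErrorLog | undefined;
try {
  throw new TypeError("userId must be a number");
} catch (err) {
  errorRecord = toErrorLog(err, "abc123xyz");
}
const errorLine = JSON.stringify(errorRecord);
```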

b. Error Metadata

Include metadata such as:

c. Logging Severity Levels

Set up structured logging with levels (e.g., ERROR, WARN, INFO, DEBUG) so that CloudWatch can filter or trigger alerts based on severity.
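The following is a minimal sketch of level-based filtering, assuming a simple numeric ordering of the four levels named above. `createLogger` and `LEVEL_ORDER` are illustrative names, not an existing library API.

```typescript
type Level = "DEBUG" | "INFO" | "WARN" | "ERROR";

// Higher number means more severe.
const LEVEL_ORDER: Record<Level, number> = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

// Creates a logger that drops records below `minLevel` and writes the rest
// as one JSON object per line, with `level` as a top-level field.
function createLogger(minLevel: Level, sink: (line: string) => void) {
  return (level: Level, message: string, fields: Record<string, unknown> = {}): void => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    sink(JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...fields }));
  };
}

// Usage: with a minimum of INFO, the DEBUG record is discarded.
const output: string[] = [];
const logAt = createLogger("INFO", (line) => output.push(line));
logAt("DEBUG", "Cache lookup", { key: "user:12345" });
logAt("INFO", "User created successfully", { userId: 12345 });
logAt("ERROR", "Payment provider timed out");
```

Because `level` is a top-level JSON field, a CloudWatch filter pattern such as `{ $.level = "ERROR" }` can select exactly the error records.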


4. Security and Audit Logs

For important APIs, it is worth using the Logger directly to track exactly which operation was requested.

a. Authentication & Authorization

b. Privileged Actions and Access

c. Data Privacy Considerations

Avoid logging sensitive details (like passwords, full credit card numbers, personal identifiers). You can hash or partially mask them if necessary.
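A sketch of masking before logging is shown below. The key names in `REDACT_KEYS` and `MASK_KEYS` are hypothetical and would need to match your own payloads; the function is intended for plain JSON-like data only.

```typescript
// Hypothetical field names; adapt both sets to your own payloads.
const REDACT_KEYS = new Set(["password", "token"]);
const MASK_KEYS = new Set(["cardNumber", "phone"]);

// Replaces everything except the last four characters with asterisks.
function maskKeepLast4(value: string): string {
  if (value.length <= 4) return "*".repeat(value.length);
  return "*".repeat(value.length - 4) + value.slice(-4);
}

// Recursively copies plain JSON-like data, removing or masking sensitive
// fields. The input is never mutated.
function sanitize(input: unknown): unknown {
  if (Array.isArray(input)) return input.map((item) => sanitize(item));
  if (input === null || typeof input !== "object") return input;
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input as Record<string, unknown>)) {
    if (REDACT_KEYS.has(key)) {
      out[key] = "[REDACTED]";
    } else if (MASK_KEYS.has(key) && typeof value === "string") {
      out[key] = maskKeepLast4(value);
    } else {
      out[key] = sanitize(value);
    }
  }
  return out;
}

// Usage
const rawBody = {
  email: "user@example.com",
  password: "hunter2",
  payment: { cardNumber: "4111111111111111" },
};
const safeBody = sanitize(rawBody) as {
  email: string;
  password: string;
  payment: { cardNumber: string };
};
```

Applying `sanitize` at the single place where log records are written is usually safer than relying on every call site to remember it.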


5. System & Infrastructure Health

a. Resource Utilization

While much of this is captured by AWS itself through CloudWatch metrics (e.g., CPU utilization; on EC2, memory and disk usage additionally require the CloudWatch agent), you might still want some logs at the application level, for example tracking how many active requests are being processed.

b. External Service Calls

c. Application Startup and Shutdown Logs

Indicate that the NestJS service started successfully or gracefully shut down. This helps correlate downtime or crashed containers with logs.


6. Structured Logging Format

a. JSON Logging

NestJS supports custom logger implementations. Structured logs (often JSON) make it easier to query and analyze logs in CloudWatch. A typical JSON log could look like this:

{
  "timestamp": "2025-02-25T13:45:00.000Z",
  "level": "INFO",
  "requestId": "abc123xyz",
  "message": "User created successfully",
  "userId": 12345
}
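A log line of this shape could be produced by a small custom logger. The class below is a sketch that mirrors the method names of NestJS's `LoggerService` interface (`log`, `error`, `warn`, `debug`, `verbose`) without importing it, so it runs standalone. `JsonLogger` and its `sink` parameter are assumptions for this example.

```typescript
type Sink = (line: string) => void;

// Writes one JSON object per line. Method names follow NestJS's
// LoggerService so an instance could be handed to app.useLogger().
class JsonLogger {
  private readonly sink: Sink;

  constructor(sink: Sink) {
    this.sink = sink;
  }

  log(message: unknown, context?: string): void {
    this.write("INFO", message, { context });
  }

  warn(message: unknown, context?: string): void {
    this.write("WARN", message, { context });
  }

  debug(message: unknown, context?: string): void {
    this.write("DEBUG", message, { context });
  }

  // Mapping verbose to DEBUG is a choice made for this sketch.
  verbose(message: unknown, context?: string): void {
    this.write("DEBUG", message, { context });
  }

  error(message: unknown, stack?: string, context?: string): void {
    this.write("ERROR", message, { stack, context });
  }

  private write(level: string, message: unknown, extra: Record<string, unknown>): void {
    // JSON.stringify omits properties whose value is undefined,
    // so an absent stack or context does not appear in the output.
    this.sink(JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...extra }));
  }
}

// Usage: a real service would pass a sink that writes to stdout.
const written: string[] = [];
const jsonLogger = new JsonLogger((line) => written.push(line));
jsonLogger.log("User created successfully", "UsersService");
jsonLogger.error("User creation failed", "Error: boom\n    at create", "UsersService");
```

In a NestJS application, an instance like this would typically be registered through `NestFactory.create(AppModule, { logger })` or `app.useLogger(logger)`.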

b. Consistent Fields

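You can standardize fields such as timestamp, level, requestId, and message, as in the JSON example above, so that every log entry can be queried the same way.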


7. Alerting and Monitoring Best Practices

a. CloudWatch Alarms

Configure alarms on specific metrics, such as error rate and average response time.

b. Filters and Insights

c. Automated Alerting

Use Amazon SNS or other alerting mechanisms to receive notifications (Slack, email, SMS) when an alarm threshold is crossed.


8. Compliance and Retention

a. Log Retention

Decide how long you need to retain logs for auditing or compliance (e.g., GDPR, SOC 2, HIPAA if relevant). You can configure CloudWatch log groups with retention policies or export logs to S3 for longer retention.
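CloudWatch Logs accepts only a fixed set of retention periods, so a compliance requirement usually has to be rounded up to the next allowed value. The sketch below does that rounding and builds an input object for `PutRetentionPolicyCommand` from `@aws-sdk/client-cloudwatch-logs`. The list of allowed values reflects the API documentation as I understand it and should be checked against the current reference; the helper names and the log group name are hypothetical.

```typescript
// Retention periods (in days) accepted by CloudWatch Logs.
// Assumption: verify against the current PutRetentionPolicy documentation.
const ALLOWED_RETENTION_DAYS = [
  1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
  1096, 1827, 2192, 2557, 2922, 3288, 3653,
];

// Smallest allowed period that still covers the requirement,
// or undefined if the requirement exceeds the longest allowed period.
function pickRetentionDays(requiredDays: number): number | undefined {
  return ALLOWED_RETENTION_DAYS.find((days) => days >= requiredDays);
}

// Builds the input for a PutRetentionPolicy call.
function buildRetentionPolicy(logGroupName: string, requiredDays: number) {
  const retentionInDays = pickRetentionDays(requiredDays);
  if (retentionInDays === undefined) {
    throw new Error(`No allowed retention period covers ${requiredDays} days`);
  }
  return { logGroupName, retentionInDays };
}

// Usage: a 200-day requirement rounds up to the next allowed value.
const retentionPolicy = buildRetentionPolicy("/my-nest-app/production", 200);
```

When a requirement exceeds the longest allowed period, exporting to S3 as described above is the fallback.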

b. Secure Storage

Ensure sensitive logs are protected. By default, CloudWatch Logs are encrypted at rest, but consider who has access to log groups. Use IAM policies to control log access.

c. Governance and Access Control

Make sure only authorized team members can read or write logs. Implement role-based access to keep logs secure.


Putting It All Together

A typical AWS CloudWatch-based logging system for NestJS might look like this:

  1. NestJS Logger / Winston / Pino: Use a logging library that supports JSON output and multiple log levels.
  2. Log Every Request: An interceptor that captures request/response data, timestamps, latencies, and correlation IDs.
  3. Log Application Events: For example, custom events in your services or domain logic, with the relevant metadata.
  4. Handle Exceptions Globally: A NestJS ExceptionFilter that logs error details and stack traces in a structured format.
  5. Send Logs to CloudWatch: Configure your container/EC2/Elastic Beanstalk/ECS tasks to forward logs automatically to CloudWatch Logs.
  6. Set Up Metric Filters: Convert important log patterns (like ERROR or specific event types) into CloudWatch metrics.
  7. Create Alarms: Trigger notifications if error rates spike or if average response times exceed a certain threshold.
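Steps 6 and 7 can be sketched as two plain configuration objects. The first matches the input of `PutMetricFilterCommand` (`@aws-sdk/client-cloudwatch-logs`) and the second the input of `PutMetricAlarmCommand` (`@aws-sdk/client-cloudwatch`). All names, the threshold, and the ARN are placeholders, and the filter pattern assumes JSON logs with a top-level `level` field.

```typescript
// Hypothetical names; adjust them to your own service.
const METRIC_NAMESPACE = "MyNestApp";
const ERROR_METRIC_NAME = "ErrorCount";

// Step 6: count every JSON log record whose level is ERROR.
function buildErrorMetricFilter(logGroupName: string) {
  return {
    logGroupName,
    filterName: "error-count",
    filterPattern: '{ $.level = "ERROR" }',
    metricTransformations: [
      {
        metricName: ERROR_METRIC_NAME,
        metricNamespace: METRIC_NAMESPACE,
        metricValue: "1", // each matching record adds 1
        defaultValue: 0, // report 0 when nothing matches
      },
    ],
  };
}

// Step 7: alarm when more than `threshold` errors occur in five minutes,
// notifying the given SNS topic.
function buildErrorAlarm(snsTopicArn: string, threshold: number) {
  return {
    AlarmName: "nest-app-error-spike",
    Namespace: METRIC_NAMESPACE,
    MetricName: ERROR_METRIC_NAME,
    Statistic: "Sum",
    Period: 300, // seconds
    EvaluationPeriods: 1,
    Threshold: threshold,
    ComparisonOperator: "GreaterThanThreshold",
    TreatMissingData: "notBreaching",
    AlarmActions: [snsTopicArn],
  };
}

// Usage with placeholder values.
const metricFilter = buildErrorMetricFilter("/my-nest-app/production");
const errorAlarm = buildErrorAlarm("arn:aws:sns:us-east-1:123456789012:alerts", 10);
```

When these objects are passed to the AWS SDK, string fields such as `Statistic` and `ComparisonOperator` may need `as const` to satisfy the SDK's literal types.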

By carefully planning out the above, you’ll have a robust, clear, and secure logging strategy that makes debugging, performance tuning, and auditing much simpler.