Building an AWS CloudWatch Logging Strategy feat. NestJS

curinginnos PLAN

Research whether Sentry can be brought back. If it turns out to be useful, stay with Sentry. If it falls short, adopt CloudWatch.

AI Response :: ChatGPT o1

When designing a logging strategy for AWS CloudWatch in a NestJS application—or any web service—there are several categories of information to consider logging. These logs serve multiple purposes: troubleshooting errors, monitoring performance, auditing changes, and even performing analytics. Below are the main categories and details you’ll likely want to capture.

1. Request-Level Information

a. Basic Request Data

HTTP Method and Path: For example, GET /users, POST /orders.
Status Code: Helps filter successful vs. failing requests.
Timestamp: When the request was received/completed.
User Agent: Useful when investigating issues specific to certain browsers or clients.
Client IP / Request Origin: Aids in tracing back suspicious or high-traffic sources.
Query Parameters / Route Parameters: Might be needed for debugging, but be mindful of sensitive data.

b. Request IDs / Correlation IDs

Generate a unique requestId or correlationId for each incoming request, then log it in every subsequent message within that request’s context. This makes it much easier to trace the flow of a particular request across distributed services.
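
One way to carry the ID through a request without passing it to every function is Node's AsyncLocalStorage. The sketch below is framework-agnostic and only illustrative: the x-request-id header name and the helper names (resolveRequestId, runWithRequestId, currentRequestId) are assumptions, not part of NestJS. In a NestJS app, runWithRequestId would typically be called from a middleware that wraps next().

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Per-request context; Node keeps it attached across awaits and callbacks.
interface RequestContext {
  requestId: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Reuse an upstream ID if the (assumed) x-request-id header is present,
// otherwise mint a new UUID. Node lowercases incoming header names.
function resolveRequestId(headers: Record<string, string | undefined>): string {
  const incoming = headers["x-request-id"];
  return incoming && incoming.trim() !== "" ? incoming.trim() : randomUUID();
}

// Run a handler with the request ID bound to the async context.
function runWithRequestId<T>(
  headers: Record<string, string | undefined>,
  handler: () => T,
): T {
  return requestContext.run({ requestId: resolveRequestId(headers) }, handler);
}

// Anything called inside the handler can read the ID for its log lines.
function currentRequestId(): string | undefined {
  return requestContext.getStore()?.requestId;
}
```

A logger can then call currentRequestId() when it builds each entry, so every message written during the request carries the same ID.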

c. Response Time

Record the latency of each request to monitor performance trends and set up alerts in CloudWatch if latencies exceed thresholds.
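
A minimal sketch of latency measurement, assuming the caller supplies an emit function that writes the entry to the log. startTimer and LatencyLog are hypothetical names; the clock is injectable so the helper can be tested deterministically.

```typescript
interface LatencyLog {
  operation: string;
  durationMs: number;
  outcome: "success" | "error";
}

// Start timing an operation; call the returned function once it finishes.
function startTimer(
  operation: string,
  emit: (entry: LatencyLog) => void,
  now: () => number = Date.now,
): (outcome: "success" | "error") => void {
  const startedAt = now();
  return (outcome) => {
    emit({ operation, durationMs: now() - startedAt, outcome });
  };
}
```

In a NestJS interceptor, the timer would be started before next.handle() and stopped when the response stream completes or errors. A numeric durationMs field in a JSON log can then be turned into a CloudWatch metric and alarmed on.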

2. Application-Level Events

Logging events is a good approach. Beyond logging, though, it would be worth using EventBridge so that other services can also receive and act on the event information.

a. Business Logic Events

Key Domain Events: For example, UserCreated, OrderShipped, PaymentProcessed.
Lifecycle Hooks: For instance, NestJS lifecycle events such as onModuleInit, onModuleDestroy, or Cron job starts/finishes.

b. Success vs. Error Details

Distinguish between normal “happy path” logs and warnings/errors. This might include:

Success Path Messages (e.g., “Successfully created user ID: 12345”).
Warning Messages (e.g., “Retrying payment due to network hiccup”).
Error Messages (e.g., “Unable to connect to DB: connection refused”).

c. Configuration Changes and Deployments

Whenever you deploy new microservices, change environment variables, or update configuration, you can log these events to correlate performance or error spikes with configuration changes.

3. Error and Exception Details

Using logs to track exceptions is very important. Let's use an interceptor to send known exceptions to CloudWatch.

a. Stack Traces

When an error happens in NestJS, capture the full stack trace (but be careful not to expose sensitive data). This can be extremely helpful for debugging.

b. Error Metadata

Include metadata such as:

Error Code or Exception Type.
Affected Module/Service/Class.
User ID or Session ID (if applicable).
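
The metadata above could be collected by a small helper like the following sketch. toErrorLog and the ErrorLog shape are assumptions for illustration; a NestJS exception filter or interceptor would call something like this and pass the result to the logger. Stack traces can contain sensitive values, so review what ends up in them before shipping to CloudWatch.

```typescript
interface ErrorLog {
  level: "ERROR";
  message: string;
  errorType: string;   // exception class name
  errorCode?: string;  // e.g. a Node or driver error code, if present
  context: string;     // affected module/service/class
  userId?: string;
  stack?: string;
}

function toErrorLog(error: unknown, context: string, userId?: string): ErrorLog {
  if (error instanceof Error) {
    // Many Node and driver errors carry a string `code` property.
    const code = (error as Error & { code?: unknown }).code;
    return {
      level: "ERROR",
      message: error.message,
      errorType: error.name,
      errorCode: typeof code === "string" ? code : undefined,
      context,
      userId,
      stack: error.stack,
    };
  }
  // Non-Error throwables (strings, plain objects) are still recorded.
  return { level: "ERROR", message: String(error), errorType: typeof error, context, userId };
}
```

Fields left undefined are dropped by JSON.stringify, so the resulting log line only contains the metadata that was actually available.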

c. Logging Severity Levels

Set up structured logging with levels (e.g., ERROR, WARN, INFO, DEBUG) so that CloudWatch can filter or trigger alerts based on severity.
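
As a rough illustration of level-based logging, the sketch below drops entries under a configured minimum level and writes the rest as JSON. createLeveledLogger and the numeric severities are assumptions; in practice the NestJS logger, Winston, or Pino already provide this behavior.

```typescript
type Level = "DEBUG" | "INFO" | "WARN" | "ERROR";

// Higher number means more severe.
const SEVERITY: Record<Level, number> = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

function createLeveledLogger(minLevel: Level, sink: (line: string) => void) {
  return function log(
    level: Level,
    message: string,
    fields: Record<string, unknown> = {},
  ): void {
    if (SEVERITY[level] < SEVERITY[minLevel]) return; // below threshold: skip
    // Spread extra fields first so they cannot overwrite level or message.
    sink(JSON.stringify({ ...fields, level, message }));
  };
}
```

Usage could look like `const log = createLeveledLogger("INFO", console.log)` in production and `"DEBUG"` locally. Because level is a top-level JSON field, CloudWatch filters can match on it directly.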

4. Security and Audit Logs

For important APIs, it is worth using the Logger directly to trace exactly which operation was requested.

a. Authentication & Authorization

Login Attempts: Track successful and failed login attempts.
JWT Token Validation: Log warnings or errors for invalid tokens.

b. Privileged Actions and Access

Key Admin Actions: Deleting or modifying important records, changing user roles, etc.
Suspicious Activity: Repeated failures, unusual request patterns.

c. Data Privacy Considerations

Avoid logging sensitive details (like passwords, full credit card numbers, personal identifiers). You can hash or partially mask them if necessary.
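
A minimal masking sketch, assuming plain JSON-like payloads (objects, arrays, primitives). The key list and the redact/maskValue names are hypothetical and would need to match the fields your application actually handles.

```typescript
// Keys whose values should never reach the logs (compared in lowercase).
const SENSITIVE_KEYS = new Set(["password", "token", "authorization", "cardnumber", "ssn"]);

function maskValue(value: unknown): string {
  const text = String(value);
  // Keep the last 4 characters of longer values so support can still match them.
  return text.length > 8 ? `****${text.slice(-4)}` : "****";
}

// Return a deep copy of the input with sensitive values masked.
function redact(input: unknown): unknown {
  if (Array.isArray(input)) return input.map(redact);
  if (input !== null && typeof input === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(input as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? maskValue(value) : redact(value);
    }
    return out;
  }
  return input;
}
```

Running request bodies and query parameters through a function like this before logging keeps the masking rule in one place instead of relying on each call site to remember it.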

5. System & Infrastructure Health

a. Resource Utilization

While most of this is captured by AWS itself (e.g., CPU, memory, disk usage via CloudWatch metrics), you might still have some logs at the application level—for example, tracking how many active requests are being processed.

b. External Service Calls

API Calls to external services, along with response times and statuses.
Database Query Performance if you want more detailed or application-level logging than what typical AWS RDS or external DB logs provide.

c. Application Startup and Shutdown Logs

Indicate that the NestJS service started successfully or gracefully shut down. This helps correlate downtime or crashed containers with logs.

6. Structured Logging Format

a. JSON Logging

NestJS supports custom logger implementations. Structured logs (often JSON) make it easier to query and analyze logs in CloudWatch. A typical JSON log could look like this:

{
  "timestamp": "2025-02-25T13:45:00.000Z",
  "level": "INFO",
  "requestId": "abc123xyz",
  "message": "User created successfully",
  "userId": 12345
}

b. Consistent Fields

You can standardize fields like:

timestamp
level (e.g., DEBUG, INFO, WARN, ERROR)
message
context (a NestJS-specific concept for tagging logs with the name of the service or module)
requestId / correlationId
environment (e.g., production, staging, local)
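
The field list above could be enforced with a single builder so every log line has the same shape. This is only a sketch: StandardFields and buildLogLine are assumed names, and the timestamp is injectable purely to make the output reproducible.

```typescript
interface StandardFields {
  timestamp: string;
  level: "DEBUG" | "INFO" | "WARN" | "ERROR";
  message: string;
  context: string;       // NestJS-style tag: the service or module name
  requestId?: string;
  environment: string;   // e.g. production, staging, local
}

function buildLogLine(
  fields: Omit<StandardFields, "timestamp">,
  extra: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  // Standard fields are spread last so extras cannot overwrite them.
  const entry = { ...extra, ...fields, timestamp: now.toISOString() };
  return JSON.stringify(entry);
}
```

For example, calling it with level "INFO", message "User created successfully", requestId "abc123xyz", and extra `{ userId: 12345 }` yields a line with the same fields as the JSON sample in the previous subsection, plus context and environment.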

7. Alerting and Monitoring Best Practices

a. CloudWatch Alarms

Configure alarms on specific metrics, such as:

Error Count or Error Rate.
Latency / Response Times.
High CPU or Memory usage on containers/EC2 instances.

b. Filters and Insights

Metric Filters: Create CloudWatch metrics based on log patterns (e.g., every time you see “ERROR”).
CloudWatch Logs Insights: A powerful tool for querying your logs. If you log in structured JSON, you can search logs by fields like userId, context, or errorCode.
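
To make this concrete, here is a sketch of a JSON metric filter pattern and a Logs Insights query, held as TypeScript constants so they can live next to the code that produces the logs. The field names (level, context, requestId, message) assume the structured format described earlier, and "PaymentService" is a made-up context. wouldMatchErrorFilter is a hypothetical local check for unit tests, not how CloudWatch evaluates patterns.

```typescript
// Metric filter pattern: matches JSON log events whose level field is ERROR.
const ERROR_METRIC_FILTER_PATTERN = '{ $.level = "ERROR" }';

// Logs Insights query: the 20 most recent errors from one context.
const RECENT_ERRORS_QUERY = [
  "fields @timestamp, requestId, message",
  '| filter level = "ERROR" and context = "PaymentService"',
  "| sort @timestamp desc",
  "| limit 20",
].join("\n");

// Local approximation of the filter pattern, useful for checking in tests
// that the application's error lines would be counted by the metric filter.
function wouldMatchErrorFilter(logLine: string): boolean {
  try {
    const event = JSON.parse(logLine) as { level?: unknown };
    return event !== null && typeof event === "object" && event.level === "ERROR";
  } catch {
    return false; // not JSON, so a JSON pattern would not match it
  }
}
```

The pattern string goes into the metric filter definition on the log group, and the query can be pasted into the Logs Insights console as-is.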

c. Automated Alerting

Use Amazon SNS or other alerting mechanisms to receive notifications (Slack, email, SMS) when an alarm threshold is crossed.

8. Compliance and Retention

a. Log Retention

Decide how long you need to retain logs for auditing or compliance (e.g., GDPR, SOC 2, HIPAA if relevant). You can configure CloudWatch log groups with retention policies or export logs to S3 for longer retention.

b. Secure Storage

Ensure sensitive logs are protected. By default, CloudWatch Logs are encrypted at rest, but consider who has access to log groups. Use IAM policies to control log access.

c. Governance and Access Control

Make sure only authorized team members can read or write logs. Implement role-based access to keep logs secure.

Putting It All Together

A typical AWS CloudWatch-based logging system for NestJS might look like this:

NestJS Logger / Winston / Pino: Use a logging library that supports JSON output and multiple log levels.
Log Every Request: An interceptor that captures request/response data, timestamps, latencies, and correlation IDs.
Log Application Events: For example, custom events in your services or domain logic, with the relevant metadata.
Handle Exceptions Globally: A NestJS ExceptionFilter that logs error details and stack traces in a structured format.
Send Logs to CloudWatch: Configure your containers, EC2 instances, Elastic Beanstalk environments, or ECS tasks to forward logs automatically to CloudWatch Logs.
Set Up Metric Filters: Convert important log patterns (like ERROR or specific event types) into CloudWatch metrics.
Create Alarms: Trigger notifications if error rates spike or if average response times exceed a certain threshold.

By carefully planning out the above, you’ll have a robust, clear, and secure logging strategy that makes debugging, performance tuning, and auditing much simpler.