Building an AWS CloudWatch Logging Strategy feat. NestJS
When designing a logging strategy for AWS CloudWatch in a NestJS application—or any web service—there are several categories of information to consider logging. These logs serve multiple purposes: troubleshooting errors, monitoring performance, auditing changes, and even performing analytics. Below are the main categories and details you’ll likely want to capture.
1. Request-Level Information
a. Basic Request Data
- HTTP Method and Path: For example, GET /users, POST /orders.
- Status Code: Helps filter successful vs. failing requests.
- Timestamp: When the request was received/completed.
- User Agent: Useful when investigating issues specific to certain browsers or clients.
- Client IP / Request Origin: Aids in tracing back suspicious or high-traffic sources.
- Query Parameters / Route Parameters: Might be needed for debugging, but be mindful of sensitive data.
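As a rough illustration of the fields listed above, the sketch below builds one request log entry and redacts sensitive query parameters before they reach the log. The type names, the function names, and the list of sensitive keys are assumptions made for this example; they are not part of NestJS or CloudWatch.

```typescript
// Query keys treated as sensitive in this sketch (an assumption; adjust to your API).
const SENSITIVE_QUERY_KEYS = ["password", "token", "secret", "apikey"];

// Hypothetical summary of what an interceptor or middleware would hand over.
interface IncomingRequestSummary {
  method: string;
  path: string;
  statusCode: number;
  userAgent?: string;
  clientIp?: string;
  query: Record<string, string>;
}

interface RequestLogEntry {
  timestamp: string;
  method: string;
  path: string;
  statusCode: number;
  userAgent: string;
  clientIp: string;
  query: Record<string, string>;
}

// Replace the value of any sensitive query parameter with a placeholder.
function redactQuery(query: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const key of Object.keys(query)) {
    safe[key] = SENSITIVE_QUERY_KEYS.includes(key.toLowerCase()) ? "[REDACTED]" : query[key];
  }
  return safe;
}

// Assemble the basic request fields into one flat, JSON-friendly object.
function buildRequestLogEntry(req: IncomingRequestSummary, now: Date = new Date()): RequestLogEntry {
  return {
    timestamp: now.toISOString(),
    method: req.method,
    path: req.path,
    statusCode: req.statusCode,
    userAgent: req.userAgent ?? "unknown",
    clientIp: req.clientIp ?? "unknown",
    query: redactQuery(req.query),
  };
}

const requestEntry = buildRequestLogEntry({
  method: "GET",
  path: "/users",
  statusCode: 200,
  userAgent: "curl/8.0",
  clientIp: "203.0.113.7",
  query: { page: "2", token: "abc" },
});
console.log(JSON.stringify(requestEntry));
```

Keeping the redaction inside the builder means no caller can forget it, which matters more as the number of endpoints grows.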
b. Request IDs / Correlation IDs
Generate a unique requestId or correlationId for each incoming request, then log it in every subsequent message within that request’s context. This makes it much easier to trace the flow of a particular request across distributed services.
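One possible shape for this, sketched without any framework: reuse an ID sent by an upstream service if there is one, otherwise generate a new one, and hand out a log function that stamps the ID on every message. The header name x-request-id, the helper names, and the simple fallback generator are assumptions for illustration; in a real Node.js service a UUID generator would be a more typical choice.

```typescript
type HeaderMap = Record<string, string | undefined>;
type IdGenerator = () => string;

// Stand-in generator for this sketch only; not guaranteed to be globally unique.
const fallbackIdGenerator: IdGenerator = () =>
  `${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`;

// Reuse an upstream ID when present so one trace spans several services;
// otherwise create a new ID for this request.
function resolveRequestId(headers: HeaderMap, generate: IdGenerator = fallbackIdGenerator): string {
  const incoming = headers["x-request-id"];
  return incoming && incoming.trim() !== "" ? incoming : generate();
}

// Returns a log function that adds the same requestId to every message.
function createRequestLogger(requestId: string, sink: (line: string) => void = console.log) {
  return (message: string, extra: Record<string, unknown> = {}): void => {
    // requestId is written after the extra fields so they cannot overwrite it.
    sink(JSON.stringify({ ...extra, requestId, message }));
  };
}

const collectedLines: string[] = [];
const currentRequestId = resolveRequestId({ "x-request-id": "abc123xyz" });
const requestLog = createRequestLogger(currentRequestId, (line) => {
  collectedLines.push(line);
});
requestLog("Order received", { orderId: 42 });
requestLog("Payment authorized");
console.log(collectedLines.join("\n"));
```

Passing the bound log function down the call chain is the simplest approach; request-scoped storage can remove that plumbing, at the cost of more setup.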
c. Response Time
Record the latency of each request to monitor performance trends and set up alerts in CloudWatch if latencies exceed thresholds.
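A minimal sketch of latency measurement, assuming the handler is an async function. In NestJS this logic would usually sit in an interceptor; here it is kept framework-free so it can run on its own. The name measureLatency and the 500 ms threshold are hypothetical.

```typescript
interface TimedResult<T> {
  result: T;
  durationMs: number;
  slow: boolean;
}

// Runs a handler and reports how long it took. The clock is injectable so the
// function can be exercised with fixed timestamps; slowThresholdMs is an
// illustrative cutoff, not a recommended value.
async function measureLatency<T>(
  handler: () => Promise<T>,
  slowThresholdMs = 500,
  clock: () => number = Date.now,
): Promise<TimedResult<T>> {
  const startedAt = clock();
  const result = await handler();
  const durationMs = clock() - startedAt;
  return { result, durationMs, slow: durationMs > slowThresholdMs };
}

async function demoLatency(): Promise<void> {
  const timed = await measureLatency(async () => "ok");
  // Slow requests are logged at WARN so they are easy to filter on.
  console.log(JSON.stringify({
    level: timed.slow ? "WARN" : "INFO",
    message: "Request completed",
    durationMs: timed.durationMs,
  }));
}
demoLatency();
```

Logging durationMs as a numeric field, rather than embedding it in the message text, is what lets CloudWatch turn it into a metric later.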
2. Application-Level Events
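Logging events is a good practice. It would be even better, though, to use EventBridge so that other services can also receive and use the event information, rather than only writing it to the logs.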
a. Business Logic Events
- Key Domain Events: For example, UserCreated, OrderShipped, PaymentProcessed.
- Lifecycle Hooks: For instance, NestJS lifecycle events such as onModuleInit, onModuleDestroy, or Cron job starts/finishes.
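To make the EventBridge idea concrete, the sketch below maps one domain event to two outputs: a structured log entry and an entry shaped like an EventBridge PutEvents request item (Source, DetailType, Detail, EventBusName, Time). Nothing is actually sent here. The DomainEvent type, the function names, and the source string my-app.orders are assumptions for illustration; sending would go through the AWS SDK, for example PutEventsCommand in @aws-sdk/client-eventbridge.

```typescript
// Hypothetical in-app representation of a business event.
interface DomainEvent {
  type: string; // e.g. "UserCreated", "OrderShipped"
  occurredAt: Date;
  payload: Record<string, unknown>;
}

// Subset of the PutEvents request entry fields used in this sketch.
interface EventBridgeEntry {
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
  EventBusName: string;
  Time: Date;
}

// Log representation: a flat entry with the payload kept under one key.
function toDomainLogEntry(evt: DomainEvent) {
  return {
    timestamp: evt.occurredAt.toISOString(),
    level: "INFO",
    message: `Domain event: ${evt.type}`,
    eventType: evt.type,
    payload: evt.payload,
  };
}

// EventBridge representation of the same event, for other services to consume.
function toEventBridgeEntry(
  evt: DomainEvent,
  source = "my-app.orders", // hypothetical source name
  busName = "default",
): EventBridgeEntry {
  return {
    Source: source,
    DetailType: evt.type,
    Detail: JSON.stringify(evt.payload),
    EventBusName: busName,
    Time: evt.occurredAt,
  };
}

const shippedEvent: DomainEvent = {
  type: "OrderShipped",
  occurredAt: new Date("2025-02-25T13:45:00.000Z"),
  payload: { orderId: 42, carrier: "DHL" },
};
const shippedLog = toDomainLogEntry(shippedEvent);
const shippedEntry = toEventBridgeEntry(shippedEvent);
console.log(JSON.stringify(shippedLog));
console.log(JSON.stringify(shippedEntry));
```

Deriving both outputs from the same event object keeps the log and the published event from drifting apart.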
b. Success vs. Error Details
Distinguish between normal “happy path” logs and warnings/errors. This might include:
- Success Path Messages (e.g., “Successfully created user ID: 12345”).
- Warning Messages (e.g., “Retrying payment due to network hiccup”).
- Error Messages (e.g., “Unable to connect to DB: connection refused”).
c. Configuration Changes and Deployments
Whenever you deploy new microservices, change environment variables, or update configuration, you can log these events to correlate performance or error spikes with configuration changes.
3. Error and Exception Details
Using logs to track exceptions is very important. Let's use an interceptor to send known exceptions to CloudWatch.
a. Stack Traces
When an error happens in NestJS, capture the full stack trace (but be careful not to expose sensitive data). This can be extremely helpful for debugging.
b. Error Metadata
Include metadata such as:
- Error Code or Exception Type.
- Affected Module/Service/Class.
- User ID or Session ID (if applicable).
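The metadata above could be gathered by a small serializer like the one sketched here, which a NestJS exception filter or interceptor might call before writing the log. The names serializeError, ErrorLogContext, and PaymentDeclinedError are hypothetical.

```typescript
interface ErrorLogContext {
  context: string; // affected module, service, or class
  userId?: string;
  errorCode?: string;
}

interface ErrorLogEntry {
  level: "ERROR";
  errorType: string;
  message: string;
  stack?: string;
  context: string;
  userId?: string;
  errorCode?: string;
}

// Turns any thrown value into a structured entry with type, message, and stack.
function serializeError(err: unknown, ctx: ErrorLogContext): ErrorLogEntry {
  const base = {
    level: "ERROR" as const,
    context: ctx.context,
    userId: ctx.userId,
    errorCode: ctx.errorCode,
  };
  if (err instanceof Error) {
    return { ...base, errorType: err.name, message: err.message, stack: err.stack };
  }
  // JavaScript allows throwing non-Error values, so handle those as well.
  return { ...base, errorType: typeof err, message: String(err) };
}

// Example of a known, application-specific exception.
class PaymentDeclinedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PaymentDeclinedError";
  }
}

const declinedEntry = serializeError(new PaymentDeclinedError("Card was declined"), {
  context: "PaymentService",
  userId: "12345",
  errorCode: "PAYMENT_DECLINED",
});
console.log(JSON.stringify(declinedEntry));
```

Review what ends up in message and stack before shipping this, since error text can itself contain user input or connection strings.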
c. Logging Severity Levels
Set up structured logging with levels (e.g., ERROR, WARN, INFO, DEBUG) so that CloudWatch can filter or trigger alerts based on severity.
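A small sketch of level-based filtering, assuming the four levels named above and a configurable minimum level. The numeric priorities and the helper names are arbitrary choices for this example.

```typescript
type LogLevel = "DEBUG" | "INFO" | "WARN" | "ERROR";

// Higher number means more severe; only the ordering matters.
const LEVEL_PRIORITY: Record<LogLevel, number> = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

// True when a message at `level` should be emitted given the configured minimum.
function shouldLog(level: LogLevel, minimum: LogLevel): boolean {
  return LEVEL_PRIORITY[level] >= LEVEL_PRIORITY[minimum];
}

// Returns a writer that drops messages below the minimum level and keeps the
// emitted lines in memory so the behaviour is easy to inspect.
function createLeveledLogger(minimum: LogLevel) {
  const lines: string[] = [];
  const write = (level: LogLevel, message: string): void => {
    if (!shouldLog(level, minimum)) return;
    const line = JSON.stringify({ level, message });
    lines.push(line);
    console.log(line);
  };
  return { write, lines };
}

const warnLogger = createLeveledLogger("WARN");
warnLogger.write("DEBUG", "Cache miss for user 12345");
warnLogger.write("INFO", "Successfully created user ID: 12345");
warnLogger.write("WARN", "Retrying payment due to network hiccup");
warnLogger.write("ERROR", "Unable to connect to DB: connection refused");
```

With a minimum of WARN, only the last two messages are written. Because level is its own JSON field, a CloudWatch filter can match on it without parsing the message text.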
4. Security-Related Events
For important APIs, it is worth calling the Logger directly to track exactly which operation was requested.
a. Authentication & Authorization
- Login Attempts: Track successful and failed login attempts.
- JWT Token Validation: Log warnings or errors for invalid tokens.
b. Privileged Actions and Access
- Key Admin Actions: Deleting or modifying important records, changing user roles, etc.
- Suspicious Activity: Repeated failures, unusual request patterns.
c. Data Privacy Considerations
Avoid logging sensitive details (like passwords, full credit card numbers, personal identifiers). You can hash or partially mask them if necessary.
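Partial masking can be sketched as below: some keys are replaced outright, and a card number keeps only its last four digits. The key list and the field name cardNumber are assumptions for this example, and the helper is meant for plain JSON-like data only (class instances such as Date are not handled).

```typescript
// Keys whose values are never logged (an illustrative list, not exhaustive).
const REDACTED_KEYS = ["password", "ssn", "authorization"];

// Keep only the last four digits of a card number.
function maskCardNumber(cardNumber: string): string {
  const digits = cardNumber.replace(/\D/g, "");
  if (digits.length <= 4) return "****";
  return "*".repeat(digits.length - 4) + digits.slice(-4);
}

// Recursively copy a JSON-like value, redacting or masking sensitive fields.
function sanitizeForLog(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sanitizeForLog(item));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      const lower = key.toLowerCase();
      const inner = source[key];
      if (REDACTED_KEYS.includes(lower)) {
        out[key] = "[REDACTED]";
      } else if (lower === "cardnumber" && typeof inner === "string") {
        out[key] = maskCardNumber(inner);
      } else {
        out[key] = sanitizeForLog(inner);
      }
    }
    return out;
  }
  return value;
}

const sanitizedSignup = sanitizeForLog({
  email: "user@example.com",
  password: "hunter2",
  payment: { cardNumber: "4111 1111 1111 1111", holder: "A. User" },
});
console.log(JSON.stringify(sanitizedSignup));
```

Running every log payload through one sanitizer is easier to audit than masking at each call site. Whether fields like email count as personal identifiers depends on your compliance requirements.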
5. System & Infrastructure Health
a. Resource Utilization
While most of this is captured by AWS itself (e.g., CPU, memory, disk usage via CloudWatch metrics), you might still have some logs at the application level—for example, tracking how many active requests are being processed.
b. External Service Calls
- API Calls to external services, along with response times and statuses.
- Database Query Performance if you want more detailed or application-level logging than what typical AWS RDS or external DB logs provide.
c. Application Startup and Shutdown Logs
Indicate that the NestJS service started successfully or gracefully shut down. This helps correlate downtime or crashed containers with logs.
6. Structured Logging Format
a. JSON Logging
NestJS supports custom logger implementations. Structured logs (often JSON) make it easier to query and analyze logs in CloudWatch. A typical JSON log could look like this:
{
  "timestamp": "2025-02-25T13:45:00.000Z",
  "level": "INFO",
  "requestId": "abc123xyz",
  "message": "User created successfully",
  "userId": 12345
}
b. Consistent Fields
You can standardize fields like:
- timestamp
- level (e.g., DEBUG, INFO, WARN, ERROR)
- message
- context (a NestJS-specific concept for tagging logs with the name of the service or module)
- requestId / correlationId
- environment (e.g., production, staging, local)
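One way to keep these fields consistent is to create loggers through a factory that fixes context and environment once and fills in the timestamp itself. This is a sketch under that assumption; createStructuredLogger and the type names are hypothetical, and a real setup would more likely configure the same fields in Winston, Pino, or a custom NestJS logger.

```typescript
type Severity = "DEBUG" | "INFO" | "WARN" | "ERROR";

// Optional per-message fields; requestId is the only one named explicitly.
interface LogFields {
  requestId?: string;
  [extra: string]: unknown;
}

interface StructuredLogEntry extends LogFields {
  timestamp: string;
  level: Severity;
  message: string;
  context: string;
  environment: string;
}

// Binds context and environment once so every entry carries the same fields.
function createStructuredLogger(
  context: string,
  environment: string,
  now: () => Date = () => new Date(),
) {
  return (level: Severity, message: string, fields: LogFields = {}): StructuredLogEntry => {
    // Standard fields are written last so ad-hoc fields cannot overwrite them.
    const entry: StructuredLogEntry = {
      ...fields,
      timestamp: now().toISOString(),
      level,
      message,
      context,
      environment,
    };
    console.log(JSON.stringify(entry));
    return entry;
  };
}

const usersLogger = createStructuredLogger(
  "UsersService",
  "staging",
  () => new Date("2025-02-25T13:45:00.000Z"), // fixed clock for a repeatable example
);
const createdEntry = usersLogger("INFO", "User created successfully", {
  requestId: "abc123xyz",
  userId: 12345,
});
```

The fixed clock is only there to make the example repeatable; omit the third argument to use the current time.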
7. Alerting and Monitoring Best Practices
a. CloudWatch Alarms
Configure alarms on specific metrics, such as:
- Error Count or Error Rate.
- Latency / Response Times.
- High CPU or Memory usage on containers/EC2 instances.
b. Filters and Insights
- Metric Filters: Create CloudWatch metrics based on log patterns (e.g., every time you see “ERROR”).
- CloudWatch Logs Insights: A powerful tool for querying your logs. If you log in structured JSON, you can search logs by fields like userId, context, or errorCode.
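As a sketch of how both tools can be driven from structured logs, the helpers below build the parameters for a metric filter that counts ERROR entries, plus a Logs Insights query string. Nothing is sent to AWS here. The interface is a local description of the fields used, modelled on the PutMetricFilter request; the filter name, metric name, namespace, and log group name are hypothetical. Creating the filter would go through the console, infrastructure-as-code, or the AWS SDK.

```typescript
// Local description of the PutMetricFilter fields this sketch fills in.
interface MetricFilterParams {
  logGroupName: string;
  filterName: string;
  filterPattern: string;
  metricTransformations: Array<{
    metricName: string;
    metricNamespace: string;
    metricValue: string;
    defaultValue?: number;
  }>;
}

// Counts every structured log event whose level field equals "ERROR".
function errorCountMetricFilter(logGroupName: string, namespace: string): MetricFilterParams {
  return {
    logGroupName,
    filterName: "error-count", // hypothetical name
    filterPattern: '{ $.level = "ERROR" }', // JSON filter pattern syntax
    metricTransformations: [
      { metricName: "ErrorCount", metricNamespace: namespace, metricValue: "1", defaultValue: 0 },
    ],
  };
}

// Builds a Logs Insights query listing the most recent errors for one context.
function recentErrorsQuery(context: string, limit = 20): string {
  return [
    "fields @timestamp, message, userId, errorCode",
    `| filter level = "ERROR" and context = "${context}"`,
    "| sort @timestamp desc",
    `| limit ${limit}`,
  ].join("\n");
}

const errorFilter = errorCountMetricFilter("/ecs/my-nest-app", "MyNestApp");
const paymentErrorsQuery = recentErrorsQuery("PaymentService");
console.log(JSON.stringify(errorFilter, null, 2));
console.log(paymentErrorsQuery);
```

The JSON pattern matches on the level field only, so a message that merely contains the word ERROR at INFO level does not inflate the metric.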
c. Automated Alerting
Use Amazon SNS or other alerting mechanisms to receive notifications (Slack, email, SMS) when an alarm threshold is crossed.
8. Compliance and Retention
a. Log Retention
Decide how long you need to retain logs for auditing or compliance (e.g., GDPR, SOC 2, HIPAA if relevant). You can configure CloudWatch log groups with retention policies or export logs to S3 for longer retention.
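CloudWatch log group retention accepts only specific day counts, so a compliance requirement such as "keep for at least 100 days" has to be rounded up to an allowed value. The helper below is a sketch of that rounding. The list reflects the values accepted by the PutRetentionPolicy API to the best of my knowledge; check the current API reference before relying on it. The function name is hypothetical.

```typescript
// Retention periods (in days) accepted for log groups; verify against the API reference.
const ALLOWED_RETENTION_DAYS = [
  1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
  1096, 1827, 2192, 2557, 2922, 3288, 3653,
];

// Returns the shortest allowed retention that still covers the requirement,
// or undefined when the requirement exceeds the longest allowed period.
function pickRetentionDays(requiredDays: number): number | undefined {
  return ALLOWED_RETENTION_DAYS.find((days) => days >= requiredDays);
}

console.log(pickRetentionDays(100));
console.log(pickRetentionDays(365));
console.log(pickRetentionDays(5000));
```

An undefined result signals that the requirement cannot be met with a retention policy alone, which is where never-expiring log groups or an export to S3 come in.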
b. Secure Storage
Ensure sensitive logs are protected. By default, CloudWatch Logs are encrypted at rest, but consider who has access to log groups. Use IAM policies to control log access.
c. Governance and Access Control
Make sure only authorized team members can read or write logs. Implement role-based access to keep logs secure.
Putting It All Together
A typical AWS CloudWatch-based logging system for NestJS might look like this:
- NestJS Logger / Winston / Pino: Use a logging library that supports JSON output and multiple log levels.
- Log Every Request: An interceptor that captures request/response data, timestamps, latencies, and correlation IDs.
- Log Application Events: For example, custom events in your services or domain logic, with the relevant metadata.
- Handle Exceptions Globally: A NestJS ExceptionFilter that logs error details and stack traces in a structured format.
- Send Logs to CloudWatch: Configure your container/EC2/Elastic Beanstalk/ECS tasks to forward logs automatically to CloudWatch Logs.
- Set Up Metric Filters: Convert important log patterns (like ERROR or specific event types) into CloudWatch metrics.
- Create Alarms: Trigger notifications if error rates spike or if average response times exceed a certain threshold.
By carefully planning out the above, you’ll have a robust, clear, and secure logging strategy that makes debugging, performance tuning, and auditing much simpler.