# Feature Specification: Fiber Middleware Integration with Configuration Management

**Feature Branch**: `001-fiber-middleware-integration`
**Created**: 2025-11-10
**Status**: Draft
**Input**: User description (translated from Chinese): "I need you to integrate the following into Fiber and into our system. You do not need to run go get; I will run go mod tidy myself. For Fiber usage details, see https://docs.gofiber.io/. I also need you to establish a unified response structure: { \"code\": 0000, \"data\": {}/[], \"msg\": \"\" }. Viper configuration (must support hot reload); Zap plus Lumberjack.v2; the github.com/gofiber/fiber/v2/middleware/logger middleware; the github.com/gofiber/fiber/v2/middleware/recover middleware; the github.com/gofiber/fiber/v2/middleware/requestid middleware; the github.com/gofiber/fiber/v2/middleware/keyauth middleware (it should verify that the token exists in Redis, taking the field named 'token' from the request header for comparison); the github.com/gofiber/fiber/v2/middleware/limiter middleware (implement it first, then comment out all of its code and explain how to use and adapt it)."

## Clarifications

### Session 2025-11-10

- Q: What specific types of logs should the Zap + Lumberjack integration handle? → A: Both application logs and HTTP access logs, with configurable separation into different files (app.log, access.log) to enable independent retention policies and analysis workflows.
- Q: When Redis is unavailable during token validation (FR-016), what should the authentication behavior be? → A: Fail closed: All authentication requests fail immediately when Redis is unavailable (return HTTP 503).
- Q: What data structure and content should be stored in Redis for authentication tokens? → A: Token as key only (simple existence check): Store tokens as Redis keys with user ID as value, using Redis TTL for expiration.
- Q: What identifier should the rate limiter use to track and enforce request limits? → A: Per-IP address: Rate limit based on client IP address with configurable requests per time window (e.g., 100 req/min per IP).
- Q: What format should be used for generating unique request IDs in the requestid middleware? → A: UUID v4 (random): Standard UUID format for maximum compatibility with distributed tracing systems and log aggregation tools.

## User Scenarios & Testing

### User Story 1 - Configuration Hot Reload (Priority: P1)

When system administrators or DevOps engineers modify application configuration files (such as server ports, database connections, log levels), the system should automatically detect and apply these changes without requiring a service restart, ensuring zero-downtime configuration updates.

**Why this priority**: Configuration management is foundational for all other features. Without proper configuration loading and hot reload capability, the system cannot support runtime adjustments, which is critical for production environments.

**Independent Test**: Can be fully tested by modifying a configuration value in the config file and verifying the system picks up the new value within seconds without restart, delivering immediate configuration flexibility.

**Acceptance Scenarios**:

1. **Given** the system is running with initial configuration, **When** an administrator updates the log level in the config file, **Then** the system detects the change within 5 seconds and applies the new log level to all subsequent log entries
2. **Given** the system is running, **When** configuration file contains invalid syntax, **Then** the system logs a warning and continues using the previous valid configuration
3. **Given** configuration hot reload is enabled, **When** multiple configuration parameters are changed simultaneously, **Then** all changes are applied atomically without partial updates
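User Story 1 maps to Viper's file-watching hot reload. A minimal sketch, assuming a project-local `config` package and an atomically swapped settings snapshot (the package layout and function names are illustrative, not mandated by this spec):

```go
package config

import (
	"log"
	"sync/atomic"

	"github.com/fsnotify/fsnotify"
	"github.com/spf13/viper"
)

// current holds the last valid configuration snapshot; readers load it
// atomically, so in-flight requests keep the snapshot they started with.
var current atomic.Value

// Load reads the config file once and then watches it for changes.
func Load(path string) error {
	v := viper.New()
	v.SetConfigFile(path)
	if err := v.ReadInConfig(); err != nil {
		return err
	}
	current.Store(v.AllSettings())

	v.OnConfigChange(func(e fsnotify.Event) {
		// Re-read and keep the previous valid snapshot on failure
		// (acceptance scenario 2 above).
		if err := v.ReadInConfig(); err != nil {
			log.Printf("config reload skipped, invalid file: %v", err)
			return
		}
		current.Store(v.AllSettings())
		log.Printf("config reloaded after change to %s", e.Name)
	})
	v.WatchConfig() // fsnotify-based detection, typically well under 5 seconds
	return nil
}
```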
---

### User Story 2 - Structured Logging and Log Rotation (Priority: P1)

When the system processes requests and business operations, all events, errors, and debugging information should be recorded in structured JSON format with automatic log file rotation based on size and time, ensuring comprehensive audit trails without disk space exhaustion. The system maintains separate log files for application logs (app.log) and HTTP access logs (access.log) with independent retention policies.

**Why this priority**: Logging is essential for debugging, monitoring, and compliance. Structured logs enable efficient querying and analysis, while automatic rotation prevents operational issues. Separating application and access logs allows for different retention policies and analysis workflows.

**Independent Test**: Can be fully tested by generating various log events and verifying they appear in structured JSON format in the appropriate files, and that log files rotate when size/time thresholds are reached, delivering production-ready logging capability.

**Acceptance Scenarios**:

1. **Given** the system is processing requests, **When** any application operation occurs, **Then** logs are written to app.log in JSON format containing timestamp, level, message, request ID, and contextual data
2. **Given** the system is processing HTTP requests, **When** requests complete, **Then** access logs are written to access.log with request method, path, status, duration, and request ID
3. **Given** a log file reaches the configured size limit, **When** new log entries are generated, **Then** the current log file is archived and a new log file is created
4. **Given** log retention is configured for 30 days for application logs and 90 days for access logs, **When** log files exceed the retention period, **Then** older log files are automatically removed according to their respective policies
5. **Given** multiple log levels are configured (debug, info, warn, error), **When** logging at different levels, **Then** only messages at or above the configured level are written

---

### User Story 3 - Unified API Response Format (Priority: P1)

When API consumers (frontend applications, mobile apps, third-party integrations) make requests to any endpoint, they should receive responses in a consistent JSON structure containing status code, data payload, and message, regardless of success or failure, enabling predictable error handling and data parsing.

**Why this priority**: Consistent response format is critical for API consumers to reliably parse responses. Without this, every endpoint integration becomes custom work, increasing development time and bug potential.

**Independent Test**: Can be fully tested by calling any endpoint (successful or failed) and verifying the response structure matches the defined format with appropriate code, data, and message fields, delivering immediate API consistency.

**Acceptance Scenarios**:

1. **Given** a valid API request, **When** the request succeeds, **Then** the response follows the unified format defined in FR-007 with code 0 and appropriate data
2. **Given** an invalid API request, **When** validation fails, **Then** the response follows the unified format defined in FR-007 with appropriate error code and null data
3. **Given** any API endpoint, **When** processing completes, **Then** the response structure always includes all three required fields (code, data, msg) as specified in FR-007
4. **Given** list/array data is returned, **When** the response is generated, **Then** the data field contains an array instead of an object, maintaining the unified format structure
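A minimal sketch of the unified envelope from User Story 3 (and FR-007), assuming the `pkg/response/` package named in the Architecture Requirements below; the `OK` and `Fail` helper names are illustrative:

```go
package response

import "github.com/gofiber/fiber/v2"

// Envelope is the unified body: {"code": ..., "data": ..., "msg": ...}.
type Envelope struct {
	Code int         `json:"code"`
	Data interface{} `json:"data"` // object, array, or null
	Msg  string      `json:"msg"`
}

// OK wraps successful payloads with code 0.
func OK(c *fiber.Ctx, data interface{}) error {
	return c.Status(fiber.StatusOK).JSON(Envelope{Code: 0, Data: data, Msg: "success"})
}

// Fail wraps errors with a business error code and a Chinese message (FR-014/FR-021).
func Fail(c *fiber.Ctx, httpStatus, code int, msg string) error {
	return c.Status(httpStatus).JSON(Envelope{Code: code, Data: nil, Msg: msg})
}
```

Handlers would then return, for example, `response.OK(c, user)` on success or `response.Fail(c, fiber.StatusUnauthorized, 1002, "令牌无效或已过期")` on a rejected token.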
---

### User Story 4 - Request Logging and Tracing (Priority: P2)

When HTTP requests arrive at the system, each request should be assigned a unique identifier and all request details (method, path, duration, status) should be logged, enabling request tracking across distributed components and performance analysis.

**Why this priority**: Request logging provides visibility into system usage patterns and performance. The unique request ID enables correlation of logs across services for troubleshooting.

**Independent Test**: Can be fully tested by making multiple concurrent requests and verifying each has a unique request ID in logs and response headers, and that request metrics are captured, delivering complete request observability.

**Acceptance Scenarios**:

1. **Given** an HTTP request arrives, **When** it enters the system, **Then** a unique request ID in UUID v4 format (e.g., "550e8400-e29b-41d4-a716-446655440000") is generated and added to the request context
2. **Given** a request is being processed, **When** any logging occurs during that request, **Then** the request ID is automatically included in log entries
3. **Given** a request completes, **When** the response is sent, **Then** the request ID is included in response headers (X-Request-ID) and a summary log entry records method, path, status, and duration
4. **Given** multiple concurrent requests, **When** processed simultaneously, **Then** each request maintains its own unique UUID v4 request ID without collision

---

### User Story 5 - Automatic Error Recovery (Priority: P2)

When unexpected errors or panics occur during request processing, the system should automatically recover from the failure, log detailed error information, return an appropriate error response to the client, and continue serving subsequent requests without crashing.

**Why this priority**: Error recovery prevents cascading failures and ensures service availability. A single panic should not bring down the entire application.

**Independent Test**: Can be fully tested by triggering a controlled panic in a handler and verifying the system returns an error response, logs the panic details, and continues processing subsequent requests normally, delivering fault tolerance.

**Acceptance Scenarios**:

1. **Given** a request handler panics, **When** the panic occurs, **Then** the middleware recovers, logs the panic stack trace, and returns HTTP 500 with error details
2. **Given** a panic is recovered, **When** subsequent requests arrive, **Then** they are processed normally without any impact from the previous panic
3. **Given** a panic includes error details, **When** logged, **Then** the log entry contains the panic message, stack trace, request ID, and request details
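A minimal registration sketch for the requestid and recover middleware from User Stories 4 and 5, assuming `github.com/google/uuid` for UUID v4 generation (the spec does not mandate a specific UUID library):

```go
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/recover"
	"github.com/gofiber/fiber/v2/middleware/requestid"
	"github.com/google/uuid"
)

func main() {
	app := fiber.New()

	// UUID v4 request IDs (FR-008/FR-008a), echoed in the X-Request-ID header.
	app.Use(requestid.New(requestid.Config{
		Header:    fiber.HeaderXRequestID,
		Generator: func() string { return uuid.NewString() },
	}))

	// Recover from handler panics so a single panic never crashes the process
	// (User Story 5); the stack trace goes to the logs, not to the response.
	app.Use(recover.New(recover.Config{
		EnableStackTrace: true,
	}))

	log.Fatal(app.Listen(":8080"))
}
```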
---

### User Story 6 - Token-Based Authentication (Priority: P2)

When external clients make API requests, they must provide a valid authentication token in the request header, which the system validates against stored tokens in Redis cache, ensuring only authorized requests can access protected resources.

**Why this priority**: Authentication is essential for security but depends on the foundational components (config, logging, response format) being in place first.

**Independent Test**: Can be fully tested by making requests with valid/invalid/missing tokens and verifying that valid tokens grant access while invalid ones are rejected with appropriate error codes, delivering access control capability.

**Acceptance Scenarios**:

1. **Given** a request to a protected endpoint, **When** the "token" header is missing, **Then** the system returns HTTP 401 with `{"code": 1001, "data": null, "msg": "缺失认证令牌"}`
2. **Given** a request with a token, **When** the token exists as a key in Redis, **Then** the system retrieves the user ID from the value and allows the request to proceed with user context
3. **Given** a request with a token, **When** the token does not exist in Redis (either never created or TTL expired), **Then** the system returns HTTP 401 with `{"code": 1002, "data": null, "msg": "令牌无效或已过期"}`
4. **Given** Redis is unavailable, **When** token validation is attempted, **Then** the system immediately fails closed, logs the Redis connection error, and returns HTTP 503 with `{"code": 1004, "data": null, "msg": "认证服务不可用"}` without attempting fallback mechanisms

---

### User Story 7 - Rate Limiting Configuration (Priority: P3)

The system should provide configurable IP-based rate limiting capabilities that can restrict the number of requests from a specific client IP address within a time window, with the functionality initially implemented but disabled by default, allowing future activation based on specific endpoint requirements.

**Why this priority**: Rate limiting is important for production but not critical for initial deployment. It can be activated later when traffic patterns are better understood.

**Independent Test**: Can be fully tested by enabling the limiter configuration, making repeated requests from the same IP exceeding the limit, and verifying that excess requests are rejected with rate limit error messages, delivering DoS protection capability when needed.

**Acceptance Scenarios**:

1. **Given** rate limiting is configured and enabled for an endpoint with 100 requests per minute per IP, **When** a client IP exceeds the request limit within the time window, **Then** subsequent requests from that IP return HTTP 429 with `{"code": 1003, "data": null, "msg": "请求过于频繁"}`
2. **Given** the rate limit time window expires, **When** new requests arrive from the same client IP, **Then** the request counter resets and requests are allowed again
3. **Given** rate limiting is disabled (default), **When** any number of requests arrive, **Then** all requests are processed without rate limit checks
4. **Given** rate limiting is enabled, **When** requests arrive from different IP addresses, **Then** each IP address has its own independent request counter and limit
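For User Story 7 and FR-018/FR-019, a minimal sketch of a per-IP limiter that is built but left unregistered by default; the `RateLimiter` function name and the `middleware` package location are illustrative, while the `cmd/api/main.go` wiring follows FR-020:

```go
package middleware

import (
	"time"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/limiter"
)

// RateLimiter enforces the per-IP policy from FR-018a (default 30 req/min).
// Per FR-019 it ships disabled: the app.Use(middleware.RateLimiter()) line in
// cmd/api/main.go stays commented out until rate limiting is needed.
func RateLimiter() fiber.Handler {
	return limiter.New(limiter.Config{
		Max:        30,              // requests allowed per window
		Expiration: 1 * time.Minute, // window length
		KeyGenerator: func(c *fiber.Ctx) string {
			return c.IP() // count requests per client IP address
		},
		LimitReached: func(c *fiber.Ctx) error {
			// HTTP 429 in the unified envelope (FR-018b, code 1003).
			return c.Status(fiber.StatusTooManyRequests).JSON(fiber.Map{
				"code": 1003, "data": nil, "msg": "请求过于频繁",
			})
		},
	})
}
```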
---

### Edge Cases

- What happens when the configuration file is deleted while the system is running? (System should log an error and continue with the current configuration)
- What happens when the Redis connection is lost during token validation? (System immediately fails closed, returns HTTP 503 with code 1004, logs the connection failure, and does not attempt any fallback authentication)
- What happens when the log directory is not writable?
  - **At startup**: System MUST fail immediately with exit code 1 and a clear error message to stderr before listening on any port (e.g., "Fatal: Cannot write to log directory 'logs/': permission denied")
  - **At runtime**: If the log directory becomes non-writable after successful startup, system MUST log the error to stderr, continue serving requests, but return HTTP 503 on the health check endpoint until the log directory becomes writable again
- What happens when a request ID collision occurs? (With UUID v4, collision probability is negligible: ~1 in 2^122; no special handling needed)
- What happens when configuration hot reload occurs during active request processing? (Configuration changes should not affect in-flight requests)
- What happens when log rotation occurs while writing a log entry? (Log rotation should be atomic and not lose log entries)
- What happens when invalid configuration values are provided (e.g., negative numbers for limits)? (System should validate config on load and reject invalid values with clear error messages)

## Requirements

### Functional Requirements

- **FR-001**: System MUST load configuration from files using the Viper configuration library
- **FR-002**: System MUST support hot reload of configuration files using fsnotify-based file system event detection (immediate notification on file changes), with configuration changes applied within 5 seconds of file modification and without service restart. The 5-second window includes file event detection, validation, and atomic configuration swap.
- **FR-003**: System MUST validate configuration values on load and reject invalid configurations with descriptive error messages following the format: `"Invalid configuration: {field_path}: {error_reason} (current value: {value}, expected: {constraint})"`. Validation categories include:
  - **Type validation**: All fields match expected types (string, int, bool, duration)
  - **Range validation**: Numeric values within acceptable ranges (e.g., server.port: 1024-65535, log.max_size: 1-1000 MB)
  - **Required fields**: server.host, server.port, redis.addr, logging.app_log_path, logging.access_log_path
  - **Format validation**: Durations use Go duration format (e.g., "5m", "30s"); file paths are valid absolute or relative paths
  - **Example error**: `"Invalid configuration: server.port: port number out of range (current value: 80, expected: 1024-65535)"`
  - **Complete validation rules**: See data-model.md "Configuration Validation Rules" section for comprehensive field-by-field validation constraints
- **FR-004**: System MUST use Zap structured logging for all application logs with log rotation via Lumberjack.v2 and configurable log levels (see the logging sketch after this requirements list). The system maintains two independent Zap logger instances:
  - **appLogger**: For application-level logs (business logic, errors, middleware events, debug info)
  - **accessLogger**: For HTTP access logs (request/response details per FR-011)
  - Each logger instance has a separate Lumberjack rotation configuration for independent file management
- **FR-004a**: System MUST separate application logs (app.log) and HTTP access logs (access.log) into different files with independent configuration
- **FR-005**: System MUST rotate log files automatically using Lumberjack.v2 based on configurable size and age parameters for both application and access logs
- **FR-006**: System MUST retain log files according to the configured retention policy and automatically remove expired logs, with separate retention settings for application and access logs.
  Retention policy is specified in days (integer) and configured via the config file (e.g., `logging.app_log_max_age: 30` for 30-day retention of app.log, `logging.access_log_max_age: 90` for 90-day retention of access.log). Implemented via the Lumberjack MaxAge parameter.
- **FR-007**: All API responses MUST follow the unified format: `{"code": [number], "data": [object/array/null], "msg": [string]}`. Examples:
  - **Success response**: `{"code": 0, "data": {...}, "msg": "success"}`
  - **Error response**: `{"code": [error_code], "data": null, "msg": "[error description]"}`
  - **List response**: `{"code": 0, "data": [...], "msg": "success"}`
  - The response structure always includes all three fields (code, data, msg) regardless of success or failure
- **FR-008**: System MUST assign a unique request ID to every incoming HTTP request using the requestid middleware
- **FR-008a**: Request IDs MUST be generated using UUID v4 format for maximum compatibility with distributed tracing systems and log aggregation tools
- **FR-009**: System MUST include the request ID in all log entries associated with that request
- **FR-010**: System MUST include the request ID in HTTP response headers for client-side tracing
- **FR-011**: System MUST log all HTTP requests with method, path, status code, duration, and request ID using the logger middleware. Access logs written to access.log MUST use structured JSON format with fields: timestamp (ISO 8601), level, request_id, method, path, status, duration_ms, ip, user_agent, and user_id (if authenticated). See data-model.md "Access Log Entry Format" for the complete schema definition.
- **FR-012**: System MUST automatically recover from panics during request processing using the recover middleware
- **FR-013**: When a panic is recovered, system MUST log the full stack trace and error details
- **FR-014**: When a panic is recovered, system MUST return HTTP 500 with the unified error response format: `{"code": 1000, "data": null, "msg": "服务器内部错误"}`.
  The panic error message detail level MUST be configurable via a code constant (not the config file) to support different deployment environments:
  - **Detailed mode** (default for development): Include the sanitized panic message in response.msg (e.g., `"服务器内部错误: runtime error: invalid memory address"`)
  - **Simple mode** (for production): Return the generic message only (`"服务器内部错误"`)
  - **Configuration**: Define a constant in `pkg/constants/constants.go` as `const PanicResponseDetailLevel = "detailed"` or `"simple"`, easily changeable by developers before deployment
  - **Security**: The full stack trace is ALWAYS logged to app.log only, NEVER included in the HTTP response regardless of mode
  - All response messages MUST use Chinese, not English
- **FR-015**: System MUST validate authentication tokens from the "token" request header using the keyauth middleware
- **FR-016**: System MUST check token validity by verifying existence in the Redis cache using the token string as the key
- **FR-016a**: System MUST store tokens in Redis as simple key-value pairs with the token as key and the user ID as value, using Redis TTL for expiration management
- **FR-016b**: When Redis is unavailable during token validation, system MUST fail closed and return HTTP 503 immediately, without fallback or caching mechanisms
- **FR-017**: System MUST return HTTP 401 with an appropriate error code and message when the token is missing or invalid
- **FR-018**: System MUST provide configurable IP-based rate limiting capability using the limiter middleware
- **FR-018a**: Rate limiting MUST track request counts per client IP address with configurable limits (requests per time window). Default configuration: 30 requests per minute per IP. Supported time units: second (s), minute (m), hour (h). Configuration example in the config file: `limiter.max: 30, limiter.window: 1m`
- **FR-018b**: When the rate limit is exceeded, system MUST return HTTP 429 with code 1003 and an appropriate error message
- **FR-019**: The rate limiting implementation MUST be provided but disabled by default in the initial deployment
- **FR-020**: System MUST include documentation on how to configure and enable rate limiting per endpoint, with example configurations. Documentation MUST be created as a separate file `docs/rate-limiting.md` containing:
  - **Configuration parameters**: Detailed explanation of the `max`, `expiration`, and `storage` settings
  - **Per-endpoint setup**: How to enable/disable rate limiting for specific routes or globally
  - **Code examples**: Complete examples showing how to uncomment and configure the limiter middleware in `cmd/api/main.go`
  - **Testing guide**: Step-by-step instructions with curl commands to test rate limiting behavior
  - **Storage options**: Comparison of memory vs Redis storage backends with use cases
  - **Common patterns**: Examples for different scenarios (public API, admin endpoints, webhook receivers)
- **FR-021**: System MUST use consistent error codes across all error scenarios with bilingual (Chinese/English) support
- **FR-022**: Configuration MUST support different environments (development, staging, production) with separate config files
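As referenced in FR-004, a minimal sketch of the dual Zap + Lumberjack.v2 logger setup (one instance per file). The size, age, and level values are illustrative; in practice they would come from the Viper configuration described in FR-005/FR-006:

```go
package logging

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"gopkg.in/natefinch/lumberjack.v2"
)

// newRotatingLogger builds a JSON Zap logger writing to a Lumberjack-rotated file.
func newRotatingLogger(path string, maxSizeMB, maxAgeDays int, level zapcore.Level) *zap.Logger {
	writer := zapcore.AddSync(&lumberjack.Logger{
		Filename: path,
		MaxSize:  maxSizeMB,  // rotate after this many megabytes (FR-005)
		MaxAge:   maxAgeDays, // delete archives older than this many days (FR-006)
		Compress: true,
	})
	encoder := zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig())
	return zap.New(zapcore.NewCore(encoder, writer, level))
}

// New returns the two independent instances required by FR-004/FR-004a.
func New() (appLogger, accessLogger *zap.Logger) {
	appLogger = newRotatingLogger("logs/app.log", 100, 30, zapcore.InfoLevel)
	accessLogger = newRotatingLogger("logs/access.log", 100, 90, zapcore.InfoLevel)
	return appLogger, accessLogger
}
```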
### Technical Requirements (Constitution-Driven)

**Tech Stack Compliance**:

- [x] All HTTP operations use Fiber framework (no `net/http` shortcuts)
- [x] All async tasks use Asynq (if applicable)
- [x] All logging uses Zap + Lumberjack.v2
- [x] All configuration uses Viper

**Architecture Requirements**:

- [x] Implementation follows Handler → Service → Store → Model layers (applies to auth token validation)
- [x] Dependencies injected via Service/Store structs
- [x] Unified error codes defined in `pkg/errors/`
- [x] Unified API responses via `pkg/response/`
- [x] All constants defined in `pkg/constants/` (no magic numbers/strings)
- [x] All Redis keys managed via `pkg/constants/` key generation functions

**API Design Requirements**:

- [x] All APIs follow RESTful principles
- [x] All responses use the unified JSON format with code/data/msg (per FR-007)
- [x] All error messages include error codes and bilingual descriptions
- [x] All time fields use ISO 8601 format (RFC3339)

**Performance Requirements**:

- [x] API response time (P95) < 200ms
- [x] Database queries < 50ms (if applicable)
- [x] Non-realtime operations delegated to async tasks (if applicable)

**Testing Requirements**:

- [x] Unit tests for all Service layer business logic
- [x] Integration tests for all API endpoints
- [x] Tests are independent and use mocks/testcontainers
- [x] Target coverage: 70%+ overall, 90%+ for core business logic

### Key Entities

- **Configuration**: Represents application configuration settings including server parameters, database connections, Redis settings, logging configuration (with separate settings for app.log and access.log including independent rotation and retention policies), and middleware settings. Supports hot reload capability to apply changes without restart.
- **AuthToken**: Represents an authentication token stored in Redis cache as a simple key-value pair. The token string is used as the Redis key, and the user ID is stored as the value. Token expiration is managed via the Redis TTL mechanism. This structure enables O(1) existence checks for authentication validation (see the validation sketch after this list).
- **Request Context**: Represents the execution context of an HTTP request, containing the unique request ID (UUID v4 format), authentication information (user ID from token validation), request start time, and other metadata used for logging and tracing.
- **Log Entry**: Represents a structured log record containing timestamp, severity level, message, request ID, user context, and additional contextual fields, written in JSON format.
- **Rate Limit State**: Represents the current request count and time window for a specific client IP address, used to enforce per-IP rate limiting policies. Tracks remaining quota and window reset time for each unique IP.
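As referenced in the AuthToken entity, a minimal sketch of the keyauth + Redis existence check (FR-015 through FR-017), assuming the go-redis v9 client; the error-code mapping follows the acceptance scenarios in User Story 6, and helper names are illustrative. Depending on the Fiber v2 version, `AuthScheme` may need to be left empty so the raw header value is used:

```go
package middleware

import (
	"errors"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/keyauth"
	"github.com/redis/go-redis/v9"
)

var errRedisDown = errors.New("auth store unavailable")

// TokenAuth validates the "token" header against Redis (token key -> user ID value).
func TokenAuth(rdb *redis.Client) fiber.Handler {
	return keyauth.New(keyauth.Config{
		KeyLookup: "header:token", // FR-015: read the token from the "token" header
		Validator: func(c *fiber.Ctx, token string) (bool, error) {
			userID, err := rdb.Get(c.UserContext(), token).Result()
			if errors.Is(err, redis.Nil) {
				return false, nil // never created or TTL expired -> 401, code 1002
			}
			if err != nil {
				return false, errRedisDown // Redis unreachable -> fail closed (FR-016b)
			}
			c.Locals("user_id", userID) // user context for downstream handlers
			return true, nil
		},
		ErrorHandler: func(c *fiber.Ctx, err error) error {
			if errors.Is(err, errRedisDown) {
				return c.Status(fiber.StatusServiceUnavailable).
					JSON(fiber.Map{"code": 1004, "data": nil, "msg": "认证服务不可用"})
			}
			if errors.Is(err, keyauth.ErrMissingOrMalformedAPIKey) {
				return c.Status(fiber.StatusUnauthorized).
					JSON(fiber.Map{"code": 1001, "data": nil, "msg": "缺失认证令牌"})
			}
			return c.Status(fiber.StatusUnauthorized).
				JSON(fiber.Map{"code": 1002, "data": nil, "msg": "令牌无效或已过期"})
		},
	})
}
```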
## Success Criteria

### Measurable Outcomes

- **SC-001**: System administrators can modify any configuration value in the config file and see it applied within 5 seconds (file event detection + validation + atomic swap) without service restart, verified by observing the configuration change take effect (e.g., log level change reflected in subsequent log entries)
- **SC-002**: All API responses follow the unified `{code, data, msg}` structure with 100% consistency across all endpoints
- **SC-003**: Every HTTP request generates a unique UUID v4 request ID that appears in the X-Request-ID response header and all associated log entries
- **SC-004**: System continues processing new requests within 100ms after recovering from a panic, with zero downtime
- **SC-005**: Log files automatically rotate when reaching configured size limits (e.g., 100MB) without manual intervention
- **SC-006**: Invalid authentication tokens are rejected within 50ms with clear error messages, preventing unauthorized access
- **SC-007**: All logs are written in valid JSON format that can be parsed by standard log aggregation tools without errors
- **SC-008**: 100% of HTTP requests are logged with method, path, status, duration, and request ID for a complete audit trail
- **SC-009**: Rate limiting (when enabled) successfully blocks requests exceeding configured limits within the time window with appropriate error responses
- **SC-010**: System successfully loads configuration from different environment-specific files (dev, staging, prod) based on an environment variable
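A minimal sketch of the environment-specific configuration selection behind SC-010 and FR-022, assuming an `APP_ENV` variable and a `configs/config.<env>.yaml` naming scheme (both are illustrative, not mandated by this spec):

```go
package config

import (
	"fmt"
	"os"

	"github.com/spf13/viper"
)

// LoadForEnv reads configs/config.<env>.yaml, where <env> comes from APP_ENV.
func LoadForEnv() (*viper.Viper, error) {
	env := os.Getenv("APP_ENV")
	if env == "" {
		env = "development" // assumed default when APP_ENV is unset
	}

	v := viper.New()
	v.SetConfigName("config." + env) // e.g. config.production
	v.SetConfigType("yaml")
	v.AddConfigPath("./configs")
	if err := v.ReadInConfig(); err != nil {
		return nil, fmt.Errorf("load %s config: %w", env, err)
	}
	return v, nil
}
```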