Finished part of the work; backing it up to prevent accidental deletion

2025-11-11 15:16:38 +08:00
parent 9600e5b6e0
commit e98dd4d725
39 changed files with 2423 additions and 183 deletions
--- a/specs/001-fiber-middleware-integration/spec.md
+++ b/specs/001-fiber-middleware-integration/spec.md
@@ -78,10 +78,10 @@ When API consumers (frontend applications, mobile apps, third-party integrations

 **Acceptance Scenarios**:

-1. **Given** a valid API request, **When** the request succeeds, **Then** the response contains `{"code": 0, "data": {...}, "msg": "success"}`
-2. **Given** an invalid API request, **When** validation fails, **Then** the response contains `{"code": [error_code], "data": null, "msg": "[error description]"}`
-3. **Given** any API endpoint, **When** processing completes, **Then** the response structure always includes code, data, and msg fields
-4. **Given** list/array data is returned, **When** the response is generated, **Then** the data field contains an array instead of an object
+1. **Given** a valid API request, **When** the request succeeds, **Then** the response follows the unified format defined in FR-007 with code 0 and appropriate data
+2. **Given** an invalid API request, **When** validation fails, **Then** the response follows the unified format defined in FR-007 with appropriate error code and null data
+3. **Given** any API endpoint, **When** processing completes, **Then** the response structure always includes all three required fields (code, data, msg) as specified in FR-007
+4. **Given** list/array data is returned, **When** the response is generated, **Then** the data field contains an array instead of an object, maintaining the unified format structure

 ---

@@ -128,10 +128,10 @@ When external clients make API requests, they must provide a valid authenticatio

 **Acceptance Scenarios**:

-1. **Given** a request to a protected endpoint, **When** the "token" header is missing, **Then** the system returns HTTP 401 with `{"code": 1001, "data": null, "msg": "Missing authentication token"}`
+1. **Given** a request to a protected endpoint, **When** the "token" header is missing, **Then** the system returns HTTP 401 with `{"code": 1001, "data": null, "msg": "缺失认证令牌"}`
 2. **Given** a request with a token, **When** the token exists as a key in Redis, **Then** the system retrieves the user ID from the value and allows the request to proceed with user context
-3. **Given** a request with a token, **When** the token does not exist in Redis (either never created or TTL expired), **Then** the system returns HTTP 401 with `{"code": 1002, "data": null, "msg": "Invalid or expired token"}`
-4. **Given** Redis is unavailable, **When** token validation is attempted, **Then** the system immediately fails closed, logs the Redis connection error, and returns HTTP 503 with `{"code": 1004, "data": null, "msg": "Authentication service unavailable"}` without attempting fallback mechanisms
+3. **Given** a request with a token, **When** the token does not exist in Redis (either never created or TTL expired), **Then** the system returns HTTP 401 with `{"code": 1002, "data": null, "msg": "令牌无效或已过期"}`
+4. **Given** Redis is unavailable, **When** token validation is attempted, **Then** the system immediately fails closed, logs the Redis connection error, and returns HTTP 503 with `{"code": 1004, "data": null, "msg": "认证服务不可用"}` without attempting fallback mechanisms

 ---

@@ -145,7 +145,7 @@ The system should provide configurable IP-based rate limiting capabilities that

 **Acceptance Scenarios**:

-1. **Given** rate limiting is configured and enabled for an endpoint with 100 requests per minute per IP, **When** a client IP exceeds the request limit within the time window, **Then** subsequent requests from that IP return HTTP 429 with `{"code": 1003, "data": null, "msg": "Too many requests"}`
+1. **Given** rate limiting is configured and enabled for an endpoint with 100 requests per minute per IP, **When** a client IP exceeds the request limit within the time window, **Then** subsequent requests from that IP return HTTP 429 with `{"code": 1003, "data": null, "msg": "请求过于频繁"}`
 2. **Given** the rate limit time window expires, **When** new requests arrive from the same client IP, **Then** the request counter resets and requests are allowed again
 3. **Given** rate limiting is disabled (default), **When** any number of requests arrive, **Then** all requests are processed without rate limit checks
 4. **Given** rate limiting is enabled, **When** requests arrive from different IP addresses, **Then** each IP address has its own independent request counter and limit
@@ -156,7 +156,9 @@ The system should provide configurable IP-based rate limiting capabilities that

 - What happens when the configuration file is deleted while the system is running? (System should log error and continue with current configuration)
 - What happens when Redis connection is lost during token validation? (System immediately fails closed, returns HTTP 503 with code 1004, logs connection failure, and does not attempt any fallback authentication)
-- What happens when log directory is not writable? (System should fail to start with clear error message)
+- What happens when log directory is not writable? 
+  - **At startup**: System MUST fail immediately with exit code 1 and clear error message to stderr before listening on any port (e.g., "Fatal: Cannot write to log directory 'logs/': permission denied")
+  - **At runtime**: If log directory becomes non-writable after successful startup, system MUST log error to stderr, continue serving requests but return HTTP 503 on health check endpoint until log directory becomes writable again
 - What happens when a request ID collision occurs? (With UUID v4, collision probability is negligible: ~1 in 2^122; no special handling needed)
 - What happens when configuration hot reload occurs during active request processing? (Configuration changes should not affect in-flight requests)
 - What happens when log rotation occurs while writing a log entry? (Log rotation should be atomic and not lose log entries)
@@ -167,31 +169,55 @@ The system should provide configurable IP-based rate limiting capabilities that
 ### Functional Requirements

 - **FR-001**: System MUST load configuration from files using Viper configuration library
-- **FR-002**: System MUST support hot reload of configuration files, detecting changes within 5 seconds and applying them without service restart
-- **FR-003**: System MUST validate configuration values on load and reject invalid configurations with descriptive error messages
-- **FR-004**: System MUST write all logs in structured JSON format using Zap logging library
+- **FR-002**: System MUST support hot reload of configuration files using fsnotify-based file system event detection (immediate notification on file changes), with configuration changes applied within 5 seconds of file modification and without service restart. The 5-second window includes file event detection, validation, and atomic configuration swap.
+- **FR-003**: System MUST validate configuration values on load and reject invalid configurations with descriptive error messages following the format: `"Invalid configuration: {field_path}: {error_reason} (current value: {value}, expected: {constraint})"`. Validation categories include:
+  - **Type validation**: All fields match expected types (string, int, bool, duration)
+  - **Range validation**: Numeric values within acceptable ranges (e.g., server.port: 1024-65535, log.max_size: 1-1000 MB)
+  - **Required fields**: server.host, server.port, redis.addr, logging.app_log_path, logging.access_log_path
+  - **Format validation**: Durations use Go duration format (e.g., "5m", "30s"), file paths are absolute or relative valid paths
+  - **Example error**: `"Invalid configuration: server.port: port number out of range (current value: 80, expected: 1024-65535)"`
+  - **Complete validation rules**: See data-model.md "Configuration Validation Rules" section for comprehensive field-by-field validation constraints
+- **FR-004**: System MUST use Zap structured logging for all application logs with log rotation via Lumberjack.v2 and configurable log levels. The system maintains two independent Zap logger instances:
+  - **appLogger**: For application-level logs (business logic, errors, middleware events, debug info)
+  - **accessLogger**: For HTTP access logs (request/response details per FR-011)
+  - Each logger instance has separate Lumberjack rotation configuration for independent file management
 - **FR-004a**: System MUST separate application logs (app.log) and HTTP access logs (access.log) into different files with independent configuration
 - **FR-005**: System MUST rotate log files automatically using Lumberjack.v2 based on configurable size and age parameters for both application and access logs
-- **FR-006**: System MUST retain log files according to configured retention policy and automatically remove expired logs, with separate retention settings for application and access logs
-- **FR-007**: All API responses MUST follow the unified format: `{"code": [number], "data": [object/array/null], "msg": [string]}`
+- **FR-006**: System MUST retain log files according to configured retention policy and automatically remove expired logs, with separate retention settings for application and access logs. Retention policy is specified in days (integer) and configured via config file (e.g., `logging.app_log_max_age: 30` for 30-day retention of app.log, `logging.access_log_max_age: 90` for 90-day retention of access.log). Implemented via Lumberjack MaxAge parameter.
+- **FR-007**: All API responses MUST follow the unified format: `{"code": [number], "data": [object/array/null], "msg": [string]}`. Examples:
+  - **Success response**: `{"code": 0, "data": {...}, "msg": "success"}`
+  - **Error response**: `{"code": [error_code], "data": null, "msg": "[error description]"}`
+  - **List response**: `{"code": 0, "data": [...], "msg": "success"}`
+  - The response structure always includes all three fields (code, data, msg) regardless of success or failure
 - **FR-008**: System MUST assign a unique request ID to every incoming HTTP request using requestid middleware
 - **FR-008a**: Request IDs MUST be generated using UUID v4 format for maximum compatibility with distributed tracing systems and log aggregation tools
 - **FR-009**: System MUST include the request ID in all log entries associated with that request
 - **FR-010**: System MUST include the request ID in HTTP response headers for client-side tracing
-- **FR-011**: System MUST log all HTTP requests with method, path, status code, duration, and request ID using logger middleware
+- **FR-011**: System MUST log all HTTP requests with method, path, status code, duration, and request ID using logger middleware. Access logs written to access.log MUST use structured JSON format with fields: timestamp (ISO 8601), level, request_id, method, path, status, duration_ms, ip, user_agent, and user_id (if authenticated). See data-model.md "Access Log Entry Format" for complete schema definition.
 - **FR-012**: System MUST automatically recover from panics during request processing using recover middleware
 - **FR-013**: When a panic is recovered, system MUST log the full stack trace and error details
-- **FR-014**: When a panic is recovered, system MUST return HTTP 500 with unified error response format
+- **FR-014**: When a panic is recovered, system MUST return HTTP 500 with unified error response format. Response format: `{"code": 1000, "data": null, "msg": "服务器内部错误"}`. The panic error message detail level MUST be configurable via code constant (not config file) to support different deployment environments:
+  - **Detailed mode** (default for development): Include sanitized panic message in response.msg (e.g., `"服务器内部错误: runtime error: invalid memory address"`)
+  - **Simple mode** (for production): Return generic message only (`"服务器内部错误"`)
+  - **Configuration**: Define constant in `pkg/constants/constants.go` as `const PanicResponseDetailLevel = "detailed"` or `"simple"`, easily changeable by developers before deployment
+  - **Security**: Full stack trace ALWAYS logged to app.log only, NEVER included in HTTP response regardless of mode
+  - All response messages MUST use Chinese, not English
 - **FR-015**: System MUST validate authentication tokens from the "token" request header using keyauth middleware
 - **FR-016**: System MUST check token validity by verifying existence in Redis cache using token string as key
 - **FR-016a**: System MUST store tokens in Redis as simple key-value pairs with token as key and user ID as value, using Redis TTL for expiration management
 - **FR-016b**: When Redis is unavailable during token validation, system MUST fail closed and return HTTP 503 immediately without fallback or caching mechanisms
 - **FR-017**: System MUST return HTTP 401 with appropriate error code and message when token is missing or invalid
 - **FR-018**: System MUST provide configurable IP-based rate limiting capability using limiter middleware
-- **FR-018a**: Rate limiting MUST track request counts per client IP address with configurable limits (requests per time window)
+- **FR-018a**: Rate limiting MUST track request counts per client IP address with configurable limits (requests per time window). Default configuration: 30 requests per minute per IP. Supported time units: second (s), minute (m), hour (h). Configuration example in config file: `limiter.max: 30, limiter.window: 1m`
 - **FR-018b**: When rate limit is exceeded, system MUST return HTTP 429 with code 1003 and appropriate error message
 - **FR-019**: Rate limiting implementation MUST be provided but disabled by default in initial deployment
-- **FR-020**: System MUST include documentation on how to configure and enable rate limiting per endpoint with example configurations
+- **FR-020**: System MUST include documentation on how to configure and enable rate limiting per endpoint with example configurations. Documentation MUST be created as a separate file `docs/rate-limiting.md` containing:
+  - **Configuration parameters**: Detailed explanation of `max`, `expiration`, and `storage` settings
+  - **Per-endpoint setup**: How to enable/disable rate limiting for specific routes or globally
+  - **Code examples**: Complete examples showing how to uncomment and configure the limiter middleware in `cmd/api/main.go`
+  - **Testing guide**: Step-by-step instructions with curl commands to test rate limiting behavior
+  - **Storage options**: Comparison of memory vs Redis storage backends with use cases
+  - **Common patterns**: Examples for different scenarios (public API, admin endpoints, webhook receivers)
 - **FR-021**: System MUST use consistent error codes across all error scenarios with bilingual (Chinese/English) support
 - **FR-022**: Configuration MUST support different environments (development, staging, production) with separate config files
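
Reviewer note: the error message template in the new FR-003 can be captured by a small error type. This is a sketch assuming only the format string and the `server.port` range quoted in FR-003; `ValidationError` and `validatePort` are hypothetical names and are not taken from the repository.

```go
package main

import "fmt"

// ValidationError is a hypothetical type that renders the FR-003 template:
// "Invalid configuration: {field_path}: {error_reason} (current value: {value}, expected: {constraint})".
type ValidationError struct {
	FieldPath  string
	Reason     string
	Value      interface{}
	Constraint string
}

// Error formats the fields according to the FR-003 message template.
func (e *ValidationError) Error() string {
	return fmt.Sprintf("Invalid configuration: %s: %s (current value: %v, expected: %s)",
		e.FieldPath, e.Reason, e.Value, e.Constraint)
}

// validatePort applies the server.port range given in FR-003 (1024-65535).
func validatePort(port int) error {
	if port < 1024 || port > 65535 {
		return &ValidationError{
			FieldPath:  "server.port",
			Reason:     "port number out of range",
			Value:      port,
			Constraint: "1024-65535",
		}
	}
	return nil
}

func main() {
	if err := validatePort(80); err != nil {
		// Prints the same text as the "Example error" bullet in FR-003.
		fmt.Println(err)
	}
}
```

Keeping the field path, value, and constraint as separate struct fields would also let a caller log them as structured fields rather than parsing the message.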

@@ -244,7 +270,7 @@ The system should provide configurable IP-based rate limiting capabilities that

 ### Measurable Outcomes

-- **SC-001**: System administrators can modify any configuration value and see it applied within 5 seconds without service restart
+- **SC-001**: System administrators can modify any configuration value in the config file and see it applied within 5 seconds (file event detection + validation + atomic swap) without service restart, verified by observing the configuration change take effect (e.g., log level change reflected in subsequent log entries)
 - **SC-002**: All API responses follow the unified `{code, data, msg}` structure with 100% consistency across all endpoints
 - **SC-003**: Every HTTP request generates a unique UUID v4 request ID that appears in the X-Request-ID response header and all associated log entries
 - **SC-004**: System continues processing new requests within 100ms after recovering from a panic, with zero downtime