# Research: Fiber Middleware Integration
**Feature**: 001-fiber-middleware-integration
**Date**: 2025-11-10
**Phase**: 0 - Research & Discovery
## Overview
This document resolves technical unknowns and establishes best practices for integrating Fiber middleware with Viper configuration, Zap logging, and Redis authentication.
---
## 1. Viper Configuration Hot Reload
### Decision: Use fsnotify (Viper Native Watcher)
**Rationale**:
- Viper has built-in `WatchConfig()` method using fsnotify
- No polling overhead, event-driven file change detection
- Cross-platform support (Linux, macOS, Windows)
- Battle-tested in production environments
- Integrates seamlessly with Viper's config merge logic
**Implementation Pattern**:
```go
viper.WatchConfig()
viper.OnConfigChange(func(e fsnotify.Event) {
    log.Info("Config file changed", zap.String("file", e.Name))

    // Reload config atomically
    newConfig := &Config{}
    if err := viper.Unmarshal(newConfig); err != nil {
        log.Error("Failed to reload config", zap.Error(err))
        return // Keep existing config
    }

    // Validate new config
    if err := newConfig.Validate(); err != nil {
        log.Error("Invalid config", zap.Error(err))
        return // Keep existing config
    }

    // Atomic swap using sync/atomic
    atomic.StorePointer(&globalConfig, unsafe.Pointer(newConfig))
})
```
**Best Practices**:
- Use atomic pointer swap to avoid race conditions (see the sketch after this list)
- Validate configuration before applying
- Log reload events with success/failure status
- Keep existing config if new config is invalid
- Don't restart services (logger, Redis client) on every change - only update values
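
On Go 1.19+ the same swap can be done without `unsafe` by using the typed `sync/atomic.Pointer[T]`. A minimal sketch, assuming a `Config` type like the one unmarshalled above (field names illustrative):

```go
package config

import "sync/atomic"

// Config mirrors the structure unmarshalled by Viper (fields illustrative).
type Config struct {
    Server struct {
        Address string `mapstructure:"address"`
    } `mapstructure:"server"`
}

// current holds the active configuration; Load/Store are atomic, so readers
// never observe a partially updated config.
var current atomic.Pointer[Config]

// Get returns the currently active configuration snapshot.
func Get() *Config { return current.Load() }

// apply swaps in a new configuration after it has been validated.
func apply(next *Config) { current.Store(next) }
```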
**Alternatives Considered**:
- Manual polling: Higher CPU overhead, added complexity
- Signal-based reload (SIGHUP): Requires manual triggering, not automatic
- Third-party config libraries (consul, etcd): Overkill for file-based config
---
## 2. Zap + Lumberjack Integration for Dual Log Files
### Decision: Two Separate Zap Logger Instances
**Rationale**:
- Clean separation of concerns (app logic vs HTTP access)
- Independent rotation policies (app.log: 100MB/30days, access.log: 500MB/90days)
- Different log levels (app: debug/info/error, access: info only)
- Easier to analyze and ship to different log aggregators
- Follows Go's simplicity principle - no complex routing logic
**Implementation Pattern**:
```go
// Application logger (app.log)
appCore := zapcore.NewCore(
    zapcore.NewJSONEncoder(encoderConfig),
    zapcore.AddSync(&lumberjack.Logger{
        Filename:   "logs/app.log",
        MaxSize:    100, // MB
        MaxBackups: 30,
        MaxAge:     30, // days
        Compress:   true,
    }),
    zap.InfoLevel, // level typically driven by config (debug in dev, info in prod)
)

// Access logger (access.log)
accessCore := zapcore.NewCore(
    zapcore.NewJSONEncoder(encoderConfig),
    zapcore.AddSync(&lumberjack.Logger{
        Filename:   "logs/access.log",
        MaxSize:    500, // MB
        MaxBackups: 90,
        MaxAge:     90, // days
        Compress:   true,
    }),
    zap.InfoLevel,
)

appLogger := zap.New(appCore, zap.AddCaller(), zap.AddStacktrace(zap.ErrorLevel))
accessLogger := zap.New(accessCore)
```
**Logger Usage**:
- **appLogger**: Business logic, errors, debug info, system events
- **accessLogger**: HTTP requests/responses only (method, path, status, duration, request ID); see the wiring sketch below
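
For illustration, a minimal sketch of the `customLogger` middleware referenced in section 3, wired to `accessLogger`; the request-ID locals key and log field names are assumptions, not a fixed contract:

```go
import (
    "time"

    "github.com/gofiber/fiber/v2"
    "go.uber.org/zap"
)

// customLogger emits one structured access-log entry per request.
func customLogger(accessLogger *zap.Logger) fiber.Handler {
    return func(c *fiber.Ctx) error {
        start := time.Now()
        err := c.Next() // run downstream middleware and the handler first

        requestID, _ := c.Locals("requestid").(string) // key depends on requestid config
        accessLogger.Info("http_request",
            zap.String("request_id", requestID),
            zap.String("method", c.Method()),
            zap.String("path", c.Path()),
            zap.Int("status", c.Response().StatusCode()),
            zap.Duration("duration", time.Since(start)),
        )
        return err
    }
}
```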
**JSON Encoder Config**:
```go
encoderConfig := zapcore.EncoderConfig{
    TimeKey:        "timestamp",
    LevelKey:       "level",
    NameKey:        "logger",
    CallerKey:      "caller",
    MessageKey:     "message",
    StacktraceKey:  "stacktrace",
    LineEnding:     zapcore.DefaultLineEnding,
    EncodeLevel:    zapcore.LowercaseLevelEncoder,
    EncodeTime:     zapcore.ISO8601TimeEncoder, // RFC3339 format
    EncodeDuration: zapcore.SecondsDurationEncoder,
    EncodeCaller:   zapcore.ShortCallerEncoder,
}
```
**Alternatives Considered**:
- Single logger with routing logic: Complex, error-prone, violates separation of concerns
- Log levels for separation: Doesn't solve retention/rotation policy differences
- Multiple cores in one logger: Still requires complex routing logic
---
## 3. Fiber Middleware Execution Order
### Decision: recover → requestid → logger → keyauth → limiter → handler
**Rationale**:
1. **recover** first: Must catch panics from all downstream middleware
2. **requestid** second: All logs need request ID, including auth failures
3. **logger** third: Log all requests including auth failures
4. **keyauth** fourth: Authentication before business logic
5. **limiter** fifth: Rate limit after auth (only count authenticated requests)
6. **handler** last: Business logic with all context available
**Fiber Middleware Registration**:
```go
app.Use(customRecover()) // Must be first
app.Use(requestid.New(requestid.Config{
    Generator: uuid.NewString, // UUID v4
}))
app.Use(customLogger(accessLogger))
app.Use(customKeyAuth(validator, appLogger))
// app.Use(customLimiter()) // Commented by default

app.Get("/api/v1/users", handler)
```
**Critical Insights**:
- Middleware executes in registration order (top to bottom)
- `recover` must be first to catch panics from all middleware (a `customRecover` sketch follows this list)
- `requestid` must be before logger to include ID in access logs
- Auth middleware should have access to request ID for security logs
- Rate limiter after auth = more accurate rate limiting per user/IP combo
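
Since the registration example refers to a `customRecover()` helper that is not shown, here is a hedged sketch of what such a wrapper could look like (the logger is injected explicitly here; the response/error helper names match section 10):

```go
// customRecover converts downstream panics into a logged, unified 500 response.
func customRecover(logger *zap.Logger) fiber.Handler {
    return func(c *fiber.Ctx) (err error) {
        defer func() {
            if r := recover(); r != nil {
                logger.Error("panic recovered",
                    zap.Any("panic", r),
                    zap.String("path", c.Path()),
                )
                err = response.Error(c, 500, errors.CodeInternalError, "Internal server error")
            }
        }()
        return c.Next()
    }
}
```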
**Alternatives Considered**:
- Auth before logger: Can't log auth failures with full context
- Rate limit before auth: Anonymous requests consume rate limit quota
- Request ID after logger: Access logs missing correlation IDs
---
## 4. Fiber keyauth Middleware Customization
### Decision: Wrap Fiber's keyauth with Custom Redis Validator
**Rationale**:
- Fiber's keyauth middleware provides token extraction from headers
- Custom validator function handles Redis token validation
- Clean separation: Fiber handles HTTP, validator handles business logic
- Easy to test validator independently
- Follows constitution's Handler → Service pattern
**Implementation Pattern**:
```go
// Validator service (pkg/validator/token.go)
type TokenValidator struct {
    redis  *redis.Client
    logger *zap.Logger
}

func (v *TokenValidator) Validate(token string) (string, error) {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel()

    // Check Redis availability
    if err := v.redis.Ping(ctx).Err(); err != nil {
        return "", ErrRedisUnavailable // Fail closed
    }

    // Get user ID from token
    userID, err := v.redis.Get(ctx, constants.RedisAuthTokenKey(token)).Result()
    if err == redis.Nil {
        return "", ErrInvalidToken
    }
    if err != nil {
        return "", fmt.Errorf("redis get: %w", err)
    }
    return userID, nil
}

// Middleware wrapper (internal/middleware/auth.go)
func KeyAuth(validator *validator.TokenValidator, logger *zap.Logger) fiber.Handler {
    return keyauth.New(keyauth.Config{
        KeyLookup: "header:token",
        Validator: func(c *fiber.Ctx, key string) (bool, error) {
            userID, err := validator.Validate(key)
            if err != nil {
                logger.Warn("Token validation failed",
                    zap.String("request_id", c.Locals(constants.ContextKeyRequestID).(string)),
                    zap.Error(err),
                )
                return false, err
            }
            // Store user ID in context
            c.Locals("user_id", userID)
            return true, nil
        },
        ErrorHandler: func(c *fiber.Ctx, err error) error {
            // Map errors to unified response format
            switch err {
            case keyauth.ErrMissingOrMalformedAPIKey:
                return response.Error(c, 401, errors.CodeMissingToken, "Missing authentication token")
            case ErrInvalidToken:
                return response.Error(c, 401, errors.CodeInvalidToken, "Invalid or expired token")
            case ErrRedisUnavailable:
                return response.Error(c, 503, errors.CodeAuthServiceUnavailable, "Authentication service unavailable")
            default:
                return response.Error(c, 500, errors.CodeInternalError, "Internal server error")
            }
        },
    })
}
```
**Best Practices**:
- Use context timeout for Redis operations (50ms)
- Fail closed when Redis unavailable (HTTP 503)
- Store user ID in Fiber context (`c.Locals`) for downstream handlers
- Log all auth failures with request ID for security auditing
- Use custom error types for different failure modes
**Alternatives Considered**:
- Direct Redis calls in middleware: Violates separation of concerns
- JWT tokens: Spec requires Redis validation, not stateless tokens
- Cache validation results: Security risk, defeats Redis TTL purpose
---
## 5. Redis Client Selection
### Decision: go-redis/redis/v8
**Rationale**:
- Most widely adopted Redis client in Go ecosystem (19k+ stars)
- Excellent performance and connection pooling
- Native context support for timeouts and cancellation
- Supports Redis Cluster, Sentinel, and standalone
- Active maintenance and community support
- Compatible with modern Go versions
- Comprehensive documentation and examples
**Connection Pool Configuration**:
```go
rdb := redis.NewClient(&redis.Options{
    Addr:         "localhost:6379",
    Password:     "", // From config
    DB:           0,  // From config
    PoolSize:     10, // Concurrent connections
    MinIdleConns: 5,  // Keep-alive connections
    MaxRetries:   3,  // Retry failed commands
    DialTimeout:  5 * time.Second,
    ReadTimeout:  3 * time.Second,
    WriteTimeout: 3 * time.Second,
    PoolTimeout:  4 * time.Second, // Wait for connection from pool
})
```
**Best Practices**:
- Use context with timeout for all Redis operations
- Check Redis availability with `Ping()` before critical operations
- Use `Get()` for simple token validation (O(1) complexity)
- Let Redis TTL handle token expiration (no manual cleanup; see the sketch after this list)
- Monitor connection pool metrics in production
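
As a counterpart to the validator's `Get`, a sketch of how a token could be written with a TTL so that Redis handles expiration; the key prefix follows the `auth:token:` convention used in the tests, and the helper name is hypothetical:

```go
// storeToken writes token -> userID with a TTL; Redis evicts the key on
// expiry, so no cleanup job is required.
func storeToken(ctx context.Context, rdb *redis.Client, token, userID string, ttl time.Duration) error {
    return rdb.Set(ctx, "auth:token:"+token, userID, ttl).Err()
}
```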
**Alternatives Considered**:
- **redigo**: Older, no context support, more manual connection management
- **rueidis**: Very fast but newer, less community adoption
- **Native Redis module**: Doesn't exist in Go standard library
---
## 6. UUID v4 Generation
### Decision: google/uuid (Already in go.mod)
**Rationale**:
- Already a dependency (via Fiber's uuid import)
- Official Google implementation, well-tested
- Simple API: `uuid.New()` or `uuid.NewString()`
- RFC 4122 compliant UUID v4 (random)
- No external dependencies
- Excellent performance (~1.5M UUIDs/sec)
**Implementation**:
```go
import "github.com/google/uuid"
// In requestid middleware config
fiber.New(fiber.Config{
Generator: uuid.NewString, // Returns string directly
})
// Or manual generation
requestID := uuid.NewString() // "550e8400-e29b-41d4-a716-446655440000"
```
**UUID v4 Characteristics**:
- 122 random bits (collision probability ~1 in 2^122)
- No need for special collision handling
- Compatible with distributed tracing (Jaeger, OpenTelemetry)
- Human-readable in logs and headers
**Alternatives Considered**:
- **crypto/rand + manual formatting**: Reinventing the wheel, error-prone
- **ULID**: Lexicographically sortable but not requested in spec
- **Standard library**: No UUID support in Go stdlib
---
## 7. Fiber Limiter Middleware
### Decision: Fiber Built-in Limiter with Memory Storage (Commented by Default)
**Rationale**:
- Fiber's limiter middleware supports multiple storage backends
- Memory storage sufficient for single-server deployment
- Redis storage available for multi-server deployment
- Per-IP rate limiting via client IP extraction
- Sliding window or fixed window algorithms available
**Implementation Pattern (Commented)**:
```go
// Rate limiter configuration (commented by default)
// Uncomment and configure per endpoint as needed
/*
app.Use("/api/v1/", limiter.New(limiter.Config{
    Max:        100,             // Max requests
    Expiration: 1 * time.Minute, // Time window
    KeyGenerator: func(c *fiber.Ctx) string {
        return c.IP() // Rate limit by IP
    },
    LimitReached: func(c *fiber.Ctx) error {
        return response.Error(c, 429, errors.CodeTooManyRequests, "Too many requests")
    },
    Storage: nil, // nil = in-memory, or redis storage for distributed
}))
*/
```
**Configuration Options**:
- **Max**: Number of requests allowed in time window (e.g., 100)
- **Expiration**: Time window duration (e.g., 1 minute)
- **KeyGenerator**: Function to extract rate limit key (IP, user ID, API key)
- **Storage**: Memory (default) or Redis for distributed rate limiting
- **LimitReached**: Custom error handler returning unified response format
**Enabling Rate Limiter**:
1. Uncomment middleware registration in `main.go`
2. Configure limits per endpoint or globally
3. Choose storage backend (memory for single server, Redis for cluster)
4. Update documentation with rate limit values
5. Monitor rate limit hits in logs
**Best Practices**:
- Apply rate limits per endpoint (different limits for read vs write)
- Use Redis storage for multi-server deployments (see the sketch after this list)
- Log rate limit violations for abuse detection
- Return `Retry-After` header in 429 responses
- Configure different limits for authenticated vs anonymous requests
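
A hedged sketch of the distributed variant, assuming the `github.com/gofiber/storage/redis` package (imported here as `redisstore`) and the project's `response`/`errors` helpers; the limits and `Retry-After` value are illustrative:

```go
// Shared storage so all instances count against the same limits.
store := redisstore.New(redisstore.Config{
    Host:     "localhost",
    Port:     6379,
    Database: 1, // keep rate-limit keys separate from auth tokens
})

app.Use("/api/v1/", limiter.New(limiter.Config{
    Max:        100,
    Expiration: 1 * time.Minute,
    KeyGenerator: func(c *fiber.Ctx) string {
        // Per-user limits for authenticated requests, per-IP otherwise.
        if userID, ok := c.Locals("user_id").(string); ok && userID != "" {
            return userID
        }
        return c.IP()
    },
    LimitReached: func(c *fiber.Ctx) error {
        c.Set("Retry-After", "60") // seconds until the window resets
        return response.Error(c, 429, errors.CodeTooManyRequests, "Too many requests")
    },
    Storage: store,
}))
```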
**Alternatives Considered**:
- **Third-party rate limiter**: Added complexity, Fiber's built-in sufficient
- **Token bucket algorithm**: Fiber's built-in fixed/sliding window limiter is simpler to configure
- **Rate limit before auth**: Spec requires after auth, per-IP basis
---
## 8. Graceful Shutdown Pattern
### Decision: Context-Based Cancellation with Shutdown Hook
**Rationale**:
- Go's context package provides clean cancellation propagation
- Fiber supports graceful shutdown with timeout
- Config watcher must stop before application exits
- Prevents goroutine leaks and incomplete operations
**Implementation Pattern**:
```go
func main() {
    // Create root context with cancellation
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Initialize components with context
    cfg := config.Load()
    go config.Watch(ctx, cfg) // Pass context to watcher

    app := setupApp(cfg)

    // Graceful shutdown signal handling
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, os.Interrupt, syscall.SIGTERM)

    go func() {
        if err := app.Listen(cfg.Server.Address); err != nil {
            log.Fatal("Server failed", zap.Error(err))
        }
    }()

    <-quit // Block until signal
    log.Info("Shutting down server...")

    cancel() // Cancel context (stops config watcher)

    if err := app.ShutdownWithTimeout(30 * time.Second); err != nil {
        log.Error("Forced shutdown", zap.Error(err))
    }
    log.Info("Server stopped")
}
```
**Watcher Cancellation**:
```go
func Watch(ctx context.Context, cfg *Config) {
    viper.WatchConfig()
    viper.OnConfigChange(func(e fsnotify.Event) {
        select {
        case <-ctx.Done():
            return // Stop processing config changes
        default:
            // Reload config logic
        }
    })

    <-ctx.Done() // Block until cancelled
    log.Info("Config watcher stopped")
}
```
**Best Practices**:
- Use `context.Context` for all long-running goroutines
- Set reasonable shutdown timeout (30 seconds)
- Close resources in defer statements
- Log shutdown progress
- Flush logs before exit (`logger.Sync()`; see the sketch below)
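
A minimal sketch of the flush step, assuming the two loggers from section 2 are in scope; `Sync` errors on stdout/stderr sinks are commonly ignored:

```go
// Flush buffered log entries before the process exits.
defer func() {
    _ = appLogger.Sync()
    _ = accessLogger.Sync()
}()
```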
---
## 9. Testing Strategies
### Decision: Table-Driven Tests with Mock Redis
**Rationale**:
- Table-driven tests are Go idiomatic (endorsed by Go team)
- Mock Redis avoids external dependencies in unit tests
- Integration tests use testcontainers for real Redis
- Middleware testing requires Fiber test context
**Unit Test Pattern (Token Validator)**:
```go
func TestTokenValidator_Validate(t *testing.T) {
    tests := []struct {
        name      string
        token     string
        setupMock func(*mock.Redis)
        wantUser  string
        wantErr   error
    }{
        {
            name:  "valid token",
            token: "valid-token-123",
            setupMock: func(m *mock.Redis) {
                m.On("Get", mock.Anything, "auth:token:valid-token-123").
                    Return(redis.NewStringResult("user-456", nil))
            },
            wantUser: "user-456",
            wantErr:  nil,
        },
        {
            name:  "expired token",
            token: "expired-token",
            setupMock: func(m *mock.Redis) {
                m.On("Get", mock.Anything, "auth:token:expired-token").
                    Return(redis.NewStringResult("", redis.Nil))
            },
            wantUser: "",
            wantErr:  ErrInvalidToken,
        },
        // More cases...
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            mockRedis := &mock.Redis{}
            tt.setupMock(mockRedis)

            validator := NewTokenValidator(mockRedis, zap.NewNop())
            userID, err := validator.Validate(tt.token)

            if err != tt.wantErr {
                t.Errorf("got error %v, want %v", err, tt.wantErr)
            }
            if userID != tt.wantUser {
                t.Errorf("got userID %s, want %s", userID, tt.wantUser)
            }
            mockRedis.AssertExpectations(t)
        })
    }
}
```
**Integration Test Pattern (Middleware Chain)**:
```go
func TestMiddlewareChain(t *testing.T) {
    ctx := context.Background()

    // Start testcontainer Redis
    redisContainer, err := testcontainers.GenericContainer(ctx,
        testcontainers.GenericContainerRequest{
            ContainerRequest: testcontainers.ContainerRequest{
                Image:        "redis:7-alpine",
                ExposedPorts: []string{"6379/tcp"},
            },
            Started: true,
        })
    require.NoError(t, err)
    defer redisContainer.Terminate(ctx)

    // Setup app and Redis client wired to the container (test helper)
    app, redisClient := setupTestApp(t, redisContainer)

    // Test cases
    tests := []struct {
        name           string
        setupToken     func(rdb *redis.Client)
        headers        map[string]string
        expectedStatus int
        expectedCode   int
    }{
        {
            name: "valid request with token",
            setupToken: func(rdb *redis.Client) {
                rdb.Set(ctx, "auth:token:valid-token", "user-123", 1*time.Hour)
            },
            headers:        map[string]string{"token": "valid-token"},
            expectedStatus: 200,
            expectedCode:   0,
        },
        // More cases...
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            if tt.setupToken != nil {
                tt.setupToken(redisClient)
            }

            req := httptest.NewRequest("GET", "/api/v1/test", nil)
            for k, v := range tt.headers {
                req.Header.Set(k, v)
            }

            resp, err := app.Test(req)
            require.NoError(t, err)
            assert.Equal(t, tt.expectedStatus, resp.StatusCode)

            // Parse response body and check code
            var body response.Response
            json.NewDecoder(resp.Body).Decode(&body)
            assert.Equal(t, tt.expectedCode, body.Code)
        })
    }
}
```
**Testing Best Practices**:
- Use the standard `testing` package as the test runner (no BDD-style test frameworks)
- Mock external dependencies (Redis) in unit tests
- Use real services in integration tests (testcontainers)
- Test helpers marked with `t.Helper()` (see the sketch after this list)
- Parallel tests when possible (`t.Parallel()`)
- Clear test names describing scenario
- Assert expected errors, not just success cases
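
A small illustration of the helper convention, assuming the mock Redis and constructor from the unit-test example above (names hypothetical):

```go
// newTestValidator builds a TokenValidator backed by a mock Redis client.
// t.Helper() keeps failure locations pointing at the calling test.
func newTestValidator(t *testing.T, m *mock.Redis) *TokenValidator {
    t.Helper()
    return NewTokenValidator(m, zap.NewNop())
}
```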
---
## 10. Middleware Error Handling
### Decision: Custom ErrorHandler for Unified Response Format
**Rationale**:
- Fiber middleware returns errors, not HTTP responses
- ErrorHandler translates errors to unified response format
- Consistent error structure across all middleware
- Proper HTTP status codes and error codes
**Pattern**:
```go
// In each middleware config
ErrorHandler: func(c *fiber.Ctx, err error) error {
    // Map error to response
    code, status, msg := mapError(err)
    return response.Error(c, status, code, msg)
}

// Centralized error mapping
func mapError(err error) (code, status int, msg string) {
    switch {
    case errors.Is(err, ErrMissingToken):
        return errors.CodeMissingToken, 401, "Missing authentication token"
    case errors.Is(err, ErrInvalidToken):
        return errors.CodeInvalidToken, 401, "Invalid or expired token"
    case errors.Is(err, ErrRedisUnavailable):
        return errors.CodeAuthServiceUnavailable, 503, "Authentication service unavailable"
    case errors.Is(err, ErrTooManyRequests):
        return errors.CodeTooManyRequests, 429, "Too many requests"
    default:
        return errors.CodeInternalError, 500, "Internal server error"
    }
}
```
---
## Summary of Decisions
| Component | Decision | Key Rationale |
|-----------|----------|---------------|
| Config Hot Reload | Viper + fsnotify | Native support, event-driven, atomic swap |
| Logging | Dual Zap loggers + Lumberjack | Separate concerns, independent policies |
| Middleware Order | recover → requestid → logger → keyauth → limiter | Panic safety, context propagation |
| Auth Validation | Custom validator + Fiber keyauth | Separation of concerns, testability |
| Redis Client | go-redis/redis/v8 | Industry standard, excellent performance |
| Request ID | google/uuid v4 | Already in deps, RFC 4122 compliant |
| Rate Limiting | Fiber limiter (commented) | Built-in, flexible, easy to enable |
| Graceful Shutdown | Context cancellation + signal handling | Clean resource cleanup |
| Testing | Table-driven + mocks/testcontainers | Go idiomatic, balanced approach |
---
## Implementation Readiness Checklist
- [x] All technical unknowns resolved
- [x] Best practices established for each component
- [x] Go idiomatic patterns confirmed (no Java-style anti-patterns)
- [x] Constitution compliance verified (Fiber, Zap, Viper, Redis)
- [x] Testing strategies defined
- [x] Error handling patterns established
- [x] Performance considerations addressed
- [x] Security patterns confirmed (fail-closed auth)
**Status**: Ready for Phase 1 (Design & Contracts)
---
**Next**: Generate data-model.md, contracts/api.yaml, quickstart.md