From 286defb0635b451d89e808b5dfed9d6a1aa86042 Mon Sep 17 00:00:00 2001
From: huang
Date: Tue, 20 Jan 2026 11:33:26 +0800
Subject: [PATCH] Fix API container health check failure: unify port
 configuration and add database environment variables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem 1: Port mismatch
- Dockerfile.api uses 8088 for EXPOSE and the health check
- The API actually listens on 3000, as set in config.yaml
- The failing health check leaves the container unhealthy

Problem 2: Missing database environment variables
- entrypoint-api.sh needs DB_HOST, DB_USER, and related environment variables to run migrations
- docker-compose.prod.yml did not define these variables
- The container startup script exited immediately

Fix:
- Dockerfile.api: change EXPOSE and the health check to 3000
- docker-compose.prod.yml: add the full set of database environment variables
---
 Dockerfile.api                     |   4 +-
 docker-compose.prod.yml            |   7 +
 docs/DEPLOYMENT_TROUBLESHOOTING.md | 449 +++++++++++++++++++++++++++++
 3 files changed, 458 insertions(+), 2 deletions(-)
 create mode 100644 docs/DEPLOYMENT_TROUBLESHOOTING.md

diff --git a/Dockerfile.api b/Dockerfile.api
index 4183e4a..814f30d 100644
--- a/Dockerfile.api
+++ b/Dockerfile.api
@@ -71,11 +71,11 @@ RUN chmod +x /app/entrypoint.sh
 USER appuser
 
 # Expose the port
-EXPOSE 8088
+EXPOSE 3000
 
 # Health check (uses the wget bundled with Alpine)
 HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
-    CMD wget --no-verbose --tries=1 --spider http://localhost:8088/health || exit 1
+    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
 
 # Start command
 ENTRYPOINT ["/app/entrypoint.sh"]
diff --git a/docker-compose.prod.yml b/docker-compose.prod.yml
index de32063..bc9ff17 100644
--- a/docker-compose.prod.yml
+++ b/docker-compose.prod.yml
@@ -7,6 +7,13 @@ services:
     restart: unless-stopped
     ports:
       - "3000:3000"
+    environment:
+      - DB_HOST=cxd.whcxd.cn
+      - DB_PORT=16159
+      - DB_USER=erp_pgsql
+      - DB_PASSWORD=erp_2025
+      - DB_NAME=junhong_cmp_test
+      - DB_SSLMODE=disable
     volumes:
       - ./configs:/app/configs:ro
       - ./logs:/app/logs
diff --git a/docs/DEPLOYMENT_TROUBLESHOOTING.md b/docs/DEPLOYMENT_TROUBLESHOOTING.md
new file mode 100644
index 0000000..ebae371
--- /dev/null
+++ b/docs/DEPLOYMENT_TROUBLESHOOTING.md
@@ -0,0 +1,449 @@
+# Deployment Troubleshooting Guide
+
+## Current Status
+
+**Latest fix**: Commit `bf4ef37` - fix docker compose failing to find its configuration file by specifying the file name explicitly
+
+**What changed**:
+- Every `docker compose` command now passes `-f docker-compose.prod.yml`
+- Commands are run from the correct working directory
+
+---
+
+## Quick Verification Checklist
+
+### 1. Check the Gitea Actions build status
+
+Visit: https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions
+
+**Inspect the latest run**:
+- Run ID: open the most recent workflow run
+- Status: should show ✅ success (green)
+- Duration: expected to finish in 15-20 minutes
+
+**Key steps to verify**:
+```
+✅ Checkout code
+✅ Set image tags
+✅ Login to Docker Registry
+✅ Build API image
+✅ Build Worker image
+✅ Push API image
+✅ Push Worker image
+✅ Deploy to test server  <-- main focus of this fix
+```
+
+---
+
+### 2. Verify over SSH on the server
+
+```bash
+# Log in to the server
+ssh qycard001@47.111.166.169 -p 52022
+
+# Enter the deployment directory
+cd /opt/junhong_cmp
+
+# Check that the files exist
+ls -la
+# Expected output:
+# docker-compose.prod.yml
+# configs/config.yaml
+# logs/
+
+# Inspect the docker compose configuration
+cat docker-compose.prod.yml
+
+# Inspect the application configuration
+cat configs/config.yaml
+```
+
+---
+
+### 3. Check container status
+
+```bash
+# List all containers
+docker compose -f docker-compose.prod.yml ps
+
+# Expected output:
+# NAME                   COMMAND               SERVICE   STATUS         PORTS
+# junhong_cmp-api-1      "/entrypoint-api.sh"  api       Up (healthy)   0.0.0.0:3000->3000/tcp
+# junhong_cmp-worker-1   "/app/cmd/worker"     worker    Up
+
+# If the API is not Up (healthy) or the worker is not Up, check the logs
+docker compose -f docker-compose.prod.yml logs api
+docker compose -f docker-compose.prod.yml logs worker
+```
+
+---
+
+### 4. Test the API health check
+
+```bash
+# Test on the server
+curl http://localhost:3000/health
+
+# Expected response:
+# {
+#   "code": 0,
+#   "message": "success",
+#   "data": {
+#     "status": "healthy",
+#     "timestamp": "2026-01-20T11:30:00+08:00"
+#   }
+# }
+```
+
+---
+
+## Common Problems
+
+### Problem 1: Deploy step fails - "no configuration file provided"
+
+**Symptom**:
+```
+Error: no configuration file provided: not found
+```
+
+**Cause**: docker compose could not find a configuration file
+
+**Resolution**: ✅ fixed in `bf4ef37`
+- Every `docker compose` command now passes `-f docker-compose.prod.yml`
+
+**Verification**:
+```bash
+# Run from the deployment directory
+cd /opt/junhong_cmp
+docker compose -f docker-compose.prod.yml config
+# Should print the resolved configuration
+```
+
+---
+
+### Problem 2: Container fails to start - health check does not pass
+
+**Symptom**:
+```
+junhong_cmp-api-1 Up (unhealthy)
+```
+
+**Troubleshooting steps**:
+
+1. **Check the container logs**:
+```bash
+docker compose -f docker-compose.prod.yml logs api --tail=50
+```
+
+2. **Common causes**:
+   - Database connection failure
+   - Wrong configuration file path
+   - Port conflict
+   - Database migration failure
+
+3. **Debug inside the container**:
+```bash
+docker compose -f docker-compose.prod.yml exec api sh
+
+# Inside the container
+ls -la /app/
+ls -la /app/configs/
+cat /app/configs/config.yaml
+wget --spider http://localhost:3000/health
+```
+
+4. **Run the health check manually**:
+```bash
+docker compose -f docker-compose.prod.yml exec api wget --no-verbose --tries=1 --spider http://localhost:3000/health
+echo $? # should print 0
+```
+
+---
+
+### Problem 3: Database migration fails
+
+**Symptom**:
+```
+Failed to run migrations: ...
+```
+
+**Troubleshooting steps**:
+
+1. **Check the database connection**:
+```bash
+# Test the PostgreSQL connection from the server (-u root: the image runs as appuser, and apk needs root)
+docker compose -f docker-compose.prod.yml exec -u root api sh -c '
+  apk add postgresql-client &&
+  PGPASSWORD="qycardPW@.cxj2026" psql -h cxd.whcxd.cn -p 16159 -U qycard001 -d qycard001 -c "SELECT version();"
+'
+```
+
+2. **Check the migration logs**:
+```bash
+docker compose -f docker-compose.prod.yml logs api | grep -i migrate
+```
+
+3. **Run the migration manually**:
+```bash
+docker compose -f docker-compose.prod.yml exec api sh -c '
+  cd /app && /app/migrate -path /app/migrations -database "postgres://..." up
+'
+```
+
+---
+
+### Problem 4: Image pull fails
+
+**Symptom**:
+```
+Error response from daemon: pull access denied for registry.boss160.cn/junhong/cmp-fiber-api
+```
+
+**Troubleshooting steps**:
+
+1. **Check the Registry login**:
+```bash
+# Log in manually on the server
+docker login registry.boss160.cn
+# Username: junhong_admin
+# Password: JunHong@2025!Registry
+```
+
+2. **Pull the images manually**:
+```bash
+docker pull registry.boss160.cn/junhong/cmp-fiber-api:latest
+docker pull registry.boss160.cn/junhong/cmp-fiber-worker:latest
+```
+
+3. **Check that the images exist**:
+```bash
+# Check on the local Mac
+docker images | grep registry.boss160.cn
+```
+
+---
+
+### Problem 5: Configuration files not synced
+
+**Symptom**:
+The configuration file inside the container differs from the one in the repository
+
+**Troubleshooting steps**:
+
+1. **Check the configuration file in the deployment directory**:
+```bash
+cat /opt/junhong_cmp/configs/config.yaml
+```
+
+2. **Compare it with the repository copy**:
+```bash
+# In the Runner working directory
+cat /tmp/actions/*/csxj2026-junhong_cmp_fiber/configs/config.yaml
+```
+
+3. **Copy the configuration files again**:
+```bash
+cd /tmp/actions/*/csxj2026-junhong_cmp_fiber
+sudo cp docker-compose.prod.yml /opt/junhong_cmp/
+sudo cp -r configs /opt/junhong_cmp/
+```
+
+---
+
+### Problem 6: Worker container fails to start
+
+**Symptom**:
+```
+junhong_cmp-worker-1 Exited (1)
+```
+
+**Troubleshooting steps**:
+
+1. **Check the Worker logs**:
+```bash
+docker compose -f docker-compose.prod.yml logs worker --tail=50
+```
+
+2. **Check the API health status**:
+The Worker depends on the API health check, so make sure the API has started successfully first
+```bash
+docker compose -f docker-compose.prod.yml ps api
+# Should show Up (healthy)
+```
+
+3. **Restart the Worker**:
+```bash
+docker compose -f docker-compose.prod.yml restart worker
+```
+
+---
+
+## Redeploying From Scratch
+
+If a problem cannot be resolved, clean everything up and redeploy:
+
+```bash
+# 1. Stop and remove all containers
+cd /opt/junhong_cmp
+docker compose -f docker-compose.prod.yml down -v
+
+# 2. Remove the old images
+docker rmi registry.boss160.cn/junhong/cmp-fiber-api:latest
+docker rmi registry.boss160.cn/junhong/cmp-fiber-worker:latest
+
+# 3. Clean the deployment directory
+sudo rm -rf /opt/junhong_cmp/*
+
+# 4. Trigger a new build
+# Push from the local Mac
+cd /Users/break/csxjProject/junhong_cmp_fiber
+git commit --allow-empty -m "Trigger redeploy"
+git push origin main
+
+# 5. Wait for CI/CD to finish (15-20 minutes)
+# Visit https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions
+
+# 6. Verify the deployment
+ssh qycard001@47.111.166.169 -p 52022
+cd /opt/junhong_cmp
+docker compose -f docker-compose.prod.yml ps
+curl http://localhost:3000/health
+```
+
+---
+
+## Monitoring and Logs
+
+### Follow logs in real time
+
+```bash
+# API logs
+docker compose -f docker-compose.prod.yml logs -f api
+
+# Worker logs
+docker compose -f docker-compose.prod.yml logs -f worker
+
+# Logs for all services
+docker compose -f docker-compose.prod.yml logs -f
+```
+
+### View application log files
+
+```bash
+# API logs
+tail -f /opt/junhong_cmp/logs/api.log
+tail -f /opt/junhong_cmp/logs/access.log
+
+# Worker logs
+tail -f /opt/junhong_cmp/logs/worker.log
+```
+
+### Check container resource usage
+
+```bash
+docker stats junhong_cmp-api-1 junhong_cmp-worker-1
+```
+
+---
+
+## Performance Verification
+
+### API response time test
+
+```bash
+# Install hey (HTTP load testing tool)
+# Mac: brew install hey
+# Linux: go install github.com/rakyll/hey@latest
+
+# Health check test (100 requests, concurrency 10)
+hey -n 100 -c 10 http://47.111.166.169:3000/health
+
+# Expected metrics:
+# - P95 < 200ms
+# - P99 < 500ms
+# - Success rate = 100%
+```
+
+### Database query performance
+
+```bash
+# Enable the slow query log in PostgreSQL
+# Check whether any query takes longer than 50ms
+```
+
+---
+
+## Rollback Strategy
+
+If the new version misbehaves, roll back quickly to an earlier image version:
+
+```bash
+# 1. Pull the images for a specific version
+docker pull registry.boss160.cn/junhong/cmp-fiber-api:1d773c4
+docker pull registry.boss160.cn/junhong/cmp-fiber-worker:1d773c4
+
+# 2. Edit the image tags in docker-compose.prod.yml
+vim /opt/junhong_cmp/docker-compose.prod.yml
+# Change :latest to :1d773c4
+
+# 3. Redeploy
+cd /opt/junhong_cmp && docker compose -f docker-compose.prod.yml up -d
+
+# 4. Verify
+curl http://localhost:3000/health
+```
+
+---
+
+## Contact and Support
+
+If a problem cannot be resolved:
+
+1. **Check the Gitea Actions logs**: https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions
+2. **Check the container logs**: `docker compose -f docker-compose.prod.yml logs`
+3. **Check server resources**: `df -h`, `free -h`, `docker system df`
+4. **Record the error details**: the full error log and the steps to reproduce
+
+---
+
+## Signs of a Successful Deployment
+
+The deployment is fully successful when all of the following hold:
+
+✅ The Gitea Actions run is green
+✅ `docker compose ps` shows every container `Up`, with the API `Up (healthy)`
+✅ `curl http://localhost:3000/health` returns 200 with the expected JSON
+✅ The logs contain no ERROR-level messages
+✅ API response time P95 < 200ms
+✅ The Worker is consuming the task queue normally
+
+---
+
+## Appendix: Key File Locations
+
+### Server
+- **Runner directory**: `/home/qycard001/act_runner`
+- **Deployment directory**: `/opt/junhong_cmp`
+- **Runner configuration**: `/home/qycard001/.runner`
+- **Temporary working directory**: `/home/qycard001/.cache/act/`
+
+### Local (Mac)
+- **Repository directory**: `/Users/break/csxjProject/junhong_cmp_fiber`
+- **Key files**:
+  - `.gitea/workflows/deploy.yaml`
+  - `Dockerfile.api`
+  - `Dockerfile.worker`
+  - `docker-compose.prod.yml`
+  - `configs/config.yaml`
+
+### Private Registry
+- **Address**: registry.boss160.cn
+- **API image**: `registry.boss160.cn/junhong/cmp-fiber-api`
+- **Worker image**: `registry.boss160.cn/junhong/cmp-fiber-worker`
+- **Base image**: `registry.boss160.cn/base/golang:1.25.6-alpine`
+
+---
+
+**Last updated**: 2026-01-20 11:11
+**Document version**: 1.0
+**Corresponding Commit**: bf4ef37