Fix API container health check failure: unify port configuration and add database environment variables
Some checks failed
Build and deploy to test environment (no SSH) / build-and-deploy (push) Failing after 3m23s
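Problem 1: Port mismatch
- Dockerfile.api used 8088 for EXPOSE and the health check
- In config.yaml the API actually listens on 3000
- The failing health check left the container unhealthy

Problem 2: Missing database environment variables
- entrypoint-api.sh needs DB_HOST, DB_USER, and related environment variables to run migrations
- docker-compose.prod.yml did not define these variables
- The container startup script exited immediately

Fix:
- Dockerfile.api: change EXPOSE and the health check to 3000
- docker-compose.prod.yml: add the full set of database environment variables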
@@ -71,11 +71,11 @@ RUN chmod +x /app/entrypoint.sh
 USER appuser

 # Expose port
-EXPOSE 8088
+EXPOSE 3000

 # Health check (uses the wget that ships with Alpine)
 HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
-    CMD wget --no-verbose --tries=1 --spider http://localhost:8088/health || exit 1
+    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

 # Start command
 ENTRYPOINT ["/app/entrypoint.sh"]
@@ -7,6 +7,13 @@ services:
     restart: unless-stopped
     ports:
       - "3000:3000"
+    environment:
+      - DB_HOST=cxd.whcxd.cn
+      - DB_PORT=16159
+      - DB_USER=erp_pgsql
+      - DB_PASSWORD=erp_2025
+      - DB_NAME=junhong_cmp_test
+      - DB_SSLMODE=disable
     volumes:
       - ./configs:/app/configs:ro
       - ./logs:/app/logs
449
docs/DEPLOYMENT_TROUBLESHOOTING.md
Normal file
@@ -0,0 +1,449 @@

# Deployment Troubleshooting Guide

## Current Status

**Latest fix**: Commit `bf4ef37` - Fix docker compose not finding its configuration file: specify the file name explicitly

**What changed**:
- Added the `-f docker-compose.prod.yml` argument to every `docker compose` command
- Made sure commands run in the correct working directory

---

## Quick Verification Checklist

### 1. Check the Gitea Actions build status

Visit: https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions

**Look at the latest run**:
- Run ID: open the most recent workflow run
- Status: should show ✅ success (green)
- Duration: expect it to finish in 15-20 minutes

**Key steps to verify**:
```
✅ Checkout code
✅ Set image tags
✅ Login to Docker Registry
✅ Build API image
✅ Build Worker image
✅ Push API image
✅ Push Worker image
✅ Deploy to test server    <-- main focus of this fix
```

---

### 2. Verify on the server over SSH

```bash
# Log in to the server
ssh qycard001@47.111.166.169 -p 52022

# Go to the deployment directory
cd /opt/junhong_cmp

# Check that the files exist
ls -la
# Expected output:
# docker-compose.prod.yml
# configs/config.yaml
# logs/

# Show the docker compose configuration
cat docker-compose.prod.yml

# Show the application configuration
cat configs/config.yaml
```

---

### 3. Check container status

```bash
# List all containers
docker compose -f docker-compose.prod.yml ps

# Expected output:
# NAME                   COMMAND                SERVICE   STATUS         PORTS
# junhong_cmp-api-1      "/entrypoint-api.sh"   api       Up (healthy)   0.0.0.0:3000->3000/tcp
# junhong_cmp-worker-1   "/app/cmd/worker"      worker    Up

# If the status is not Up (healthy), check the logs for the specific problem
docker compose -f docker-compose.prod.yml logs api
docker compose -f docker-compose.prod.yml logs worker
```
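
The `STATUS` column above can also be read from a script. The following is a minimal sketch, assuming the container defines a `HEALTHCHECK` (as `Dockerfile.api` does). `container_health` and `require_healthy` are hypothetical helper names, not part of the repository. The Worker container shows no health status in the expected output, so `docker inspect` would likely fail on it with this template.

```bash
# container_health NAME
# Prints the Docker health status: starting, healthy, or unhealthy.
# Only works for containers whose image defines a HEALTHCHECK.
container_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# require_healthy NAME (hypothetical helper)
# Succeeds only when the container reports "healthy".
require_healthy() {
    status=$(container_health "$1") || return 1
    if [ "$status" != "healthy" ]; then
        echo "$1 is $status" >&2
        return 1
    fi
}

# Example:
# require_healthy junhong_cmp-api-1 && echo "API container is healthy"
```

A non-zero exit status makes the helper easy to chain with `&&` in a deployment script.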

---

### 4. Test the API health check

```bash
# Test on the server
curl http://localhost:3000/health

# Expected response:
# {
#   "code": 0,
#   "message": "success",
#   "data": {
#     "status": "healthy",
#     "timestamp": "2026-01-20T11:30:00+08:00"
#   }
# }
```
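
Right after a deployment the API may still be starting (the Dockerfile health check allows a 10-second start period), so a single `curl` can fail even though the service comes up a moment later. The following is a minimal polling sketch; `retry` is a hypothetical helper name, and the attempt count and delay in the example are arbitrary.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
# `retry` is a hypothetical helper, not part of the repository.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example: poll the health endpoint up to 10 times, 3 seconds apart.
# retry 10 3 curl --fail --silent --output /dev/null http://localhost:3000/health
```

With `--fail`, `curl` exits non-zero on HTTP error responses, so the loop keeps polling until the endpoint answers successfully.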

---

## Troubleshooting Common Issues

### Issue 1: Deployment step fails - "no configuration file provided"

**Symptom**:
```
Error: no configuration file provided: not found
```

**Cause**: docker compose could not find a configuration file

**Resolution**: ✅ Fixed in `bf4ef37`
- Added `-f docker-compose.prod.yml` to every `docker compose` command

**Verification**:
```bash
# Run in the deployment directory
cd /opt/junhong_cmp
docker compose -f docker-compose.prod.yml config
# Should print the resolved configuration without errors
```

---

### Issue 2: Container fails to start - health check does not pass

**Symptom**:
```
junhong_cmp-api-1   Up (unhealthy)
```

**Troubleshooting steps**:

1. **Check the container logs**:
   ```bash
   docker compose -f docker-compose.prod.yml logs api --tail=50
   ```

2. **Common causes**:
   - Database connection failure
   - Wrong configuration file path
   - Port conflict
   - Database migration failure

3. **Debug inside the container**:
   ```bash
   docker compose -f docker-compose.prod.yml exec api sh

   # Check inside the container
   ls -la /app/
   ls -la /app/configs/
   cat /app/configs/config.yaml
   wget --spider http://localhost:3000/health
   ```

4. **Run the health check manually**:
   ```bash
   docker compose -f docker-compose.prod.yml exec api wget --no-verbose --tries=1 --spider http://localhost:3000/health
   echo $?  # should print 0
   ```
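
A further cause, and the one this commit fixes, is a mismatch between the port in `Dockerfile.api` and the port the API actually listens on. The check can be sketched as below; the three function names are hypothetical, and the expected port (3000, from `config.yaml`) is passed in by hand because the layout of `config.yaml` is not shown here.

```bash
# healthcheck_port DOCKERFILE
# Prints the port of the first http://localhost:PORT/ URL in the file.
healthcheck_port() {
    sed -n 's|.*http://localhost:\([0-9][0-9]*\)/.*|\1|p' "$1" | head -n 1
}

# exposed_port DOCKERFILE
# Prints the port of the first EXPOSE instruction in the file.
exposed_port() {
    sed -n 's/^EXPOSE[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# ports_match DOCKERFILE EXPECTED_PORT (hypothetical helper)
# Succeeds only if EXPOSE and the health check URL both use EXPECTED_PORT.
ports_match() {
    [ "$(exposed_port "$1")" = "$2" ] && [ "$(healthcheck_port "$1")" = "$2" ]
}

# Example (3000 is the port the API listens on according to config.yaml):
# ports_match Dockerfile.api 3000 || echo "port mismatch in Dockerfile.api"
```

Run against the pre-fix `Dockerfile.api`, which used 8088 in both places, `ports_match Dockerfile.api 3000` would fail.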

---

### Issue 3: Database migration fails

**Symptom**:
```
Failed to run migrations: ...
```

**Troubleshooting steps**:

1. **Check the database connection**:
   ```bash
   # Test the PostgreSQL connection from inside the API container
   # (-u root is needed because the image runs as the non-root appuser, which cannot run apk add)
   docker compose -f docker-compose.prod.yml exec -u root api sh -c '
   apk add --no-cache postgresql-client &&
   PGPASSWORD="qycardPW@.cxj2026" psql -h cxd.whcxd.cn -p 16159 -U qycard001 -d qycard001 -c "SELECT version();"
   '
   ```

2. **View the migration logs**:
   ```bash
   docker compose -f docker-compose.prod.yml logs api | grep -i migrate
   ```

3. **Run the migration manually**:
   ```bash
   docker compose -f docker-compose.prod.yml exec api sh -c '
   cd /app && /app/migrate -path /app/migrations -database "postgres://..." up
   '
   ```
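
Migrations can also fail before any connection is attempted if the entrypoint script does not receive its database variables, which is the second problem this commit fixes. The following is a minimal sketch of a pre-flight check; `missing_db_vars` is a hypothetical helper, and the variable list simply mirrors the entries added to `docker-compose.prod.yml`.

```bash
# missing_db_vars (hypothetical helper)
# Prints the name of every required database variable that is unset or empty.
# The list mirrors the environment entries in docker-compose.prod.yml.
missing_db_vars() {
    for name in DB_HOST DB_PORT DB_USER DB_PASSWORD DB_NAME DB_SSLMODE; do
        # Indirect lookup: read the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Example: abort early with a readable message.
# missing=$(missing_db_vars)
# [ -z "$missing" ] || { echo "missing variables: $missing" >&2; exit 1; }
```

To see what the running container actually receives, `docker compose -f docker-compose.prod.yml exec api env | grep '^DB_'` prints the variables.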

---

### Issue 4: Image pull fails

**Symptom**:
```
Error response from daemon: pull access denied for registry.boss160.cn/junhong/cmp-fiber-api
```

**Troubleshooting steps**:

1. **Check the registry login**:
   ```bash
   # Log in manually on the server
   docker login registry.boss160.cn
   # Username: junhong_admin
   # Password: JunHong@2025!Registry
   ```

2. **Pull the images manually**:
   ```bash
   docker pull registry.boss160.cn/junhong/cmp-fiber-api:latest
   docker pull registry.boss160.cn/junhong/cmp-fiber-worker:latest
   ```

3. **Check that the images exist**:
   ```bash
   # Check on the local Mac
   docker images | grep registry.boss160.cn
   ```

---

### Issue 5: Configuration files out of sync

**Symptom**:
The configuration file inside the container does not match the one in the repository

**Troubleshooting steps**:

1. **Check the configuration file in the deployment directory**:
   ```bash
   cat /opt/junhong_cmp/configs/config.yaml
   ```

2. **Compare it with the configuration file in the repository**:
   ```bash
   # In the runner's working directory
   cat /tmp/actions/*/csxj2026-junhong_cmp_fiber/configs/config.yaml
   ```

3. **Copy the configuration files again**:
   ```bash
   cd /tmp/actions/*/csxj2026-junhong_cmp_fiber
   sudo cp docker-compose.prod.yml /opt/junhong_cmp/
   sudo cp -r configs /opt/junhong_cmp/
   ```

---

### Issue 6: Worker container fails to start

**Symptom**:
```
junhong_cmp-worker-1   Exited (1)
```

**Troubleshooting steps**:

1. **Check the Worker logs**:
   ```bash
   docker compose -f docker-compose.prod.yml logs worker --tail=50
   ```

2. **Check the API health status**:
   The Worker depends on the API health check, so make sure the API has started successfully first
   ```bash
   docker compose -f docker-compose.prod.yml ps api
   # Should show Up (healthy)
   ```

3. **Restart the Worker**:
   ```bash
   docker compose -f docker-compose.prod.yml restart worker
   ```

---

## Redeploying from Scratch

If a problem cannot be resolved any other way, clean everything up and redeploy:

```bash
# 1. Stop and remove all containers
cd /opt/junhong_cmp
docker compose -f docker-compose.prod.yml down -v

# 2. Remove the old images
docker rmi registry.boss160.cn/junhong/cmp-fiber-api:latest
docker rmi registry.boss160.cn/junhong/cmp-fiber-worker:latest

# 3. Clean the deployment directory
sudo rm -rf /opt/junhong_cmp/*

# 4. Trigger a new build
# Push from the local Mac
cd /Users/break/csxjProject/junhong_cmp_fiber
git commit --allow-empty -m "Trigger redeploy"
git push origin main

# 5. Wait for CI/CD to finish (15-20 minutes)
# Visit https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions

# 6. Verify the deployment
ssh qycard001@47.111.166.169 -p 52022
cd /opt/junhong_cmp
docker compose -f docker-compose.prod.yml ps
curl http://localhost:3000/health
```

---

## Monitoring and Logs

### Follow logs in real time

```bash
# API logs
docker compose -f docker-compose.prod.yml logs -f api

# Worker logs
docker compose -f docker-compose.prod.yml logs -f worker

# Logs from all services
docker compose -f docker-compose.prod.yml logs -f
```

### View application log files

```bash
# API logs
tail -f /opt/junhong_cmp/logs/api.log
tail -f /opt/junhong_cmp/logs/access.log

# Worker logs
tail -f /opt/junhong_cmp/logs/worker.log
```

### Check container resource usage

```bash
docker stats junhong_cmp-api-1 junhong_cmp-worker-1
```

---

## Performance Verification

### API response time test

```bash
# Install hey (HTTP load testing tool)
# Mac: brew install hey
# Linux: go install github.com/rakyll/hey@latest

# Health check test (100 requests, 10 concurrent)
hey -n 100 -c 10 http://47.111.166.169:3000/health

# Expected metrics:
# - P95 < 200ms
# - P99 < 500ms
# - Success rate = 100%
```

### Database query performance

```bash
# Enable the slow query log in PostgreSQL
# Check whether any query takes longer than 50ms
```

---

## Rollback Strategy

If the new version misbehaves, roll back quickly to an earlier image version:

```bash
# 1. Pull the images for a specific version
docker pull registry.boss160.cn/junhong/cmp-fiber-api:1d773c4
docker pull registry.boss160.cn/junhong/cmp-fiber-worker:1d773c4

# 2. Change the image tags in docker-compose.prod.yml
vim /opt/junhong_cmp/docker-compose.prod.yml
# Change :latest to :1d773c4

# 3. Redeploy
docker compose -f docker-compose.prod.yml up -d

# 4. Verify
curl http://localhost:3000/health
```
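
Step 2 can be scripted instead of edited by hand. The following is a rough sketch, assuming the compose file references the images as `registry.boss160.cn/junhong/cmp-fiber-api:<tag>` and `registry.boss160.cn/junhong/cmp-fiber-worker:<tag>`; `set_image_tag` is a hypothetical helper, and the tag must not contain `|` or `&`, which are special in the `sed` replacement.

```bash
# set_image_tag COMPOSE_FILE TAG (hypothetical helper)
# Rewrites the tag on every cmp-fiber-* image reference in COMPOSE_FILE.
set_image_tag() {
    tmp=$(mktemp) || return 1
    # Write to a temp file first, then copy back so the original file
    # keeps its ownership and permissions.
    sed "s|\(registry\.boss160\.cn/junhong/cmp-fiber-[a-z]*\):[A-Za-z0-9._-]*|\1:$2|" "$1" > "$tmp" \
        && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example: pin both services to the 1d773c4 images.
# set_image_tag /opt/junhong_cmp/docker-compose.prod.yml 1d773c4
```

Keeping a copy of the compose file before rewriting it makes it easy to return to `:latest` afterwards.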

---

## Contact and Support

If a problem cannot be resolved:

1. **Check the Gitea Actions logs**: https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions
2. **Check the container logs**: `docker compose -f docker-compose.prod.yml logs`
3. **Check server resources**: `df -h`, `free -h`, `docker system df`
4. **Record the error details**: the full error log and the steps to reproduce

---

## Signs of a Successful Deployment

The deployment is fully successful when all of the following hold:

✅ Gitea Actions shows green
✅ `docker compose -f docker-compose.prod.yml ps` shows the API container as `Up (healthy)` and the Worker container as `Up`
✅ `curl http://localhost:3000/health` returns 200 with the expected JSON
✅ No ERROR-level messages in the logs
✅ API response time P95 < 200ms
✅ The Worker is consuming the task queue normally

---

## Appendix: Key File Locations

### Server
- **Runner directory**: `/home/qycard001/act_runner`
- **Deployment directory**: `/opt/junhong_cmp`
- **Runner configuration**: `/home/qycard001/.runner`
- **Temporary working directory**: `/home/qycard001/.cache/act/`

### Local (Mac)
- **Repository directory**: `/Users/break/csxjProject/junhong_cmp_fiber`
- **Key files**:
  - `.gitea/workflows/deploy.yaml`
  - `Dockerfile.api`
  - `Dockerfile.worker`
  - `docker-compose.prod.yml`
  - `configs/config.yaml`

### Private Registry
- **Address**: registry.boss160.cn
- **API image**: `registry.boss160.cn/junhong/cmp-fiber-api`
- **Worker image**: `registry.boss160.cn/junhong/cmp-fiber-worker`
- **Base image**: `registry.boss160.cn/base/golang:1.25.6-alpine`

---

**Last updated**: 2026-01-20 11:11
**Document version**: 1.0
**Corresponding commit**: bf4ef37