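Problem 1: port mismatch
- `Dockerfile.api` uses 8088 for `EXPOSE` and the health check
- The API actually listens on 3000, as set in `config.yaml`
- The failing health check leaves the container unhealthy

Problem 2: missing database environment variables
- `entrypoint-api.sh` needs DB_HOST, DB_USER, and other environment variables to run migrations
- `docker-compose.prod.yml` does not define them
- The container startup script exits immediately

Fix:
- `Dockerfile.api`: change `EXPOSE` and the health check to 3000
- `docker-compose.prod.yml`: add the full set of database environment variables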
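# Deployment Troubleshooting Guide

## Current Status

**Latest fix**: Commit `bf4ef37` - fix docker compose not finding its configuration file by specifying the file name explicitly

**What changed**:
- Added the `-f docker-compose.prod.yml` flag to every `docker compose` command
- Made sure each command runs from the correct working directory

---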

## Quick Verification Checklist

### 1. Check the Gitea Actions build status

Visit: https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions

**Look at the latest run**:
- Run ID: open the most recent workflow run
- Status: should show ✅ success (green)
- Duration: expect it to finish in 15-20 minutes

**Key steps to verify**:
```
✅ Checkout code
✅ Set image tags
✅ Login to Docker Registry
✅ Build API image
✅ Build Worker image
✅ Push API image
✅ Push Worker image
✅ Deploy to test server   <-- the focus of this fix
```

---

### 2. Verify on the server over SSH

```bash
# Log in to the server
ssh qycard001@47.111.166.169 -p 52022

# Go to the deployment directory
cd /opt/junhong_cmp

# Check that the files exist
ls -la
# Expected entries:
# docker-compose.prod.yml
# configs/   (containing config.yaml)
# logs/

# View the docker compose configuration
cat docker-compose.prod.yml

# View the application configuration
cat configs/config.yaml
```

---

### 3. Check container status

```bash
# List all containers
docker compose -f docker-compose.prod.yml ps

# Expected output:
# NAME                   COMMAND               SERVICE   STATUS         PORTS
# junhong_cmp-api-1      "/entrypoint-api.sh"  api       Up (healthy)   0.0.0.0:3000->3000/tcp
# junhong_cmp-worker-1   "/app/cmd/worker"     worker    Up

# If the API is not Up (healthy), check the logs for the cause
docker compose -f docker-compose.prod.yml logs api
docker compose -f docker-compose.prod.yml logs worker
```
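
The status check above can be scripted roughly as follows. This is a minimal sketch, assuming the `docker compose ps` output resembles the expected table shown above; the column layout differs between Compose versions, so the filter only looks for status keywords. The function name `problem_containers` is hypothetical.

```bash
# Print the lines of `docker compose ps` output (read from stdin) whose status
# suggests trouble. Matching on keywords is an assumption about the output format.
problem_containers() {
  grep -E 'unhealthy|Exited|Restarting' || true
}

# Example usage on the server:
#   docker compose -f docker-compose.prod.yml ps | problem_containers
```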

---

### 4. Test the API health check

```bash
# Test from the server
curl http://localhost:3000/health

# Expected response:
# {
#   "code": 0,
#   "message": "success",
#   "data": {
#     "status": "healthy",
#     "timestamp": "2026-01-20T11:30:00+08:00"
#   }
# }
```
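
To check the response in a script rather than by eye, something like the following could work. It is a sketch that assumes the response has the shape shown above; the helper name `is_healthy` is hypothetical, and it uses `grep` instead of a JSON parser only to avoid extra dependencies.

```bash
# Succeed only if the JSON read from stdin contains "status": "healthy".
# A pattern match is a simplification; it does not validate the JSON itself.
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Example usage on the server (-f makes curl fail on HTTP error codes):
#   curl -fsS http://localhost:3000/health | is_healthy && echo "API healthy"
```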

---

## Troubleshooting Common Problems

### Problem 1: Deploy step fails - "no configuration file provided"

**Symptom**:
```
Error: no configuration file provided: not found
```

**Cause**: docker compose could not find a configuration file. By default it only looks for `compose.yaml` or `docker-compose.yml` (and their `.yml`/`.yaml` variants), so `docker-compose.prod.yml` is not picked up unless it is passed with `-f`

**Fix**: ✅ Fixed in `bf4ef37`
- Added `-f docker-compose.prod.yml` to every `docker compose` command

**Verify**:
```bash
# Run from the deployment directory
cd /opt/junhong_cmp
docker compose -f docker-compose.prod.yml config
# Should print the resolved configuration without errors
```
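
One way to avoid forgetting the `-f` flag in interactive sessions is a small wrapper. This is only a sketch: the function name `dc` and the variable `COMPOSE_PROD_FILE` are hypothetical, and the default path is the deployment directory used in this guide. With an absolute path, Compose should resolve relative paths against the compose file's directory, so the current directory matters less.

```bash
# Hypothetical wrapper that always passes the compose file explicitly.
# COMPOSE_PROD_FILE is an assumed variable name; override it if the file lives elsewhere.
COMPOSE_PROD_FILE="${COMPOSE_PROD_FILE:-/opt/junhong_cmp/docker-compose.prod.yml}"

dc() {
  docker compose -f "$COMPOSE_PROD_FILE" "$@"
}

# Example usage:
#   dc config
#   dc ps
```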

---

### Problem 2: Container fails to start - health check does not pass

**Symptom**:
```
junhong_cmp-api-1   Up (unhealthy)
```

**Troubleshooting steps**:

1. **Check the container logs**:
   ```bash
   docker compose -f docker-compose.prod.yml logs api --tail=50
   ```

2. **Common causes**:
   - Database connection failure
   - Wrong configuration file path
   - Port conflict
   - Database migration failure

3. **Debug inside the container**:
   ```bash
   docker compose -f docker-compose.prod.yml exec api sh

   # Inside the container, check
   ls -la /app/
   ls -la /app/configs/
   cat /app/configs/config.yaml
   wget --spider http://localhost:3000/health
   ```

4. **Run the health check manually**:
   ```bash
   docker compose -f docker-compose.prod.yml exec api wget --no-verbose --tries=1 --spider http://localhost:3000/health
   echo $?  # Should print 0
   ```
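
A health check can also fail because the image and the application disagree on the port, for example `EXPOSE` and the health check using 8088 while the API listens on 3000. The sketch below extracts the `EXPOSE` ports from a Dockerfile so they can be compared with the listening port. The function names are hypothetical, and it assumes plain `EXPOSE <port>` lines without build arguments.

```bash
# Print every port declared by EXPOSE in a Dockerfile read from stdin.
# Assumes lines such as "EXPOSE 3000" or "EXPOSE 3000/tcp".
exposed_ports() {
  awk 'toupper($1) == "EXPOSE" { for (i = 2; i <= NF; i++) { sub(/\/.*/, "", $i); print $i } }'
}

# Succeed if the expected port (first argument) is among the exposed ports.
port_is_exposed() {
  exposed_ports | grep -qx "$1"
}

# Example usage from the repository root:
#   port_is_exposed 3000 < Dockerfile.api || echo "EXPOSE does not include port 3000"
```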

---

### Problem 3: Database migration fails

**Symptom**:
```
Failed to run migrations: ...
```

**Troubleshooting steps**:

1. **Check the database connection**:
   ```bash
   # Test the PostgreSQL connection from inside the API container
   docker compose -f docker-compose.prod.yml exec api sh -c '
   apk add postgresql-client &&
   PGPASSWORD="qycardPW@.cxj2026" psql -h cxd.whcxd.cn -p 16159 -U qycard001 -d qycard001 -c "SELECT version();"
   '
   ```

2. **Check the migration logs**:
   ```bash
   docker compose -f docker-compose.prod.yml logs api | grep -i migrate
   ```

3. **Run the migration manually**:
   ```bash
   docker compose -f docker-compose.prod.yml exec api sh -c '
   cd /app && /app/migrate -path /app/migrations -database "postgres://..." up
   '
   ```
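
Migrations can also fail before any connection is attempted if the entrypoint script does not receive its database settings (`entrypoint-api.sh` needs DB_HOST, DB_USER, and others). A minimal sketch of a pre-flight check follows; `missing_vars` is a hypothetical helper, the variable names are supplied by the caller, and it is written for a POSIX `sh` so it should also run in an Alpine container.

```bash
# Print the name of every listed environment variable that is unset or empty.
# Arguments must be valid variable names, because they are expanded with eval.
missing_vars() {
  for _name in "$@"; do
    eval "_value=\${$_name:-}"
    [ -n "$_value" ] || printf '%s\n' "$_name"
  done
}

# Example usage (extend the list to match what entrypoint-api.sh expects):
#   missing_vars DB_HOST DB_USER
```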

---

### Problem 4: Image pull fails

**Symptom**:
```
Error response from daemon: pull access denied for registry.boss160.cn/junhong/cmp-fiber-api
```

**Troubleshooting steps**:

1. **Check the registry login**:
   ```bash
   # Log in manually on the server
   docker login registry.boss160.cn
   # Username: junhong_admin
   # Password: JunHong@2025!Registry
   ```

2. **Pull the images manually**:
   ```bash
   docker pull registry.boss160.cn/junhong/cmp-fiber-api:latest
   docker pull registry.boss160.cn/junhong/cmp-fiber-worker:latest
   ```

3. **Check that the images exist**:
   ```bash
   # Check on the local Mac
   docker images | grep registry.boss160.cn
   ```

---

### Problem 5: Configuration files not synced

**Symptom**:
The configuration file inside the container does not match the one in the repository

**Troubleshooting steps**:

1. **Check the configuration file in the deployment directory**:
   ```bash
   cat /opt/junhong_cmp/configs/config.yaml
   ```

2. **Compare it with the configuration file in the repository**:
   ```bash
   # In the Runner working directory
   cat /tmp/actions/*/csxj2026-junhong_cmp_fiber/configs/config.yaml
   ```

3. **Copy the configuration files again**:
   ```bash
   cd /tmp/actions/*/csxj2026-junhong_cmp_fiber
   sudo cp docker-compose.prod.yml /opt/junhong_cmp/
   sudo cp -r configs /opt/junhong_cmp/
   ```
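
Comparing two files by reading them with `cat` is error-prone, so the comparison in step 2 could be done byte for byte instead. The sketch below uses `cmp -s`; the helper name `config_in_sync` and the `REPO_DIR` placeholder are hypothetical.

```bash
# Report whether two files have identical contents.
# cmp -s prints nothing and only sets the exit status (0 means identical).
config_in_sync() {
  if cmp -s "$1" "$2"; then
    echo "in sync"
  else
    echo "out of sync"
    return 1
  fi
}

# Example usage on the server (REPO_DIR stands for the Runner checkout directory):
#   config_in_sync "$REPO_DIR/configs/config.yaml" /opt/junhong_cmp/configs/config.yaml
```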

---

### Problem 6: Worker container fails to start

**Symptom**:
```
junhong_cmp-worker-1   Exited (1)
```

**Troubleshooting steps**:

1. **Check the Worker logs**:
   ```bash
   docker compose -f docker-compose.prod.yml logs worker --tail=50
   ```

2. **Check the API health status**:
   The Worker depends on the API's health check, so make sure the API has started successfully first
   ```bash
   docker compose -f docker-compose.prod.yml ps api
   # Should show Up (healthy)
   ```

3. **Restart the Worker**:
   ```bash
   docker compose -f docker-compose.prod.yml restart worker
   ```
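
Because the Worker depends on the API being healthy, steps 2 and 3 can be combined by waiting for the health endpoint before restarting. This is a sketch under stated assumptions: `retry` is a hypothetical helper, and the attempt count and delay in the usage example are arbitrary.

```bash
# Run a command until it succeeds, up to MAX attempts, sleeping DELAY seconds between tries.
# Usage: retry MAX DELAY command [args...]
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage on the server: wait up to about a minute for the API, then restart the Worker.
#   retry 12 5 curl -fsS -o /dev/null http://localhost:3000/health &&
#     docker compose -f docker-compose.prod.yml restart worker
```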

---

## Redeploying from Scratch

If a problem cannot be resolved, clean everything up and redeploy:

```bash
# 1. Stop and remove all containers (-v also removes named volumes)
cd /opt/junhong_cmp
docker compose -f docker-compose.prod.yml down -v

# 2. Remove the old images
docker rmi registry.boss160.cn/junhong/cmp-fiber-api:latest
docker rmi registry.boss160.cn/junhong/cmp-fiber-worker:latest

# 3. Clean the deployment directory
sudo rm -rf /opt/junhong_cmp/*

# 4. Trigger a new build
# Push from the local Mac
cd /Users/break/csxjProject/junhong_cmp_fiber
git commit --allow-empty -m "Trigger redeploy"
git push origin main

# 5. Wait for CI/CD to finish (15-20 minutes)
# Visit https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions

# 6. Verify the deployment
ssh qycard001@47.111.166.169 -p 52022
cd /opt/junhong_cmp
docker compose -f docker-compose.prod.yml ps
curl http://localhost:3000/health
```

---

## Monitoring and Logs

### Follow logs in real time

```bash
# API logs
docker compose -f docker-compose.prod.yml logs -f api

# Worker logs
docker compose -f docker-compose.prod.yml logs -f worker

# Logs for all services
docker compose -f docker-compose.prod.yml logs -f
```

### View application log files

```bash
# API logs
tail -f /opt/junhong_cmp/logs/api.log
tail -f /opt/junhong_cmp/logs/access.log

# Worker logs
tail -f /opt/junhong_cmp/logs/worker.log
```

### Check container resource usage

```bash
docker stats junhong_cmp-api-1 junhong_cmp-worker-1
```

---

## Performance Verification

### API response time test

```bash
# Install hey (HTTP load testing tool)
# Mac: brew install hey
# Linux: go install github.com/rakyll/hey@latest

# Health check test (100 requests, 10 concurrent)
hey -n 100 -c 10 http://47.111.166.169:3000/health

# Expected metrics:
# - P95 < 200ms
# - P99 < 500ms
# - Success rate = 100%
```
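
When `hey` is not available, a rough P95 can be estimated from `curl` timings. The sketch below computes a nearest-rank percentile from numbers on stdin; `percentile` is a hypothetical helper, and sequential `curl` requests are an assumption that does not reproduce the 10-way concurrency of the `hey` test, so the numbers are only indicative.

```bash
# Print the nearest-rank P-th percentile of the numbers read from stdin (one per line).
# Usage: percentile P   (P is an integer between 1 and 100)
percentile() {
  LC_ALL=C sort -n | awk -v p="$1" '
    { values[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print values[rank]
    }'
}

# Example usage: time 100 sequential requests and report P95 in seconds.
#   for i in $(seq 1 100); do
#     curl -s -o /dev/null -w '%{time_total}\n' http://localhost:3000/health
#   done | percentile 95
```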

### Database query performance

```bash
# Enable the slow query log in PostgreSQL
# Check for queries that take longer than 50ms
```

---

## Rollback Strategy

If the new version has problems, roll back quickly to a previous image version:

```bash
# 1. Pull a specific image version
docker pull registry.boss160.cn/junhong/cmp-fiber-api:1d773c4
docker pull registry.boss160.cn/junhong/cmp-fiber-worker:1d773c4

# 2. Change the image tags in docker-compose.prod.yml
vim /opt/junhong_cmp/docker-compose.prod.yml
# Change :latest to :1d773c4

# 3. Redeploy from the deployment directory
cd /opt/junhong_cmp
docker compose -f docker-compose.prod.yml up -d

# 4. Verify
curl http://localhost:3000/health
```
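
The manual edit in step 2 could also be scripted. This is a sketch, assuming the compose file references the two images with literal tags (not through variables); `retag_images` is a hypothetical helper, and the output path in the usage example is arbitrary.

```bash
# Rewrite the tag of the project's API and Worker images in a compose file read from stdin.
# Usage: retag_images NEW_TAG < docker-compose.prod.yml
retag_images() {
  sed -E "s#(registry\.boss160\.cn/junhong/cmp-fiber-(api|worker)):[A-Za-z0-9_.-]+#\1:$1#g"
}

# Example usage on the server (write to a new file so the change can be reviewed first):
#   retag_images 1d773c4 < /opt/junhong_cmp/docker-compose.prod.yml > /tmp/docker-compose.rollback.yml
#   diff /opt/junhong_cmp/docker-compose.prod.yml /tmp/docker-compose.rollback.yml
```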

---

## Contact and Support

If a problem cannot be resolved:

1. **Check the Gitea Actions logs**: https://git.boss160.cn/csxj2026/junhong_cmp_fiber/actions
2. **Check the container logs**: `docker compose -f docker-compose.prod.yml logs`
3. **Check server resources**: `df -h`, `free -h`, `docker system df`
4. **Record the error details**: the complete error log and the steps to reproduce

---

## Signs of a Successful Deployment

The deployment is fully successful when all of the following hold:

- ✅ Gitea Actions shows a green run
- ✅ `docker compose -f docker-compose.prod.yml ps` shows the API as `Up (healthy)` and the Worker as `Up`
- ✅ `curl http://localhost:3000/health` returns 200 with the expected JSON
- ✅ The logs contain no ERROR-level messages
- ✅ API response time P95 < 200ms
- ✅ The Worker is consuming the task queue normally

---

## Appendix: Key File Locations

### Server
- **Runner directory**: `/home/qycard001/act_runner`
- **Deployment directory**: `/opt/junhong_cmp`
- **Runner configuration**: `/home/qycard001/.runner`
- **Temporary working directory**: `/home/qycard001/.cache/act/`

### Local (Mac)
- **Repository directory**: `/Users/break/csxjProject/junhong_cmp_fiber`
- **Key files**:
  - `.gitea/workflows/deploy.yaml`
  - `Dockerfile.api`
  - `Dockerfile.worker`
  - `docker-compose.prod.yml`
  - `configs/config.yaml`

### Private Registry
- **Address**: registry.boss160.cn
- **API image**: `registry.boss160.cn/junhong/cmp-fiber-api`
- **Worker image**: `registry.boss160.cn/junhong/cmp-fiber-worker`
- **Base image**: `registry.boss160.cn/base/golang:1.25.6-alpine`

---

**Last updated**: 2026-01-20 11:11
**Document version**: 1.0
**Corresponding commit**: bf4ef37