DevOps · #docker #optimization #multi-stage

Docker Multi-Stage Builds and Image Optimization

2024.07.10 7 min 2.8k

Introduction

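The size of a Docker image directly affects build time, push and pull speed, and attack surface. An unoptimized Java application image can exceed 1GB. With multi-stage builds and the optimization techniques below, it can often be brought under 100MB. This article walks through these Docker image optimization techniques systematically.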

How Image Layers Work

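A Docker image is a stack of read-only layers. Each instruction that changes the filesystem (RUN, COPY, ADD) creates a new layer. Other instructions, such as ENV, LABEL, or EXPOSE, only change image metadata. A running container adds one writable layer on top: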

graph TB
    subgraph Image["Docker image"]
        L1["Layer 1: FROM ubuntu:22.04<br>(77MB)"]
        L2["Layer 2: RUN apt-get install<br>(200MB)"]
        L3["Layer 3: COPY requirements.txt<br>(1KB)"]
        L4["Layer 4: RUN pip install<br>(150MB)"]
        L5["Layer 5: COPY . .<br>(10MB)"]
    end

    subgraph Container["Container"]
        RW["Read-Write Layer<br>(container layer)"]
    end

    L1 --> L2 --> L3 --> L4 --> L5
    L5 --> RW

    style RW fill:#4CAF50,color:#fff
    style L1 fill:#90CAF9
    style L2 fill:#90CAF9
    style L3 fill:#90CAF9
    style L4 fill:#90CAF9
    style L5 fill:#90CAF9

How the Layer Cache Works

flowchart TD
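    A["Dockerfile instruction"] --> B{Layer cache hit?}
    B -->|Yes| C["Use cached layer<br>(skip execution)"]
    B -->|No| D["Execute instruction<br>create new layer"]
    D --> E["Cache invalidated for<br>all subsequent layers"]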

    style C fill:#4CAF50,color:#fff
    style E fill:#f44336,color:#fff

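To get the most out of the cache, put instructions that change rarely near the top of the Dockerfile and instructions that change often near the bottom.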
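A simplified Go model makes the invalidation rule concrete. It assumes that each layer's cache key is a hash of three things: the parent layer's key, the instruction text, and a digest of any files the instruction copies. Docker does compare file checksums for COPY/ADD and only the command string for RUN. BuildKit's real cache keys carry more information, though, so treat this as a sketch of *why* one changed step invalidates everything after it. `Step`, `cacheKeys`, `rebuilt`, and `goBuild` are illustrative names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Step is one Dockerfile instruction plus a digest of the files it reads from
// the build context (empty for instructions such as FROM or RUN).
type Step struct {
	Instruction string
	FilesDigest string
}

// cacheKeys derives each step's key from its parent's key, its instruction
// text and its file digest, so changing one step changes its key and the
// keys of every step after it.
func cacheKeys(steps []Step) []string {
	keys := make([]string, len(steps))
	parent := ""
	for i, s := range steps {
		sum := sha256.Sum256([]byte(parent + "\x00" + s.Instruction + "\x00" + s.FilesDigest))
		keys[i] = hex.EncodeToString(sum[:])
		parent = keys[i]
	}
	return keys
}

// rebuilt returns the indexes of steps whose keys are missing from the cache,
// i.e. the steps that would actually be executed.
func rebuilt(steps []Step, cache map[string]bool) []int {
	var misses []int
	for i, k := range cacheKeys(steps) {
		if !cache[k] {
			misses = append(misses, i)
		}
	}
	return misses
}

// goBuild is a simplified Go build stage; srcDigest stands in for a
// checksum of the source tree copied by `COPY . .`.
func goBuild(srcDigest string) []Step {
	return []Step{
		{Instruction: "FROM golang:1.22-alpine"},
		{Instruction: "COPY go.mod go.sum ./", FilesDigest: "deps-v1"},
		{Instruction: "RUN go mod download"},
		{Instruction: "COPY . .", FilesDigest: srcDigest},
		{Instruction: "RUN go build ./cmd/server"},
	}
}

func main() {
	cache := map[string]bool{}
	for _, k := range cacheKeys(goBuild("src-v1")) {
		cache[k] = true
	}
	// Only the source changed: steps 3 and 4 rerun, `go mod download` stays cached.
	fmt.Println(rebuilt(goBuild("src-v2"), cache))
}
```

Because `go.mod`/`go.sum` are copied before the rest of the source, editing application code reruns only the last two steps. If `COPY . .` came first, every code change would also redo the dependency download.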

Multi-Stage Builds

Basic Example

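# ===== Build stage =====
FROM golang:1.22-alpine AS builder

WORKDIR /app

# Copy the dependency manifests first (keeps the download layer cached)
COPY go.mod go.sum ./
RUN go mod download

# Then copy the source code
COPY . .

# Build a statically linked binary (-s/-w drop the symbol table and DWARF info; -X injects the version)
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-w -s -X main.version=1.0.0" \
    -o /app/server ./cmd/server

# ===== Runtime stage =====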
FROM gcr.io/distroless/static-debian12:nonroot

COPY --from=builder /app/server /server
COPY --from=builder /app/configs /configs

EXPOSE 8080
USER nonroot:nonroot

ENTRYPOINT ["/server"]
graph LR
    subgraph BuildStage["Build stage (1GB+)"]
        Go["Go SDK"]
        Deps["Dependencies"]
        Source["Source Code"]
        Binary["Binary"]
    end

    subgraph RunStage["Runtime stage (~5MB)"]
        App["Binary"]
        Config["Configs"]
    end

    Binary --> |"COPY --from=builder"| App
    Source -.-> |"discarded"| Discard["Not included in the final image"]

    style BuildStage fill:#FFE0B2
    style RunStage fill:#C8E6C9
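The build stage's `-X main.version=1.0.0` linker flag only takes effect if the `main` package of `./cmd/server` declares a matching string variable. The project's source is not shown, so the following is a minimal sketch of what `cmd/server/main.go` might contain. The helper `versionString` is hypothetical.

```go
package main

import "fmt"

// version is replaced at link time by
//
//	go build -ldflags="-X main.version=1.0.0"
//
// -X can only set package-level string variables that are uninitialized or
// initialized to a constant string, so version must be a var, not a const.
// A plain `go build` keeps the default "dev".
var version = "dev"

// versionString is what the server might log at startup.
func versionString() string {
	return "server " + version
}

func main() {
	fmt.Println(versionString())
	// ... start the HTTP server here (omitted in this sketch)
}
```

Built with the Dockerfile above, the binary prints `server 1.0.0`. A plain `go build` prints `server dev`.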

Multi-Stage Build for a Java Application

# ===== Stage 1: dependency cache =====
FROM eclipse-temurin:21-jdk-alpine AS deps
WORKDIR /app
COPY pom.xml .
COPY .mvn .mvn
COPY mvnw .
RUN chmod +x mvnw && ./mvnw dependency:go-offline -B

# ===== Stage 2: build =====
FROM deps AS builder
COPY src ./src
RUN ./mvnw package -DskipTests -B

# Use jlink to build a custom JRE that contains only the required modules
RUN jlink \
    --add-modules java.base,java.logging,java.sql,java.naming,java.management,java.instrument,java.desktop \
    --strip-debug \
    --no-man-pages \
    --no-header-files \
    --compress=zip-6 \
    --output /custom-jre

# ===== Stage 3: runtime =====
FROM alpine:3.19

# Install the required runtime dependencies (tini acts as a minimal init for PID 1)
RUN apk add --no-cache tini

# Copy in the custom JRE
COPY --from=builder /custom-jre /opt/java
ENV PATH="/opt/java/bin:$PATH"

# Create a non-root user
RUN addgroup -S app && adduser -S app -G app
USER app

WORKDIR /app
COPY --from=builder --chown=app:app /app/target/*.jar app.jar

EXPOSE 8080

ENTRYPOINT ["tini", "--"]
CMD ["java", \
     "-XX:+UseContainerSupport", \
     "-XX:MaxRAMPercentage=75.0", \
     "-XX:+UseZGC", \
     "-jar", "app.jar"]

Multi-Stage Build for a Node.js Application

# ===== Stage 1: install dependencies =====
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# Install production-only deps and keep a copy, then install everything for the build
# (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev && \
    cp -R node_modules /production_modules && \
    npm ci

# ===== Stage 2: build =====
FROM deps AS builder
COPY . .
RUN npm run build

# ===== Stage 3: runtime =====
FROM node:20-alpine AS runner
WORKDIR /app

ENV NODE_ENV=production

RUN addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 nextjs

# Copy only the production dependencies
COPY --from=deps /production_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder /app/package.json ./

USER nextjs
EXPOSE 3000

CMD ["node", "dist/main.js"]

Multi-Stage Build for a Python Application

# ===== Stage 1: build wheels =====
FROM python:3.12-slim AS builder

RUN pip install --no-cache-dir poetry==1.7.1

WORKDIR /app
COPY pyproject.toml poetry.lock ./
# Export the full locked dependency set (including transitive deps)
RUN poetry export -f requirements.txt --output requirements.txt --without-hashes

RUN pip wheel --no-cache-dir --no-deps --wheel-dir /wheels -r requirements.txt

# ===== Stage 2: runtime =====
FROM python:3.12-slim

RUN groupadd -r app && useradd -r -g app app

# Bind-mount the wheels from the builder instead of COPYing them: a COPY layer
# would stay in the image even after a later `rm -rf /wheels`
RUN --mount=type=bind,from=builder,source=/wheels,target=/wheels \
    pip install --no-cache-dir /wheels/*.whl

WORKDIR /app
COPY --chown=app:app . .

USER app
EXPOSE 8000

CMD ["gunicorn", "app.main:app", \
     "--bind", "0.0.0.0:8000", \
     "--workers", "4", \
     "--worker-class", "uvicorn.workers.UvicornWorker"]

Distroless Images

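Distroless images contain only the application and its runtime dependencies. They ship no package manager, no shell, and no other tools, which greatly reduces the attack surface: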

graph TB
    subgraph Traditional["Traditional image (ubuntu)"]
        T_OS["Full OS<br>apt, bash, coreutils..."]
        T_Runtime["Runtime"]
        T_App["Application"]
    end

    subgraph Alpine["Alpine image"]
        A_OS["Minimal OS<br>apk, ash"]
        A_Runtime["Runtime"]
        A_App["Application"]
    end

    subgraph Distroless["Distroless image"]
        D_Runtime["Minimal runtime"]
        D_App["Application"]
    end

    subgraph Scratch["Scratch image"]
        S_App["Static binary"]
    end
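| Base image        | Size  | Shell      | Package manager | Typical use                  |
|-------------------|-------|------------|-----------------|------------------------------|
| ubuntu:22.04      | ~77MB | yes (bash) | apt             | Development and debugging    |
| alpine:3.19       | ~7MB  | yes (ash)  | apk             | General production use       |
| distroless/base   | ~20MB | no         | none            | Apps that need glibc         |
| distroless/static | ~2MB  | no         | none            | Statically compiled apps     |
| scratch           | 0MB   | no         | none            | Static binaries              |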
# Using a Distroless image (`builder` refers to an earlier build stage)
FROM gcr.io/distroless/java21-debian12:nonroot
COPY --from=builder /app/target/app.jar /app.jar
EXPOSE 8080
USER nonroot
ENTRYPOINT ["java", "-jar", "/app.jar"]

# Using a scratch image (smallest possible; the binary must be statically linked)
FROM scratch
# scratch has no CA bundle, so copy one in for outbound TLS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

Advanced BuildKit Features

Cache Mounts

# syntax=docker/dockerfile:1.7

# Maven cache
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN --mount=type=cache,target=/root/.m2 \
    mvn dependency:go-offline -B

COPY src ./src
RUN --mount=type=cache,target=/root/.m2 \
    mvn package -DskipTests -B

# Go module and build caches
FROM golang:1.22 AS go-builder
WORKDIR /app
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o /app/server

# npm cache
FROM node:20 AS node-builder
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci
COPY . .
RUN npm run build

Secret Mounts

# Use a secret safely (it never ends up in an image layer)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) && \
    echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > .npmrc && \
    npm ci && \
    rm .npmrc
# Pass the secret in at build time
docker buildx build --secret id=npm_token,src=.npm_token .

Multi-Platform Builds

# Create a builder that can target multiple platforms
docker buildx create --name multiplatform --use

# Build and push a multi-platform image
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag registry.example.com/myapp:1.0 \
  --push \
  .
graph TB
    Dockerfile["Dockerfile"] --> Buildx["docker buildx"]
    Buildx --> AMD64["linux/amd64<br>x86_64 image"]
    Buildx --> ARM64["linux/arm64<br>ARM image"]

    AMD64 --> Manifest["Manifest List<br>(multi-arch image)"]
    ARM64 --> Manifest
    Manifest --> Registry["Container Registry"]

.dockerignore

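# .dockerignore
# Patterns are relative to the build-context root; use **/ to match nested paths
.git
.gitignore
.github
.vscode
.idea

# Build output
node_modules
dist
target
build
**/__pycache__

# Test files
**/*_test.go
**/*.test.js
**/*.spec.ts
tests/
test/

# Documentation
**/*.md
docs/
LICENSE

# Docker-related files
Dockerfile*
docker-compose*
.dockerignore

# Environment files and keys
.env
.env.*
**/*.pem
**/*.key

# CI/CD
.gitlab-ci.yml
Jenkinsfile
.travis.yml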
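`.dockerignore` patterns follow Go's `filepath.Match` rules, extended with `**` and `!` exceptions. Every pattern is anchored at the context root and `*` never crosses a `/`. As a result, a pattern like `*.md` only excludes Markdown files at the top level, and nested files need `**/*.md`. The simplified Go matcher below illustrates those rules. In it, the last matching pattern wins, `!` re-includes, and a pattern that matches a directory also covers its contents. It handles `**` only as a leading `**/`, whereas Docker's real matcher supports more. `matchOne` and `excluded` are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchOne reports whether pattern matches rel, a slash-separated path
// relative to the build-context root. As in .dockerignore, "*" never crosses
// a "/". Of Docker's "**" support, only a leading "**/" is handled here.
func matchOne(pattern, rel string) bool {
	if rest, ok := strings.CutPrefix(pattern, "**/"); ok {
		parts := strings.Split(rel, "/")
		for i := range parts {
			if m, _ := path.Match(rest, strings.Join(parts[i:], "/")); m {
				return true
			}
		}
		return false
	}
	m, _ := path.Match(pattern, rel)
	return m
}

// excluded applies the patterns in order: the last pattern that matches wins,
// a leading "!" marks an exception, and a pattern that matches a parent
// directory also covers everything inside it.
func excluded(patterns []string, rel string) bool {
	result := false
	parts := strings.Split(rel, "/")
	for _, raw := range patterns {
		p := strings.TrimSpace(raw)
		if p == "" || strings.HasPrefix(p, "#") {
			continue
		}
		negate := strings.HasPrefix(p, "!")
		p = strings.Trim(path.Clean(strings.TrimPrefix(p, "!")), "/")
		for i := 1; i <= len(parts); i++ {
			if matchOne(p, strings.Join(parts[:i], "/")) {
				result = !negate
				break
			}
		}
	}
	return result
}

func main() {
	patterns := []string{".git", "*.md", "!README.md", "**/*_test.go", "tests/"}
	files := []string{
		"README.md", "CHANGELOG.md", "docs/guide.md",
		"pkg/api/handler_test.go", ".git/config", "tests/unit/a.py", "cmd/server/main.go",
	}
	for _, f := range files {
		fmt.Printf("%-24s excluded=%v\n", f, excluded(patterns, f))
	}
}
```

In this run, `docs/guide.md` stays in the context because `*.md` does not reach into subdirectories. `README.md` is re-included by the later `!README.md` exception.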

Security Scanning

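# Trivy scan
trivy image --severity HIGH,CRITICAL myapp:latest

# Docker Scout (Docker's official tool)
docker scout cves myapp:latest

# Grype scan
grype myapp:latest

# Scanning in CI/CD: --load makes the image visible to the local engine,
# and a non-zero exit code from Trivy fails the job
docker buildx build --load -t myapp:latest .
trivy image --exit-code 1 --severity CRITICAL myapp:latest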
graph LR
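    Build["Build image"] --> Scan["Security scan<br>(Trivy/Grype)"]
    Scan --> |"No high-severity vulnerabilities"| Push["Push to registry"]
    Scan --> |"High-severity vulnerabilities found"| Block["Fail the pipeline"]
    Push --> Sign["Sign image<br>(Cosign)"]
    Sign --> Deploy["Deploy"]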
# Sign the image with Cosign
cosign sign --key cosign.key registry.example.com/myapp:1.0

# Verify the image signature
cosign verify --key cosign.pub registry.example.com/myapp:1.0

Image Size Optimization Tips

1. Combine RUN instructions

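# Anti-pattern: every RUN creates its own layer, so the apt lists deleted
# in the last step still take up space in the first layer
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN rm -rf /var/lib/apt/lists/*

# Better: one RUN, so the cleanup happens before the layer is committed
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3 \
        python3-pip && \
    rm -rf /var/lib/apt/lists/*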
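The separate-RUN version stays large because layers are only ever stacked. Deleting a file in a later layer records a "whiteout" that hides it, but the bytes remain in the earlier layer that created them. The toy Go model below captures that accounting. Each step becomes one layer holding the files it wrote that still exist when the step ends. The sizes are hypothetical and chosen only for illustration, and `change`/`imageSizeMB` are illustrative names.

```go
package main

import "fmt"

// change is one filesystem effect of a RUN step: writing a path of some size,
// or removing a path.
type change struct {
	path   string
	sizeMB int
	remove bool
}

// imageSizeMB sums the layers of a build in which each step becomes one layer.
// A layer holds only the files its step wrote that still exist when the step
// finishes. Removing a file created by an earlier step only adds a whiteout
// marker (counted as 0 MB here); the earlier layer keeps its bytes.
func imageSizeMB(steps [][]change) int {
	total := 0
	for _, step := range steps {
		written := map[string]int{}
		for _, c := range step {
			if c.remove {
				delete(written, c.path) // only affects files written in this step
			} else {
				written[c.path] = c.sizeMB
			}
		}
		for _, size := range written {
			total += size
		}
	}
	return total
}

func main() {
	// Hypothetical sizes, for illustration only.
	update := []change{{path: "/var/lib/apt/lists", sizeMB: 40}}
	install := []change{{path: "/usr/bin/python3", sizeMB: 30}, {path: "/usr/lib/python3/dist-packages/pip", sizeMB: 50}}
	cleanup := []change{{path: "/var/lib/apt/lists", remove: true}}

	separate := [][]change{update, install, cleanup}
	var all []change
	for _, s := range separate {
		all = append(all, s...)
	}
	combined := [][]change{all}

	fmt.Println("separate RUNs:", imageSizeMB(separate), "MB")
	fmt.Println("single RUN:", imageSizeMB(combined), "MB")
}
```

With these made-up sizes, the model reports 120 MB for the separate RUNs and 80 MB for the single RUN. The 40 MB difference is exactly the deleted package lists.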

2. Use .dockerignore to exclude irrelevant files

3. Choose an appropriate base image

# Approximate sizes of different Node.js base images
# node:20              -> ~1.1GB
# node:20-slim         -> ~200MB
# node:20-alpine       -> ~130MB
# distroless/nodejs20  -> ~120MB

FROM node:20-alpine

4. Clean up build dependencies and caches

# Alpine: install build tools as a named virtual package, then remove them
RUN apk add --no-cache --virtual .build-deps \
        gcc musl-dev python3-dev && \
    pip install --no-cache-dir cryptography && \
    apk del .build-deps

# Debian/Ubuntu: purge the toolchain and apt lists in the same RUN
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    make && make install && \
    apt-get purge -y --auto-remove build-essential && \
    rm -rf /var/lib/apt/lists/*

5. Analyze the image with dive

# Install dive
brew install dive

# Inspect the image layer by layer
dive myapp:latest

# CI mode (fails when wasted space exceeds the configured thresholds)
dive myapp:latest --ci --ci-config .dive-ci.yaml

Dockerfile Best-Practice Checklist

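# syntax=docker/dockerfile:1.7

# 1. Pin the base image version (never use latest)
FROM golang:1.22.1-alpine AS builder

# 2. Set WORKDIR
WORKDIR /app

# 3. Use the layer cache: copy the dependency manifests first
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod go mod download

# 4. Copy the source code
COPY . .

# 5. Build an optimized binary (mount the module cache again: modules
#    downloaded into a cache mount are not part of the image layer)
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -ldflags="-w -s" -o /server

# 6. Use a minimal base image
FROM gcr.io/distroless/static-debian12:nonroot

# 7. Add metadata labels
LABEL org.opencontainers.image.source="https://github.com/example/myapp"
LABEL org.opencontainers.image.version="1.0.0"
LABEL org.opencontainers.image.description="My application"

# 8. Copy artifacts from the build stage
COPY --from=builder /server /server

# 9. Document the listening port
EXPOSE 8080

# 10. Run as a non-root user
USER nonroot:nonroot

# 11. Define a health check (distroless has no curl, so the binary checks itself)
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD ["/server", "healthcheck"]

# 12. Use exec-form ENTRYPOINT rather than CMD for the main process
ENTRYPOINT ["/server"]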

Image Size Comparison

graph LR
    subgraph Sizes["Go application image size comparison"]
        S1["ubuntu:22.04<br>+ Go toolchain<br>~1.2GB"]
        S2["golang:1.22-alpine<br>~350MB"]
        S3["alpine:3.19<br>+ binary<br>~15MB"]
        S4["distroless/static<br>+ binary<br>~7MB"]
        S5["scratch<br>+ binary<br>~5MB"]
    end

    S1 --> |"-70%"| S2
    S2 --> |"-95%"| S3
    S3 --> |"-53%"| S4
    S4 --> |"-28%"| S5

Summary

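The core strategies for optimizing Docker images:

  1. Multi-stage builds: separate the build environment from the runtime environment and copy only what is needed
  2. Minimal base images: Distroless or Alpine, to shrink both the attack surface and the image size
  3. Layer ordering: put rarely changing layers first so that frequent changes do not invalidate them
  4. Security hardening: non-root users, image scanning, signature verification
  5. Build caches: use BuildKit cache mounts so package-manager downloads survive across builds
  6. Regular cleanup: remove build dependencies and package-manager caches

Following these practices yields production-grade container images that are small, secure, and fast to build.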

Author · zt
Date · 2024-07-10
Length · 2.8k characters · 7 min
License · CC BY-SA 4.0