DevOps · #networking#kubernetes#cni

Kubernetes Networking Model and CNI Plugins Explained

2024.05.01 7 min 2.7k

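Introduction

Networking is one of the most complex subsystems in Kubernetes. Kubernetes defines a simple yet powerful networking model and delegates the actual implementation to CNI (Container Network Interface) plugins. This article works through the layers one at a time: Pod networking, Service networking, Ingress, and finally the CNI plugins themselves.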

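Kubernetes Networking Model: Core Principles

The Kubernetes networking model rests on three basic principles:

  1. Every Pod gets its own IP address.
  2. All Pods can communicate with each other directly, without NAT.
  3. Agents on a node (such as the kubelet) can communicate with every Pod on that node.
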
graph TB
    subgraph Cluster["Kubernetes Cluster"]
        subgraph Node1["Node 1"]
            P1["Pod A<br>10.244.1.2"]
            P2["Pod B<br>10.244.1.3"]
        end
        subgraph Node2["Node 2"]
            P3["Pod C<br>10.244.2.2"]
            P4["Pod D<br>10.244.2.3"]
        end
        SVC["Service<br>10.96.0.100"]
    end

    External["External Client"] --> SVC
    P1 <-->|"Direct routing<br>no NAT"| P3
    P2 <-->|"Direct routing<br>no NAT"| P4
    SVC --> P1
    SVC --> P3
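These three principles mean that every Pod IP is unique and routable across the cluster. In practice, the per-node Pod ranges must not overlap each other or the Service range. Below is a minimal Python sketch of that sanity check. Several details are assumptions: `check_cidrs` is a hypothetical helper, the per-node /24 ranges are inferred from the Pod IPs in the diagram, and 10.96.0.0/12 is used because it is kubeadm's default Service CIDR.

```python
import ipaddress


def check_cidrs(cluster_cidr: str, service_cidr: str, node_cidrs: dict) -> list:
    """Return human-readable problems; an empty list means the layout is consistent."""
    cluster = ipaddress.ip_network(cluster_cidr)
    service = ipaddress.ip_network(service_cidr)
    problems = []
    if cluster.overlaps(service):
        problems.append(f"pod CIDR {cluster} overlaps service CIDR {service}")
    seen = []  # (node, network) pairs checked so far
    for node, cidr in node_cidrs.items():
        net = ipaddress.ip_network(cidr)
        if not net.subnet_of(cluster):
            problems.append(f"{node}: {net} is outside the pod CIDR {cluster}")
        for other_node, other in seen:
            if net.overlaps(other):
                problems.append(f"{node}: {net} overlaps {other_node} ({other})")
        seen.append((node, net))
    return problems


# Hypothetical layout inferred from the diagram above.
NODE_CIDRS = {"node-1": "10.244.1.0/24", "node-2": "10.244.2.0/24"}
print(check_cidrs("10.244.0.0/16", "10.96.0.0/12", NODE_CIDRS))  # []
```

Two kinds of overlap are what break the "no NAT" model. If Pod ranges overlap, two Pods can end up with the same IP. If the Pod range overlaps the Service range, a Pod IP can shadow a virtual Service IP.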

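Pod Networking

Networking Inside a Pod

Containers in the same Pod share one network namespace, so they can reach each other via localhost. This is implemented with the pause container (also called the infra container). It is created first and holds the Pod's network namespace, and the Pod's other containers join that namespace.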

graph TB
    subgraph Pod["Pod (10.244.1.2)"]
        Pause["pause container<br>holds the network namespace"]
        App["App container<br>localhost:8080"]
        Sidecar["Sidecar container<br>localhost:9090"]
    end

    Pause -.-> |"shared network namespace"| App
    Pause -.-> |"shared network namespace"| Sidecar
    App <-.-> |"localhost"| Sidecar
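
# List the Pod sandboxes; each sandbox is backed by a pause container
crictl pods --name <pod-name>
# The sandbox image is usually registry.k8s.io/pause (k8s.gcr.io/pause on older clusters)
crictl images | grep pause

# Inspect the namespaces an application container runs in
crictl inspect <container-id> | jq '.info.runtimeSpec.linux.namespaces'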
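You can also check namespace sharing directly on the node. Two processes are in the same network namespace when `/proc/<pid>/ns/net` points at the same inode. Below is a minimal Python sketch, Linux only, meant to be run as root on the node. The helper names are hypothetical, and the PIDs of the pause process and of an app container are assumed to come from `ps` or the container runtime.

```python
import os
import re


def parse_ns_link(link: str) -> int:
    """Extract the inode from a namespace link such as 'net:[4026531992]'."""
    match = re.fullmatch(r"net:\[(\d+)\]", link)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return int(match.group(1))


def netns_inode(pid) -> int:
    """Inode of the network namespace that process `pid` belongs to (Linux only)."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/net"))


def share_netns(pid_a, pid_b) -> bool:
    """Two processes on one host share a network namespace when the inodes match."""
    return netns_inode(pid_a) == netns_inode(pid_b)
```

On a real node, `share_netns(<pause-pid>, <app-pid>)` should return True for two containers of the same Pod. It should return False for containers of different Pods, unless both Pods use `hostNetwork: true` and therefore share the node's namespace.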

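Pod-to-Pod Networking (Same Node)

Pods on the same node communicate directly through a virtual bridge. This is typically cni0, created by the CNI bridge plugin, or cbr0 in the older kubenet setup. Each Pod is attached to the bridge through a veth pair: one end is the Pod's eth0, and the other end is plugged into the bridge on the host: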

graph TB
    subgraph Node["Node 1"]
        P1["Pod A<br>veth1 - 10.244.1.2"]
        P2["Pod B<br>veth2 - 10.244.1.3"]
        Bridge["cni0 / cbr0<br>10.244.1.1"]
        ETH["eth0<br>192.168.1.10"]
    end

    P1 --> |veth pair| Bridge
    P2 --> |veth pair| Bridge
    Bridge --> ETH

    style Bridge fill:#FF9800,color:#fff
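
# Show bridges on the node (brctl comes from the legacy bridge-utils package;
# `ip link show master cni0` lists the same attached interfaces)
brctl show
# bridge name     bridge id               STP enabled     interfaces
# cni0            8000.1234567890ab       no              veth1234
#                                                         veth5678

# List the host-side ends of the veth pairs
ip link show type veth

# Show the routing table: the local Pod subnet is reached through cni0
ip route
# 10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1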
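The `ip route` output above explains why same-node traffic never leaves the bridge. The kernel picks the most specific matching route, and the local Pod subnet is bound to cni0. Below is a simplified Python model of that longest-prefix match. The table is hypothetical: the remote Pod route is what a routed (non-overlay) setup such as host-gw or BGP would install, and the default gateway is an assumption.

```python
import ipaddress

# Hypothetical routing table of Node 1 (192.168.1.10); the default gateway
# 192.168.1.1 is an assumption, not taken from the article.
ROUTES = [
    ("10.244.1.0/24", "dev cni0"),                   # local Pods: stay on the bridge
    ("10.244.2.0/24", "via 192.168.1.11 dev eth0"),  # Pods on Node 2 (routed setup)
    ("192.168.1.0/24", "dev eth0"),                  # node network
    ("0.0.0.0/0", "via 192.168.1.1 dev eth0"),       # default route
]


def lookup(dst: str) -> str:
    """Pick a route by longest-prefix match (ignores metrics and policy routing)."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), target)
        for prefix, target in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


print(lookup("10.244.1.3"))  # dev cni0 (Pod B on the same node)
print(lookup("10.244.2.2"))  # via 192.168.1.11 dev eth0 (Pod on Node 2)
```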

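Pod-to-Pod Networking (Across Nodes)

Traffic between Pods on different nodes is handled by the CNI plugin, usually in one of two ways:

- An overlay encapsulates Pod packets inside node-to-node packets (for example, VXLAN).
- An underlay, or routed, approach programs a route to each node's Pod subnet (for example, BGP or host-gw), so packets travel unencapsulated.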

graph TB
    subgraph Node1["Node 1 (192.168.1.10)"]
        PA["Pod A<br>10.244.1.2"]
        BR1["cni0"]
        ETH1["eth0"]
    end

    subgraph Node2["Node 2 (192.168.1.11)"]
        PB["Pod B<br>10.244.2.2"]
        BR2["cni0"]
        ETH2["eth0"]
    end

    PA --> BR1
    BR1 --> ETH1
    ETH1 <-->|"VXLAN tunnel<br>or BGP routes"| ETH2
    ETH2 --> BR2
    BR2 --> PB
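In the overlay case, each Pod packet is wrapped in an outer UDP/IP packet addressed from node to node. An 8-byte VXLAN header (RFC 7348) carries a 24-bit VXLAN Network Identifier (VNI). Below is a minimal Python sketch of that header and of the resulting MTU arithmetic. It assumes an IPv4 underlay with a 1500-byte MTU and no VLAN tags, and the function names are illustrative.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag (RFC 7348): the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8), network byte order."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


# Bytes VXLAN adds around every Pod IP packet on an IPv4 underlay:
# outer IPv4 (20) + outer UDP (8) + VXLAN (8) + inner Ethernet header (14) = 50
VXLAN_OVERHEAD = 20 + 8 + 8 + 14


def pod_mtu(node_mtu: int) -> int:
    """Largest Pod-side MTU that avoids fragmenting the encapsulated packet."""
    return node_mtu - VXLAN_OVERHEAD


print(vxlan_header(1).hex())  # 0800000000000100
print(pod_mtu(1500))          # 1450
```

These 50 bytes of overhead are why VXLAN-based plugins commonly give Pods an MTU of 1450 on a 1500-byte network. On Linux, Flannel sends this traffic to UDP port 8472 by default (the kernel's legacy VXLAN port) rather than the IANA-assigned 4789.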

Service Networking

A Service gives a set of Pods a stable virtual IP and DNS name, and load-balances traffic across them.

Service Types

graph LR
    subgraph Types["Service types"]
        CIP["ClusterIP<br>in-cluster access"]
        NP["NodePort<br>port exposed on every node"]
        LB["LoadBalancer<br>cloud load balancer"]
        EI["ExternalName<br>CNAME mapping"]
    end

    CIP --> |"default"| Internal["In-cluster traffic"]
    NP --> |"30000-32767"| NodeAccess["NodeIP:port"]
    LB --> |"cloud provider LB"| ExtLB["External load balancer"]
    EI --> |"DNS CNAME"| ExtDNS["External DNS name"]

# ClusterIP Service
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: ClusterIP
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

---
# NodePort Service
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080

---
# LoadBalancer Service
apiVersion: v1
kind: Service
metadata:
  name: web-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
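A ClusterIP is only a virtual IP. In iptables mode, kube-proxy spreads new connections across the endpoints with one rule per endpoint, using the `statistic` match in random mode. Rule i matches with probability 1/(n-i), and the last rule catches whatever is left. In IPVS mode, kube-proxy uses an IPVS scheduler instead, round-robin by default. The Python sketch below has hypothetical function names and checks only the arithmetic: the chained probabilities give every endpoint an equal share.

```python
from fractions import Fraction


def rule_probabilities(n: int) -> list:
    """Probability attached to each of n endpoint rules, in rule order.

    Rule i (0-based) matches with probability 1/(n - i); the last rule always matches.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n: int) -> list:
    """Overall fraction of new connections that reaches each endpoint."""
    remaining = Fraction(1)  # probability that no earlier rule has matched yet
    shares = []
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print([str(p) for p in rule_probabilities(3)])  # ['1/3', '1/2', '1']
print([str(s) for s in effective_shares(3)])    # ['1/3', '1/3', '1/3']
```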

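Service Discovery

Kubernetes offers two ways to discover Services:

1. Environment variables: when the kubelet starts a Pod, it injects variables for the Services in the Pod's namespace that already exist at that moment. Services created later do not show up this way.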


# Environment variables visible inside the Pod
env | grep WEB_SERVICE
# WEB_SERVICE_SERVICE_HOST=10.96.0.100
# WEB_SERVICE_SERVICE_PORT=80

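2. DNS (recommended): CoreDNS creates a DNS record for every Service, so clients can rely on stable names regardless of creation order.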

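
# Service DNS name format (cluster.local is the default cluster domain):
# <service-name>.<namespace>.svc.cluster.local
nslookup web-service.default.svc.cluster.local

# A headless Service (clusterIP: None) resolves to the Pod IPs instead of a virtual IP;
# typically used for stateful workloads such as StatefulSets
nslookup mysql.default.svc.cluster.local
# Server:   10.96.0.10
# Address:  10.96.0.10#53
#
# Name:     mysql.default.svc.cluster.local
# Address:  10.244.1.5
# Name:     mysql.default.svc.cluster.local
# Address:  10.244.2.3

# Each StatefulSet Pod behind it also gets a stable per-Pod name
nslookup mysql-0.mysql.default.svc.cluster.local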
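Both discovery mechanisms can be sketched in a few lines of Python. The first helper derives the variable names shown for `web-service`. The second lists the names a resolver tries for a short name. It assumes the default `cluster.local` domain and the `search`/`ndots:5` settings that Kubernetes typically writes into a Pod's /etc/resolv.conf. Both helpers are hypothetical illustrations. The resolver order follows glibc's behaviour; musl-based images differ.

```python
def service_env_vars(name: str, cluster_ip: str, port: int) -> dict:
    """{NAME}_SERVICE_HOST / {NAME}_SERVICE_PORT as injected by the kubelet.

    The Service name is upper-cased and dashes become underscores. The kubelet also
    injects Docker-link style variables (e.g. WEB_SERVICE_PORT=tcp://...), omitted here.
    """
    prefix = name.upper().replace("-", "_")
    return {f"{prefix}_SERVICE_HOST": cluster_ip, f"{prefix}_SERVICE_PORT": str(port)}


def resolution_candidates(name: str, namespace: str = "default",
                          cluster_domain: str = "cluster.local", ndots: int = 5) -> list:
    """Order in which a glibc-style resolver tries names, assuming a resolv.conf of:

        search <namespace>.svc.<domain> svc.<domain> <domain>
        options ndots:5
    """
    if name.endswith("."):
        return [name]  # already fully qualified: the search list is skipped
    search = [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]


print(service_env_vars("web-service", "10.96.0.100", 80))
print(resolution_candidates("web-service")[0])  # web-service.default.svc.cluster.local
```

This is why a bare `web-service` resolves inside its own namespace. It also means a name with fewer than five dots, such as an external `example.com`, first triggers several failing in-cluster queries. A trailing dot (`example.com.`) skips the search list.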

EndpointSlice

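EndpointSlice replaces the older Endpoints API. It reached GA (discovery.k8s.io/v1) in Kubernetes 1.21, and kube-proxy reads EndpointSlices rather than Endpoints. A Service's endpoints are split across several slices, at most 100 endpoints per slice by default. A change to one Pod then rewrites only one small object, which scales much better for large Services: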


apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: web-service-abc
  labels:
    kubernetes.io/service-name: web-service
addressType: IPv4
ports:
- name: http
  port: 8080
  protocol: TCP
endpoints:
- addresses:
  - "10.244.1.5"
  conditions:
    ready: true
  nodeName: node-1
- addresses:
  - "10.244.2.3"
  conditions:
    ready: true
  nodeName: node-2
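The EndpointSlice controller caps each slice at 100 endpoints by default (kube-controller-manager's `--max-endpoints-per-slice`). Consumers such as kube-proxy route new traffic only to endpoints that are ready. Below is a naive Python sketch of both ideas, with several simplifications:

- The real controller packs and reuses slices more cleverly to minimise updates.
- kube-proxy also considers the `serving` and `terminating` conditions.
- The extra not-ready endpoint 10.244.2.9 is hypothetical.

```python
from itertools import islice


def chunk_endpoints(addresses: list, max_per_slice: int = 100) -> list:
    """Naively pack endpoint addresses into slices of at most max_per_slice entries."""
    it = iter(addresses)
    slices = []
    while chunk := list(islice(it, max_per_slice)):
        slices.append(chunk)
    return slices


def routable_addresses(endpoint_slice: dict) -> list:
    """Addresses a proxy would send new traffic to: endpoints not marked ready=false.

    Per the API docs, an unset `ready` condition should usually be treated as ready.
    """
    return [
        address
        for endpoint in endpoint_slice.get("endpoints", [])
        if endpoint.get("conditions", {}).get("ready") is not False
        for address in endpoint.get("addresses", [])
    ]


# Same shape as the EndpointSlice above, plus one hypothetical endpoint that is not ready.
SLICE = {
    "endpoints": [
        {"addresses": ["10.244.1.5"], "conditions": {"ready": True}, "nodeName": "node-1"},
        {"addresses": ["10.244.2.3"], "conditions": {"ready": True}, "nodeName": "node-2"},
        {"addresses": ["10.244.2.9"], "conditions": {"ready": False}, "nodeName": "node-2"},
    ]
}
print(routable_addresses(SLICE))  # ['10.244.1.5', '10.244.2.3']
```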

Ingress

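Ingress defines HTTP/HTTPS (layer 7) routing rules that map external requests to Services inside the cluster. An Ingress resource does nothing on its own; an Ingress controller (such as ingress-nginx) must be running to implement it: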

graph LR
    Client["Client"] --> LB["Load Balancer"]
    LB --> IC["Ingress Controller"]

    IC --> |"api.example.com"| SVC1["API Service"]
    IC --> |"web.example.com"| SVC2["Web Service"]
    IC --> |"*.example.com/admin"| SVC3["Admin Service"]

    SVC1 --> P1["Pod"]
    SVC1 --> P2["Pod"]
    SVC2 --> P3["Pod"]
    SVC3 --> P4["Pod"]

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    # ingress-nginx rate limit: requests per second accepted from one client IP
    nginx.ingress.kubernetes.io/limit-rps: "100"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    - web.example.com
    secretName: tls-secret
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: api-v1
            port:
              number: 80
      - path: /v2
        pathType: Prefix
        backend:
          service:
            name: api-v2
            port:
              number: 80
  - host: web.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-frontend
            port:
              number: 80
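For `pathType: Prefix`, Kubernetes compares the path element by element after splitting on `/`, and the longest matching path wins. Below is a minimal Python sketch of that selection for the rules above. The helper names are hypothetical, and it ignores `Exact`/`ImplementationSpecific` paths and wildcard hosts.

```python
from typing import Optional

# Host -> {Prefix path -> backend Service}, mirroring the Ingress rules above.
RULES = {
    "api.example.com": {"/v1": "api-v1", "/v2": "api-v2"},
    "web.example.com": {"/": "web-frontend"},
}


def elements(path: str) -> list:
    """Split a URL path into non-empty '/'-separated elements."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Element-wise prefix: /v1 matches /v1 and /v1/users, but not /v1users."""
    rule = elements(rule_path)
    return elements(request_path)[: len(rule)] == rule


def pick_backend(host: str, path: str) -> Optional[str]:
    """Among the host's matching Prefix paths, the longest one wins."""
    candidates = [p for p in RULES.get(host, {}) if prefix_matches(p, path)]
    if not candidates:
        return None  # a real controller would typically fall back to its default backend
    best = max(candidates, key=lambda p: len(elements(p)))
    return RULES[host][best]


print(pick_backend("api.example.com", "/v2/orders"))  # api-v2
print(pick_backend("web.example.com", "/about"))      # web-frontend
```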

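Gateway API (the Successor to Ingress)

The Kubernetes Gateway API is the next-generation standard for managing ingress traffic. It splits responsibilities across role-oriented resources such as GatewayClass, Gateway, and HTTPRoute. Features such as header matching and weighted traffic splitting are part of the API itself rather than controller-specific annotations: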


# Gateway definition
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway
spec:
  gatewayClassName: istio
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: tls-secret
---
# HTTPRoute definition
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
  - name: main-gateway
  hostnames:
  - api.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: api-v1
      port: 80
      weight: 90
    - name: api-v2
      port: 80
      weight: 10  # canary release: 10% of traffic goes to v2
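The Gateway API defines a backendRef's share of requests as its weight divided by the sum of all weights in the rule. The route above therefore sends 90% of requests to api-v1 and 10% to api-v2. Below is a small Python model of that ratio with hypothetical helper names. The actual per-request split is performed by the Gateway implementation (Istio here), not by code like this.

```python
import random
from collections import Counter
from fractions import Fraction

# backendRefs of the /v1 rule above: name -> weight
BACKENDS = {"api-v1": 90, "api-v2": 10}


def traffic_shares(backends: dict) -> dict:
    """Each backendRef should receive weight / sum(weights) of the requests."""
    total = sum(backends.values())
    if total == 0:
        raise ValueError("all weights are zero: no backend receives traffic")
    return {name: Fraction(weight, total) for name, weight in backends.items()}


def pick(backends: dict, rng: random.Random) -> str:
    """Choose a backend for one request, proportionally to its weight."""
    names = list(backends)
    return rng.choices(names, weights=[backends[n] for n in names], k=1)[0]


print(traffic_shares(BACKENDS))  # {'api-v1': Fraction(9, 10), 'api-v2': Fraction(1, 10)}
rng = random.Random(42)
print(Counter(pick(BACKENDS, rng) for _ in range(10_000)))  # roughly 9000 / 1000
```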

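CNI Plugins in Depth

Calico

Calico uses BGP to distribute Pod routes between nodes, which enables a high-performance layer-3 network without encapsulation. IP-in-IP or VXLAN can be enabled when the underlying network cannot carry Pod routes. The work is split between two components:

- BIRD exchanges routes over BGP and stays on the control plane.
- Felix programs routes and policy (iptables or eBPF) on each node.

Pod packets follow the resulting kernel routes: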

graph TB
    subgraph Node1["Node 1"]
        PA["Pod A<br>10.244.1.2"]
        BIRD1["BIRD<br>BGP Agent"]
        FW1["Felix<br>iptables/eBPF"]
    end

    subgraph Node2["Node 2"]
        PB["Pod B<br>10.244.2.2"]
        BIRD2["BIRD<br>BGP Agent"]
        FW2["Felix<br>iptables/eBPF"]
    end

    BIRD1 <-->|"BGP Peering"| BIRD2
    PA --> FW1 --> BIRD1
    BIRD2 --> FW2 --> PB

# Calico IPPool (projectcalico.org/v3 API, applied with calicoctl or via the Calico API server)
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-pool
spec:
  cidr: 10.244.0.0/16
  ipipMode: Never    # no IP-in-IP encapsulation
  vxlanMode: Never   # no VXLAN either: plain BGP-routed traffic
  natOutgoing: true
  nodeSelector: all()

---
# Calico NetworkPolicy (richer than the native Kubernetes NetworkPolicy)
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-api-only
  namespace: production
spec:
  selector: app == 'database'
  types:
  - Ingress
  ingress:
  - action: Allow
    protocol: TCP
    source:
      selector: app == 'api-server'
    destination:
      ports:
      - 5432
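Under the hood, Calico IPAM splits an IPPool into fixed-size blocks, a /26 by default for IPv4, controlled by the pool's `blockSize`. It assigns blocks to nodes, and BIRD advertises each node's blocks to its BGP peers. The number of routes therefore grows with blocks rather than with Pods. Below is a Python sketch of the block arithmetic for the pool above. `naive_assignment` is a hypothetical simplification of how blocks are handed out.

```python
import ipaddress


def pool_blocks(pool_cidr: str, block_size: int = 26):
    """Split an IPPool into IPAM blocks of prefix length `block_size` (a generator)."""
    return ipaddress.ip_network(pool_cidr).subnets(new_prefix=block_size)


def naive_assignment(pool_cidr: str, nodes: list, block_size: int = 26) -> dict:
    """Hypothetical: give each node the next block, in order.

    Real Calico IPAM claims blocks on demand, and a busy node can own several.
    """
    blocks = pool_blocks(pool_cidr, block_size)
    return {node: next(blocks) for node in nodes}


blocks = list(pool_blocks("10.244.0.0/16"))
print(len(blocks), blocks[0].num_addresses)  # 1024 64
print(naive_assignment("10.244.0.0/16", ["node-1", "node-2"]))
```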

Flannel

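Flannel is the simplest CNI plugin. It provides Pod connectivity only and does not enforce NetworkPolicy, which makes it a good fit for small clusters and learning environments: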


# Flannel configuration (net-conf.json, usually stored in the kube-flannel-cfg ConfigMap)
{
  "Network": "10.244.0.0/16",
  "SubnetLen": 24,
  "SubnetMin": "10.244.1.0",
  "SubnetMax": "10.244.254.0",
  "Backend": {
    "Type": "vxlan",
    "VNI": 1,
    "DirectRouting": true
  }
}
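The values above limit how many nodes Flannel can serve, because each node leases one /24 between SubnetMin and SubnetMax. With `DirectRouting` enabled, traffic between nodes on the same subnet uses a plain route instead of VXLAN. Below is a minimal Python sketch of both rules. The helper names are hypothetical, and the node network 192.168.1.0/24 is an assumption taken from the earlier diagrams.

```python
import ipaddress


def flannel_subnets(network: str, subnet_len: int, subnet_min: str, subnet_max: str) -> list:
    """Per-node subnets Flannel can lease, given the net-conf.json values."""
    lo = ipaddress.ip_address(subnet_min)
    hi = ipaddress.ip_address(subnet_max)
    return [
        subnet
        for subnet in ipaddress.ip_network(network).subnets(new_prefix=subnet_len)
        if lo <= subnet.network_address <= hi
    ]


def uses_vxlan(src_node_ip: str, dst_node_ip: str, node_network: str,
               direct_routing: bool = True) -> bool:
    """With DirectRouting, nodes on the same subnet get a plain route instead of a tunnel."""
    net = ipaddress.ip_network(node_network)
    same_subnet = ipaddress.ip_address(src_node_ip) in net and ipaddress.ip_address(dst_node_ip) in net
    return not (direct_routing and same_subnet)


subnets = flannel_subnets("10.244.0.0/16", 24, "10.244.1.0", "10.244.254.0")
print(len(subnets), subnets[0], subnets[-1])  # 254 10.244.1.0/24 10.244.254.0/24
print(uses_vxlan("192.168.1.10", "192.168.1.11", "192.168.1.0/24"))  # False
```

With this configuration there are 254 leasable subnets (10.244.1.0/24 through 10.244.254.0/24), so the cluster can hold at most 254 nodes.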

Cilium

Cilium is built on eBPF and provides high-performance networking, security, and observability:

graph TB
    subgraph CiliumArch["Cilium architecture"]
        Agent["Cilium Agent"]
        Operator["Cilium Operator"]
        Hubble["Hubble<br>observability"]

        subgraph DataPlane["eBPF data plane"]
            TC["TC Hook<br>traffic control"]
            XDP["XDP Hook<br>fast path"]
            Socket["Socket Hook<br>socket-level load balancing"]
        end
    end

    Agent --> TC
    Agent --> XDP
    Agent --> Socket
    Hubble --> Agent
    Operator --> Agent

# Cilium Network Policy: L7 (HTTP-level) rules
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-policy
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/api/v1/.*"
        - method: "POST"
          path: "/api/v1/users"
          headers:
          - 'Content-Type: application/json'
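HTTP rules like these are enforced by Cilium's Envoy-based proxy, which handles traffic to the selected port. Below is a simplified Python model of how the rule list above is meant to combine:

- Any one rule may allow a request.
- Within a rule, the method and path are regular expressions and every listed header must be present.

The helper names are hypothetical. The sketch also assumes the regexes must match the whole method and path, which is a simplification of the real proxy behaviour.

```python
import re

# The http rules of the CiliumNetworkPolicy above.
HTTP_RULES = [
    {"method": "GET", "path": r"/api/v1/.*"},
    {"method": "POST", "path": r"/api/v1/users", "headers": ["Content-Type: application/json"]},
]


def has_header(required: str, headers: dict) -> bool:
    """'Name: value' must be present; header names compare case-insensitively."""
    name, _, value = required.partition(":")
    return headers.get(name.strip().lower()) == value.strip()


def request_allowed(method: str, path: str, headers: dict) -> bool:
    """Rules are ORed; inside one rule, method, path and every header must all match."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for rule in HTTP_RULES:
        if not re.fullmatch(rule["method"], method):
            continue
        if not re.fullmatch(rule["path"], path):
            continue
        if all(has_header(h, lowered) for h in rule.get("headers", [])):
            return True
    return False


print(request_allowed("GET", "/api/v1/items", {}))  # True
print(request_allowed("POST", "/api/v1/users", {}))  # False: Content-Type missing
```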

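CNI Plugin Comparison

| Feature | Calico | Flannel | Cilium |
|---|---|---|---|
| Network modes | BGP/VXLAN/IPIP | VXLAN/host-gw | eBPF/VXLAN |
| Network Policy | Full support | Not supported | Full L3-L7 support |
| Performance |  |  | Very high |
| Encryption | WireGuard | WireGuard/IPsec backends | WireGuard/IPsec |
| Observability | Basic |  | Hubble |
| Complexity |  |  |  |
| Typical use | Large-scale production | Learning/small clusters | High performance, strict security requirements |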

Network Policy

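NetworkPolicy is Kubernetes' native resource for controlling traffic to and from Pods. It is enforced only if the CNI plugin supports it; Flannel on its own does not. Once any policy selects a Pod for a given direction, that Pod accepts only the traffic that some policy explicitly allows in that direction: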


# Deny all ingress traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress

---
# Allow frontend Pods to reach backend Pods on specific ports
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    # podSelector and namespaceSelector in the SAME element are ANDed: only frontend
    # Pods in namespaces labelled env=production. As two separate list items they
    # would be ORed and admit every Pod in those namespaces.
    - podSelector:
        matchLabels:
          tier: frontend
      namespaceSelector:
        matchLabels:
          env: production
    ports:
    - protocol: TCP
      port: 8080
    - protocol: TCP
      port: 8443

---
# Restrict egress: only PostgreSQL to 10.0.0.0/8, plus DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: restricted-app
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/8
    ports:
    - protocol: TCP
      port: 5432
  - to:  # allow DNS lookups to kube-dns/CoreDNS Pods in any namespace
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

graph LR
    subgraph Allowed["Allowed traffic"]
        FE["Frontend Pod"] -->|":8080"| BE["Backend Pod"]
        BE -->|":5432"| DB["Database Pod"]
    end

    subgraph Denied["Denied traffic"]
        FE -.->|"X"| DB
        External["External"] -.->|"X"| BE
    end

    style Denied fill:#ffebee
    style Allowed fill:#e8f5e9
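The rules for how these policies combine are easy to get wrong. Below is a small Python model of the documented ingress semantics:

- A Pod selected by no policy accepts everything.
- A selected Pod accepts only what at least one selecting policy allows.
- Within one `from` element, `namespaceSelector` and `podSelector` are ANDed.

This is a sketch, not the real enforcement, which the CNI plugin performs. Protocols, `ipBlock` peers, named ports and egress are left out, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    namespace: str
    labels: dict


def selects(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value must be present; {} selects everything."""
    return all(labels.get(k) == v for k, v in selector.items())


def peer_matches(peer: dict, src: Pod, policy_ns: str, ns_labels: dict) -> bool:
    if "ipBlock" in peer:
        return False  # ipBlock peers are not modelled in this sketch
    if "namespaceSelector" in peer:
        ns_ok = selects(peer["namespaceSelector"], ns_labels.get(src.namespace, {}))
    else:
        ns_ok = src.namespace == policy_ns  # podSelector alone: the policy's own namespace
    pod_ok = selects(peer.get("podSelector", {}), src.labels)
    return ns_ok and pod_ok  # both selectors in one peer are ANDed


def ingress_allowed(policies: list, src: Pod, dst: Pod, port: int, ns_labels: dict) -> bool:
    selecting = [
        p for p in policies
        if p["namespace"] == dst.namespace
        and "Ingress" in p["policyTypes"]
        and selects(p["podSelector"], dst.labels)
    ]
    if not selecting:
        return True  # no policy selects dst: not isolated, everything allowed
    for policy in selecting:  # allowed traffic is the union over all selecting policies
        for rule in policy.get("ingress", []):
            peers, ports = rule.get("from", []), rule.get("ports", [])
            peer_ok = not peers or any(peer_matches(x, src, policy["namespace"], ns_labels) for x in peers)
            port_ok = not ports or any(x["port"] == port for x in ports)
            if peer_ok and port_ok:
                return True
    return False


# Modelled on the two ingress policies above, with namespaceSelector and podSelector
# placed in the same `from` element.
POLICIES = [
    {"namespace": "production", "podSelector": {}, "policyTypes": ["Ingress"]},
    {
        "namespace": "production",
        "podSelector": {"tier": "backend"},
        "policyTypes": ["Ingress"],
        "ingress": [{
            "from": [{"namespaceSelector": {"env": "production"}, "podSelector": {"tier": "frontend"}}],
            "ports": [{"port": 8080}, {"port": 8443}],
        }],
    },
]
NS_LABELS = {"production": {"env": "production"}, "staging": {"env": "staging"}}
```

For example, a frontend Pod in `production` may reach a backend Pod on 8080 but not on 9090. A frontend Pod in `staging` is rejected because its namespace lacks `env: production`.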

Troubleshooting

Common steps for diagnosing network problems:

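
# 1. Check Pod connectivity (debug-pod is any Pod that has the tools installed)
kubectl exec -it debug-pod -- ping 10.244.2.3   # Pod IP; in iptables mode ClusterIPs do not answer ping
kubectl exec -it debug-pod -- curl -v http://web-service:80

# 2. Check DNS
kubectl exec -it debug-pod -- nslookup kubernetes.default
kubectl exec -it debug-pod -- cat /etc/resolv.conf

# 3. Check the Service's endpoints
kubectl get endpoints web-service
kubectl get endpointslices -l kubernetes.io/service-name=web-service

# 4. Check kube-proxy rules (run on the node)
# iptables mode
iptables -t nat -L KUBE-SERVICES -n | grep web-service
# IPVS mode
ipvsadm -Ln | grep 10.96.0.100

# 5. Capture packets from a node debug Pod, which runs in the node's network namespace
kubectl debug node/node-1 -it --image=nicolaka/netshoot -- tcpdump -i any -nn port 8080

# 6. Check the CNI plugin (Calico shown; operator-based installs use the calico-system namespace)
kubectl get pods -n kube-system -l k8s-app=calico-node
calicoctl node status   # run on the node; shows BGP peering state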
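Minimal container images often lack curl, ping, or nslookup. If a debug Pod has Python, the same reachability checks can be done with the standard library alone. Below is a minimal sketch with hypothetical helper names. `resolve` goes through the normal system resolver, so the Pod's search domains apply.

```python
import socket


def resolve(name: str) -> list:
    """Resolve `name` like an application would (so the Pod's search domains apply)."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP handshake with host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failure
        return False
```

Inside a debug Pod, `resolve("web-service")` should return the Service's ClusterIP, and `tcp_probe("web-service", 80)` should return True. A successful lookup combined with a failed probe suggests the problem is not DNS. More likely causes are missing endpoints, missing kube-proxy rules, or a NetworkPolicy.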

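Summary

The Kubernetes networking model uses layered abstractions (Pod networking -> Service networking -> Ingress) to break a complex networking problem into manageable pieces. When choosing a CNI plugin, weigh cluster size, performance requirements, security needs, and operational complexity. In production, Calico and Cilium are the most common choices. Calico is mature and stable, while Cilium, built on eBPF, represents the direction the ecosystem is heading.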

Author · zt
Published · 2024-05-01
Length · 2.7k characters · 7 min
License · CC BY-SA 4.0