sequenceDiagram
participant N1 as Node 1
participant N2 as Node 2
participant N3 as Node 3
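Note over N1, N3: Every second, each node sends PING to randomly selected nodes
N1->>N2: PING (own state + state of some randomly selected other nodes)
N2->>N1: PONG (own state + state of some randomly selected other nodes)
Note over N1: Update state info for N2
Note over N2: Update state info for N1
N2->>N3: PING
N3->>N2: PONG
Note over N1, N3: Information gradually spreads to all nodes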
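
To make the propagation in the diagram concrete, here is a minimal simulation sketch. It is not Redis code: `build_message`, `gossip_round`, `rounds_until_converged`, and the `sample_size` parameter are hypothetical names, and the model assumes cluster membership is already established and that each node contacts exactly one random peer per round.

```python
import random

def build_message(sender, view, rng, sample_size):
    """Sender's own id plus a random sample of other nodes it has state for."""
    others = sorted(view - {sender})
    picked = rng.sample(others, min(sample_size, len(others)))
    return {sender, *picked}

def gossip_round(views, rng, sample_size=2):
    """Every node PINGs one random peer and receives a PONG back."""
    nodes = sorted(views)
    for sender in nodes:
        target = rng.choice([n for n in nodes if n != sender])
        # PING: the target merges what the sender chose to share.
        views[target] |= build_message(sender, views[sender], rng, sample_size)
        # PONG: the sender merges what the target chose to share.
        views[sender] |= build_message(target, views[target], rng, sample_size)

def rounds_until_converged(node_count, seed=0, max_rounds=1000):
    """Count rounds until every node holds state for every other node."""
    rng = random.Random(seed)
    # views[n] is the set of nodes whose state node n has seen so far.
    views = {n: {n} for n in range(node_count)}
    everyone = set(range(node_count))
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return round_no
    return None  # did not converge within max_rounds
```

Calling `rounds_until_converged(6)` returns the number of rounds a six-node toy cluster needed. Each message carries only a small sample, which is why the information spreads gradually rather than in one step.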
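Message types

| Message | Purpose |
| --- | --- |
| PING | Checks whether the peer is online; carries cluster state information |
| PONG | Reply to PING; also carries cluster state information |
| MEET | Invites a new node to join the cluster |
| FAIL | Broadcasts that a node has been confirmed as down |
| PUBLISH | Broadcasts a Pub/Sub message to all nodes in the cluster |
| UPDATE | Tells a node with stale information to update its slot-to-node mapping |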
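PFAIL vs. FAIL
flowchart TD
A[Node A sends PING to Node B] --> B{Does Node B reply?}
B -->|Yes| C[Healthy]
B -->|No, no reply within cluster-node-timeout| D[Node A marks Node B as PFAIL<br/>subjectively down]
D --> E[Node A spreads the PFAIL report via Gossip]
E --> F{Have a majority of masters<br/>marked Node B as PFAIL?}
F -->|Yes| G[Node B is marked as FAIL<br/>objectively down]
F -->|No| H[Keep waiting for more nodes to confirm]
G --> I[Broadcast the FAIL message to all nodes]
I --> J[Trigger the failover process]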
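
The PFAIL to FAIL promotion in the flowchart boils down to a majority check over recent failure reports. The sketch below illustrates that rule only. The names `needed_quorum` and `should_mark_fail` are hypothetical, and the `report_validity` window is an assumption (Redis expires failure reports after a multiple of `cluster-node-timeout`).

```python
def needed_quorum(master_count: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    return master_count // 2 + 1

def should_mark_fail(master_count, failure_reports, now, report_validity):
    """Decide whether a PFAIL node can be promoted to FAIL.

    failure_reports maps a master id to the timestamp (ms) of its most
    recent PFAIL report about the suspected node. Reports older than
    report_validity are ignored (assumed expiry rule).
    """
    fresh = [m for m, ts in failure_reports.items() if now - ts <= report_validity]
    return len(fresh) >= needed_quorum(master_count)
```

For example, with five masters and a 30-second validity window, three fresh reports are enough, while two fresh reports plus one stale report are not.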
Failover
Once a master node is marked as FAIL, its slave nodes automatically start a failover.
Failover process
sequenceDiagram
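participant M as Master (failed)
participant S1 as Slave 1 (most up-to-date data)
participant S2 as Slave 2
participant Others as Other master nodes
Note over M: Master goes down
Others->>Others: Detect that the master is in FAIL state
Note over S1, S2: Slaves start an election
S1->>Others: FAILOVER_AUTH_REQUEST (request votes)
S2->>Others: FAILOVER_AUTH_REQUEST (request votes)
Note over Others: Each master votes at most once per epoch<br/>The slave with the freshest data asks first<br/>so it usually collects the votes
Others->>S1: FAILOVER_AUTH_ACK (vote for S1)
Note over S1: Receives votes from a majority of masters
S1->>S1: 1. Promote itself to master and stop replicating
S1->>S1: 2. Take over all slots of the failed master
S1->>Others: 3. Broadcast the new configuration epoch
S1->>S2: 4. S2 becomes a slave of the new master (S1)
Note over S1: Failover complete, S1 is the new master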
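
The slave with the freshest data tends to win because it waits the shortest time before asking for votes. The sketch below follows the delay formula given in the Redis Cluster specification (a fixed 500 ms, a random 0 to 500 ms, plus 1000 ms per rank). The function names and the offset dictionary are hypothetical, chosen for illustration.

```python
import random

def replica_ranks(offsets):
    """Rank 0 goes to the slave with the highest replication offset.

    A slave's rank is the number of sibling slaves with a strictly
    greater offset, so slaves with equal offsets share a rank.
    """
    return {
        name: sum(1 for other in offsets.values() if other > offset)
        for name, offset in offsets.items()
    }

def election_delay_ms(rank, rng=random):
    """Delay before a slave sends FAILOVER_AUTH_REQUEST."""
    return 500 + rng.randrange(500) + rank * 1000

def wins_election(votes, master_count):
    """A slave is promoted once a majority of masters has voted for it."""
    return votes >= master_count // 2 + 1
```

With `offsets = {"S1": 1000, "S2": 900}`, S1 gets rank 0 and S2 gets rank 1, so S2 waits at least one second longer than S1. In a cluster of three masters (including the failed one), two votes are enough to win.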
# The client connects to Node1, but the key belongs to a slot served by Node2
redis-cli -c -h node1 -p 7000
> SET user:5000 "Alice"
-> Redirected to slot [4092] located at 192.168.1.102:7001
OK
sequenceDiagram
participant C as Client
participant N1 as Node 1
participant N2 as Node 2
C->>N1: SET user:5000 "Alice"
N1->>C: MOVED 4092 192.168.1.102:7001
Note over C: Update the local slot mapping cache
C->>N2: SET user:5000 "Alice"
N2->>C: OK
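
Node 1 answers with MOVED because the key hashes to a slot it does not own. Redis Cluster maps a key to one of 16384 slots with CRC16 (XMODEM variant) modulo 16384, using only the part inside `{...}` when the key contains a non-empty hash tag. The following is an illustrative sketch of that mapping. The function names are my own, not a library API.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """Return the cluster slot (0..16383) for a key."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

A client that caches the slot-to-node map can call `key_hash_slot` before sending a command and go straight to the owning node, which avoids most MOVED round trips.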
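ASK redirection
ASK redirection happens while a slot is being migrated. It indicates that the requested key may already have been moved to the destination node: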
sequenceDiagram
participant C as Client
participant Src as Source Node
participant Dst as Destination Node
Note over Src, Dst: Slot migration in progress
C->>Src: GET key1
Note over Src: key1 has already been migrated to Dst
Src->>C: ASK 4092 192.168.1.102:7001
Note over C: ASK is a one-time redirect, the cache is not updated
C->>Dst: ASKING
Dst->>C: OK
C->>Dst: GET key1
Dst->>C: "value1"
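MOVED vs. ASK:
- MOVED: the slot has moved to the new node permanently, so the client should update its slot cache.
- ASK: the slot is still being migrated, so only the current request is redirected to the destination node and the slot cache is left unchanged.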
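
The difference between the two redirects shows up in how a client reacts to each error reply. The sketch below is a hypothetical client-side helper, not the API of any real Redis client library. `Redirect`, `parse_redirect`, and `SlotCache` are names made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Redirect:
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int

def parse_redirect(error: str):
    """Parse 'MOVED 4092 192.168.1.102:7001' or 'ASK ...'; None otherwise."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))

@dataclass
class SlotCache:
    owners: dict = field(default_factory=dict)  # slot -> (host, port)

    def plan_retry(self, redirect):
        """Return (address, commands to send before retrying the request)."""
        address = (redirect.host, redirect.port)
        if redirect.kind == "MOVED":
            # Permanent move: remember the new owner for future requests.
            self.owners[redirect.slot] = address
            return address, []
        # ASK: one-time redirect, send ASKING first and leave the cache alone.
        return address, ["ASKING"]
```

After a MOVED reply, later requests for that slot go directly to the new owner. After an ASK reply, the next request for the same slot still goes to the source node until the migration finishes.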
# Step 2: migrate slots to the new master
redis-cli --cluster reshard 192.168.1.101:7000
# Interactive prompts:
# How many slots do you want to move? 4096
# What is the receiving node ID? <new node ID>
# Source node: all (take slots evenly from all existing masters)
flowchart LR
subgraph Before["Before scale-out"]
A["M1: 0-5460<br/>(5461 slots)"]
B["M2: 5461-10922<br/>(5462 slots)"]
C["M3: 10923-16383<br/>(5461 slots)"]
end
subgraph After["After scale-out"]
D["M1: 0-4095<br/>(4096 slots)"]
E["M2: 5461-9556<br/>(4096 slots)"]
F["M3: 10923-15018<br/>(4096 slots)"]
G["M4: 4096-5460<br/>9557-10922<br/>15019-16383<br/>(4096 slots)"]
end
A --> D
B --> E
C --> F
A -.->|migrate| G
B -.->|migrate| G
C -.->|migrate| G
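
The per-master slot counts in the diagram follow from splitting the 4096 requested slots proportionally across the three sources. The sketch below reproduces that arithmetic. It is modeled on my understanding of how `redis-cli` distributes a reshard across `all` sources (round up for the largest source, round down for the rest), so treat the rounding rule as an assumption. `reshard_plan` is a hypothetical name.

```python
def reshard_plan(source_slots, slots_to_move):
    """Split slots_to_move across sources in proportion to their slot counts.

    source_slots maps a master name to the number of slots it owns.
    The largest source rounds up and the others round down (assumed
    rule), which keeps the total equal to slots_to_move in this example.
    """
    total = sum(source_slots.values())
    # Largest source first; ties keep their input order.
    ordered = sorted(source_slots, key=lambda name: source_slots[name], reverse=True)
    plan = {}
    for index, name in enumerate(ordered):
        scaled = slots_to_move * source_slots[name]
        if index == 0:
            plan[name] = -(-scaled // total)  # integer ceiling
        else:
            plan[name] = scaled // total      # integer floor
    return plan
```

For `reshard_plan({"M1": 5461, "M2": 5462, "M3": 5461}, 4096)` the result is 1365 slots from M1, 1366 from M2, and 1365 from M3. These counts match the three ranges that M4 receives in the diagram and leave every master with 4096 slots.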