Architecture · #system-design#url-shortener

System Design in Practice: URL Shortener Architecture

2024.04.03 7 min 2.9k

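Introduction

The URL shortener is a classic system design interview question and a common requirement in real-world engineering. It looks simple: map a long URL to a short one. Designing a shortener that handles high concurrency, stays highly available, and scales horizontally, however, touches a wide range of techniques. This article starts from requirements analysis and works step by step toward a production-grade design.

Requirements analysis

Functional requirements

  1. Given a long URL, generate a short URL
  2. When a user visits the short URL, redirect to the original long URL
  3. Short links can carry an expiration time
  4. Custom short-link aliases (optional)
  5. Access statistics and analytics

Non-functional requirements

  • High availability: 99.99% service availability
  • Low latency: redirect response time < 50 ms
  • High concurrency: 100,000 reads per second
  • Scalability: horizontal scaling

Capacity estimation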

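Assumptions:
- New short links per month: 100 million
- Read/write ratio: 100:1
- Average storage per short link: 500 bytes
- Retention period: 5 years

Write QPS: 100 million / 30 days / 24 hours / 3,600 seconds ≈ 40 QPS
Read QPS: 40 × 100 = 4,000 QPS (peak × 10 = 40,000 QPS)

Storage: 100 million/month × 12 months × 5 years × 500 B = 3 TB
Cache: hottest 20% of data ≈ 600 GB (can be sharded)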
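
The arithmetic above can be reproduced with a few lines of code. This is only a sketch: the class and constant names are made up for illustration, and every input is one of the assumptions listed in the estimate.

```java
// Back-of-the-envelope capacity estimate; all inputs are the assumptions from the text.
class CapacityEstimate {
    static final long NEW_LINKS_PER_MONTH = 100_000_000L;
    static final int READ_WRITE_RATIO = 100;
    static final int BYTES_PER_LINK = 500;
    static final int RETENTION_YEARS = 5;
    static final int PEAK_FACTOR = 10;  // peak traffic assumed to be 10x the average
    static final int HOT_PERCENT = 20;  // share of data assumed hot enough to cache

    static double writeQps() {
        return NEW_LINKS_PER_MONTH / (30.0 * 24 * 3600);
    }

    static double readQps() {
        return writeQps() * READ_WRITE_RATIO;
    }

    static double peakReadQps() {
        return readQps() * PEAK_FACTOR;
    }

    static long totalLinks() {
        return NEW_LINKS_PER_MONTH * 12 * RETENTION_YEARS;
    }

    static long storageBytes() {
        return totalLinks() * BYTES_PER_LINK;
    }

    static long cacheBytes() {
        return storageBytes() * HOT_PERCENT / 100;
    }

    public static void main(String[] args) {
        System.out.printf("write QPS     ~ %.1f%n", writeQps());
        System.out.printf("read QPS      ~ %.0f%n", readQps());
        System.out.printf("peak read QPS ~ %.0f%n", peakReadQps());
        System.out.printf("total links   = %d%n", totalLinks());
        System.out.printf("storage       = %.1f TB%n", storageBytes() / 1e12);
        System.out.printf("cache         = %.0f GB%n", cacheBytes() / 1e9);
    }
}
```

The exact write rate is about 38.6 QPS, which the estimate rounds up to 40. The total of 6 billion links over five years is the number the short-code length has to accommodate.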

System architecture overview

graph TB
    Client[Client] --> CDN[CDN]
    CDN --> LB[Load Balancer]

    LB --> API1[API Server 1]
    LB --> API2[API Server 2]
    LB --> APIN[API Server N]

    API1 --> Cache[Redis Cluster<br/>Cache layer]
    API2 --> Cache
    APIN --> Cache

    Cache --> DB[(MySQL Cluster<br/>Master-slave)]

    API1 --> IDGen[ID Generation Service]
    API2 --> IDGen
    APIN --> IDGen

    API1 --> Analytics[Analytics Service]
    Analytics --> Kafka[Kafka]
    Kafka --> Flink[Flink stream processing]
    Flink --> ClickHouse[(ClickHouse<br/>Analytics database)]

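URL encoding schemes

Choosing a scheme

The core of a short link is mapping a long URL to a short string. Common options:

| Scheme | Pros | Cons |
|---|---|---|
| Hash (MD5/SHA) | Simple to implement | Collision handling is complex |
| Auto-increment ID + Base62 | No collisions | Requires distributed ID generation |
| Random string | Simple | Collision checks add overhead |
| Pre-generated keys | No computation at request time | A key pool must be maintained |

The recommended approach is auto-increment ID + Base62 encoding, which guarantees uniqueness and keeps codes short.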

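Base62 encoding

Base62 uses the 62 characters [0-9a-zA-Z]. Seven Base62 characters can represent 62^7 ≈ 3.5 trillion distinct values, far more than the 6 billion links projected over five years.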

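public class Base62Encoder {
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int BASE = ALPHABET.length();

    /** Encodes a non-negative ID as a Base62 string, most significant digit first. */
    public static String encode(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must be non-negative: " + num);
        }
        if (num == 0) return String.valueOf(ALPHABET.charAt(0));

        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            // Digits come out least significant first, so reverse at the end.
            sb.append(ALPHABET.charAt((int) (num % BASE)));
            num /= BASE;
        }
        return sb.reverse().toString();
    }

    /** Decodes a Base62 string back to its ID; rejects characters outside the alphabet. */
    public static long decode(String str) {
        long num = 0;
        for (char c : str.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid Base62 character: " + c);
            }
            num = num * BASE + digit;
        }
        return num;
    }
}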
Examples:
ID = 12345678 → Base62 = "PNFQ"
ID = 1000000000 → Base62 = "15FTGg"
ID = 3521614606207 → Base62 = "ZZZZZZZ" (62^7 - 1, the largest 7-character code)
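
A quick way to sanity-check any Base62 implementation is a round trip at the boundaries. The sketch below re-implements the encoder in a self-contained class (the name `Base62RoundTrip` is made up for illustration) with the same alphabet order as above, digits first, then lowercase, then uppercase.

```java
class Base62RoundTrip {
    static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static String encode(long num) {
        if (num < 0) throw new IllegalArgumentException("negative id");
        if (num == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(ALPHABET.charAt((int) (num % 62)));
            num /= 62;
        }
        return sb.reverse().toString();
    }

    static long decode(String code) {
        long num = 0;
        for (char c : code.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) throw new IllegalArgumentException("bad character: " + c);
            num = num * 62 + digit;
        }
        return num;
    }

    public static void main(String[] args) {
        long max7 = 3_521_614_606_207L; // 62^7 - 1, the largest ID that fits in 7 characters
        for (long id : new long[] {0, 61, 62, 3843, max7, max7 + 1}) {
            String code = encode(id);
            System.out.println(id + " -> " + code + " -> " + decode(code));
        }
    }
}
```

The boundary cases show where the code length changes: 61 is the last single-character code, 62^7 - 1 is the last 7-character code, and 62^7 itself needs eight characters.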

Distributed ID generation

graph LR
    subgraph Option 1: Snowflake
        S[Snowflake ID<br/>64 bits] --> T[Timestamp, 41 bits]
        S --> M[Machine ID, 10 bits]
        S --> SEQ[Sequence, 12 bits]
    end

    subgraph Option 2: Segment mode
        DB[(Database)] -->|1000 IDs per fetch| Service1[Service 1<br/>1-1000]
        DB -->|1000 IDs per fetch| Service2[Service 2<br/>1001-2000]
    end
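
The Snowflake layout in the diagram can be sketched as follows. This is a minimal illustration, not a production generator: the class name, the custom epoch (2020-01-01 UTC), and the fail-fast handling of clock rollback are all assumptions made for this example. The 64th bit is the unused sign bit.

```java
class SnowflakeSketch {
    // Layout, high to low: 1 unused sign bit | 41-bit timestamp | 10-bit machine ID | 12-bit sequence
    static final long EPOCH_MS = 1_577_836_800_000L; // assumed custom epoch: 2020-01-01T00:00:00Z
    static final int MACHINE_BITS = 10;
    static final int SEQ_BITS = 12;
    static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1; // 1023
    static final long MAX_SEQ = (1L << SEQ_BITS) - 1;         // 4095

    private final long machineId;
    private long lastMs = -1;
    private long seq = 0;

    SnowflakeSketch(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE) {
            throw new IllegalArgumentException("machineId out of range: " + machineId);
        }
        this.machineId = machineId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            // A real generator needs a strategy here; this sketch simply fails fast.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            seq = (seq + 1) & MAX_SEQ;
            if (seq == 0) {
                // 4096 IDs already issued in this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMs) {
                    Thread.onSpinWait();
                }
            }
        } else {
            seq = 0;
        }
        lastMs = now;
        return ((now - EPOCH_MS) << (MACHINE_BITS + SEQ_BITS))
                | (machineId << SEQ_BITS)
                | seq;
    }

    static long machineOf(long id) {
        return (id >> SEQ_BITS) & MAX_MACHINE;
    }

    static long timestampOf(long id) {
        return (id >> (MACHINE_BITS + SEQ_BITS)) + EPOCH_MS;
    }

    public static void main(String[] args) {
        SnowflakeSketch gen = new SnowflakeSketch(5);
        long id = gen.nextId();
        System.out.println(id + " machine=" + machineOf(id) + " ts=" + timestampOf(id));
    }
}
```

One trade-off worth noting: because the timestamp occupies the high bits, Snowflake IDs are large numbers, and their Base62 form is typically around 10 to 11 characters. Segment mode starts counting from small values, so it yields shorter codes.
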
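import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.support.TransactionTemplate;

// Segment-mode ID generator
@Service
public class SegmentIdGenerator {
    private static final int SEGMENT_SIZE = 1000;

    private final JdbcTemplate jdbc;
    private final TransactionTemplate tx;

    // Both fields are guarded by the monitor of this object.
    private long currentId = 0; // last ID handed out
    private long maxId = 0;     // last ID of the current segment

    public SegmentIdGenerator(JdbcTemplate jdbc, TransactionTemplate tx) {
        this.jdbc = jdbc;
        this.tx = tx;
    }

    // synchronized keeps the check, the reload, and the increment atomic, so no
    // thread can observe a half-updated segment and hand out a duplicate ID.
    public synchronized long nextId() {
        if (currentId >= maxId) {
            loadNextSegment();
        }
        return ++currentId;
    }

    private void loadNextSegment() {
        // Run UPDATE and SELECT in one transaction: the UPDATE takes a row lock, so
        // another instance cannot advance max_id before this one reads its value back.
        Long newMaxId = tx.execute(status -> {
            jdbc.update(
                "UPDATE id_generator SET max_id = max_id + ? WHERE biz_type = 'short_url'",
                SEGMENT_SIZE);
            return jdbc.queryForObject(
                "SELECT max_id FROM id_generator WHERE biz_type = 'short_url'",
                Long.class);
        });
        if (newMaxId == null) {
            throw new IllegalStateException("no id_generator row for biz_type 'short_url'");
        }
        maxId = newMaxId;
        currentId = newMaxId - SEGMENT_SIZE;
    }
}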

Database design

-- Main short-link table
CREATE TABLE short_url (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_code VARCHAR(10) NOT NULL,
    long_url VARCHAR(2048) NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NULL,
    click_count BIGINT NOT NULL DEFAULT 0,

    UNIQUE KEY uk_short_code (short_code),
    INDEX idx_user_id (user_id),
    INDEX idx_expires_at (expires_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Deduplication table (so the same long URL returns the same short link)
CREATE TABLE url_mapping (
    url_hash CHAR(64) NOT NULL, -- SHA-256 of long_url, hex-encoded
    short_code VARCHAR(10) NOT NULL,
    long_url VARCHAR(2048) NOT NULL,

    PRIMARY KEY (url_hash),
    INDEX idx_short_code (short_code)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Click log table (range-partitioned by month)
CREATE TABLE click_log (
    id BIGINT AUTO_INCREMENT,
    short_code VARCHAR(10) NOT NULL,
    client_ip VARCHAR(45),
    user_agent VARCHAR(512),
    referer VARCHAR(2048),
    country VARCHAR(2),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- The partitioning column must be part of every unique key
    PRIMARY KEY (id, created_at),
    INDEX idx_short_code (short_code)
) PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
    PARTITION p202501 VALUES LESS THAN (UNIX_TIMESTAMP('2025-02-01')),
    PARTITION p202502 VALUES LESS THAN (UNIX_TIMESTAMP('2025-03-01')),
    -- Catch-all so inserts after the last defined month do not fail
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

Core API implementation

Creating a short link

sequenceDiagram
    participant Client
    participant API as API Server
    participant IDGen as ID Generator
    participant Cache as Redis
    participant DB as MySQL

    Client->>API: POST /api/v1/shorten<br/>{long_url, expires_at}

    API->>API: Validate URL format
    API->>DB: Look up url_mapping<br/>(SHA-256 dedup)

    alt URL already exists
        DB-->>API: Return existing short_code
    else URL not found
        API->>IDGen: Request unique ID
        IDGen-->>API: id=12345678
        API->>API: Base62 encode<br/>id→"PNFQ"
        API->>DB: INSERT short_url + url_mapping
        API->>Cache: SET short:PNFQ → long_url
    end

    API-->>Client: {short_url: "https://s.io/PNFQ"}
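import java.time.Duration;
import java.time.LocalDateTime;
import jakarta.validation.Valid; // javax.validation.Valid on Spring Boot 2.x
import org.apache.commons.codec.digest.DigestUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/v1")
public class ShortUrlController {
    @Autowired
    private ShortUrlService shortUrlService;

    @PostMapping("/shorten")
    public ResponseEntity<ShortenResponse> shorten(@Valid @RequestBody ShortenRequest request) {
        String shortCode = shortUrlService.createShortUrl(
            request.getLongUrl(),
            request.getExpiresAt(),
            request.getCustomAlias()
        );

        return ResponseEntity.ok(new ShortenResponse(
            "https://s.io/" + shortCode,
            shortCode
        ));
    }
}

@Service
public class ShortUrlService {
    private static final Duration MAX_CACHE_TTL = Duration.ofHours(24);

    @Autowired
    private SegmentIdGenerator idGenerator;
    @Autowired
    private ShortUrlMapper urlMapper;
    @Autowired
    private RedisTemplate<String, String> redis;

    @Transactional
    public String createShortUrl(String longUrl, LocalDateTime expiresAt, String customAlias) {
        // 1. Validate the URL format and the expiry time
        validateUrl(longUrl);
        if (expiresAt != null && !expiresAt.isAfter(LocalDateTime.now())) {
            throw new IllegalArgumentException("expiresAt must be in the future");
        }

        // 2. Custom alias: must not collide with an existing short code
        if (customAlias != null && !customAlias.isEmpty()) {
            if (urlMapper.existsByShortCode(customAlias)) {
                throw new AliasAlreadyExistsException(customAlias);
            }
            saveShortUrl(customAlias, longUrl, expiresAt);
            return customAlias;
        }

        // 3. Deduplicate: reuse the existing mapping for the same long URL
        String urlHash = DigestUtils.sha256Hex(longUrl);
        String existingCode = urlMapper.findShortCodeByUrlHash(urlHash);
        if (existingCode != null) {
            return existingCode;
        }

        // 4. Generate the short code
        long id = idGenerator.nextId();
        String shortCode = Base62Encoder.encode(id);

        // 5. Persist
        saveShortUrl(shortCode, longUrl, expiresAt);
        urlMapper.insertUrlMapping(urlHash, shortCode, longUrl);

        // 6. Write to the cache; the entry must never outlive the link itself
        redis.opsForValue().set("short:" + shortCode, longUrl, cacheTtl(expiresAt));

        return shortCode;
    }

    // Cache TTL: 24 hours, or the link's remaining lifetime if that is shorter
    private Duration cacheTtl(LocalDateTime expiresAt) {
        if (expiresAt == null) return MAX_CACHE_TTL;
        Duration remaining = Duration.between(LocalDateTime.now(), expiresAt);
        return remaining.compareTo(MAX_CACHE_TTL) < 0 ? remaining : MAX_CACHE_TTL;
    }

    private void saveShortUrl(String shortCode, String longUrl, LocalDateTime expiresAt) {
        ShortUrl entity = new ShortUrl();
        entity.setShortCode(shortCode);
        entity.setLongUrl(longUrl);
        entity.setExpiresAt(expiresAt);
        urlMapper.insert(entity);
    }
}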

Redirect handling

sequenceDiagram
    participant Client
    participant API as API Server
    participant Cache as Redis
    participant DB as MySQL
    participant Kafka as Kafka

    Client->>API: GET /PNFQ
    API->>Cache: GET short:PNFQ

    alt Cache hit
        Cache-->>API: long_url
    else Cache miss
        API->>DB: SELECT long_url WHERE short_code='PNFQ'
        DB-->>API: long_url
        API->>Cache: SET short:PNFQ → long_url
    end

    API->>API: Check expiry
    API-->>Client: 301/302 Redirect → long_url
    API->>Kafka: Publish click event asynchronously
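import java.net.URI;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import jakarta.servlet.http.HttpServletRequest; // javax.servlet on Spring Boot 2.x
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.*;

@RestController
public class RedirectController {
    private static final Duration MAX_CACHE_TTL = Duration.ofHours(24);

    @Autowired
    private RedisTemplate<String, String> redis;
    @Autowired
    private ShortUrlMapper urlMapper;
    @Autowired
    private KafkaTemplate<String, ClickEvent> kafkaTemplate;

    // {1,10} rather than {4,10}: Base62 codes for small IDs are shorter than 4 characters
    @GetMapping("/{shortCode:[a-zA-Z0-9]{1,10}}")
    public ResponseEntity<Void> redirect(
            @PathVariable String shortCode,
            HttpServletRequest request) {

        // 1. Check the cache
        String longUrl = redis.opsForValue().get("short:" + shortCode);

        if (longUrl == null) {
            // 2. Fall back to the database
            ShortUrl shortUrl = urlMapper.findByShortCode(shortCode);
            if (shortUrl == null) {
                throw new ShortUrlNotFoundException(shortCode);
            }

            // 3. Check expiry
            LocalDateTime now = LocalDateTime.now();
            LocalDateTime expiresAt = shortUrl.getExpiresAt();
            if (expiresAt != null && !expiresAt.isAfter(now)) {
                throw new ShortUrlExpiredException(shortCode);
            }

            longUrl = shortUrl.getLongUrl();

            // 4. Backfill the cache. The TTL is capped at the link's remaining
            //    lifetime so that a cache hit can never serve an expired link.
            Duration ttl = MAX_CACHE_TTL;
            if (expiresAt != null) {
                Duration remaining = Duration.between(now, expiresAt);
                if (remaining.compareTo(ttl) < 0) ttl = remaining;
            }
            redis.opsForValue().set("short:" + shortCode, longUrl, ttl);
        }

        // 5. Record the click event asynchronously
        publishClickEvent(shortCode, request);

        // 6. 302 redirect
        return ResponseEntity.status(HttpStatus.FOUND)
            .location(URI.create(longUrl))
            .build();
    }

    private void publishClickEvent(String shortCode, HttpServletRequest request) {
        ClickEvent event = new ClickEvent();
        event.setShortCode(shortCode);
        event.setClientIp(getClientIp(request));
        event.setUserAgent(request.getHeader("User-Agent"));
        event.setReferer(request.getHeader("Referer"));
        event.setTimestamp(Instant.now());

        kafkaTemplate.send("click-events", shortCode, event);
    }

    // Behind a trusted proxy or load balancer, the first X-Forwarded-For entry is the client.
    private String getClientIp(HttpServletRequest request) {
        String forwarded = request.getHeader("X-Forwarded-For");
        if (forwarded != null && !forwarded.isEmpty()) {
            return forwarded.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }
}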

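301 vs 302: 301 is a permanent redirect. Browsers cache it, so later visits may skip the server entirely. 302 is a temporary redirect, so every visit passes through the server. If you need click statistics, use 302.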
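
To see what a client actually receives, the sketch below serves a redirect from a throwaway server built on the JDK's `com.sun.net.httpserver.HttpServer` and fetches it with `java.net.http.HttpClient` (Java 11+) without following the redirect. The class name, the path `/abc123`, and the target URL are placeholders chosen for this example; this is not the Spring controller from the article.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RedirectDemo {
    /** Serves one redirect with the given status and returns "status location" as the client sees it. */
    static String probe(int status, String target) {
        HttpServer server = null;
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().add("Location", target);
                exchange.sendResponseHeaders(status, -1); // -1 means no response body
                exchange.close();
            });
            server.start();

            HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // keep the raw redirect visible
                .build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/abc123");
            HttpResponse<Void> resp = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.discarding());
            return resp.statusCode() + " " + resp.headers().firstValue("Location").orElse("");
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            if (server != null) server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(302, "https://example.com/some/long/path"));
        System.out.println(probe(301, "https://example.com/some/long/path"));
    }
}
```

On the wire the two responses differ only in the status code. The difference lies in how clients treat them afterwards: a browser may remember the 301 and go straight to the target next time, which is why those visits would be missing from the click statistics.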

Access analytics

graph LR
    API[API Server] -->|Click events| Kafka[Kafka]
    Kafka --> Flink[Flink]

    Flink -->|Real-time aggregation| Redis[Redis<br/>Real-time counters]
    Flink -->|Batch writes| CH[(ClickHouse<br/>Analytical queries)]

    Dashboard[Analytics dashboard] --> Redis
    Dashboard --> CH
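// Flink job for real-time click statistics
public class ClickAnalyticsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka connection settings (the values here are placeholders)
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProps.setProperty("group.id", "click-analytics");

        DataStream<ClickEvent> clicks = env
            .addSource(new FlinkKafkaConsumer<>("click-events",
                new ClickEventSchema(), kafkaProps));

        // Count clicks per short link in one-minute tumbling windows
        clicks
            .keyBy(ClickEvent::getShortCode)
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .aggregate(new ClickCountAggregator())
            .addSink(new ClickHouseSink());

        // Update Redis counters in real time. redisClient is assumed to be a
        // Redis client that each parallel task creates for itself.
        clicks
            .keyBy(ClickEvent::getShortCode)
            .process(new ProcessFunction<ClickEvent, Void>() {
                @Override
                public void processElement(ClickEvent event, Context ctx, Collector<Void> out) {
                    // Total clicks (page views)
                    redisClient.incr("clicks:" + event.getShortCode());
                    // Approximate unique visitors via HyperLogLog
                    redisClient.pfAdd("uv:" + event.getShortCode(), event.getClientIp());
                }
            });

        env.execute("Click Analytics");
    }
}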
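
To make the one-minute tumbling window concrete, here is the same bucketing logic in plain Java, without Flink. It is a simplified illustration: the `Click` class and `count` method are made up for this example, and it buckets by the event's own timestamp rather than by processing time, with no handling of late or out-of-order data.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TumblingWindowCount {
    static final long WINDOW_MS = 60_000; // one-minute windows

    static final class Click {
        final String shortCode;
        final long epochMillis;

        Click(String shortCode, long epochMillis) {
            this.shortCode = shortCode;
            this.epochMillis = epochMillis;
        }
    }

    /** Returns windowStart -> (shortCode -> click count). */
    static Map<Long, Map<String, Long>> count(List<Click> clicks) {
        Map<Long, Map<String, Long>> windows = new TreeMap<>();
        for (Click c : clicks) {
            // Tumbling windows do not overlap: each event belongs to exactly one
            // window, found by rounding its timestamp down to a window boundary.
            long windowStart = c.epochMillis - (c.epochMillis % WINDOW_MS);
            windows.computeIfAbsent(windowStart, k -> new TreeMap<>())
                   .merge(c.shortCode, 1L, Long::sum);
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Click> clicks = List.of(
            new Click("abc", 5_000),
            new Click("abc", 59_999),
            new Click("xyz", 30_000),
            new Click("abc", 60_000)); // exactly on the boundary: next window
        System.out.println(count(clicks)); // prints {0={abc=2, xyz=1}, 60000={abc=1}}
    }
}
```

Each (window, short code) pair in the result corresponds to one row that the Flink job would hand to its sink once the window closes.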

High availability and scaling

Caching strategy

graph TB
    subgraph Cache tier design
        Hot[Hot short links<br/>Local cache: Caffeine] --> Warm[Active short links<br/>Redis Cluster]
        Warm --> Cold[Cold data<br/>MySQL]
    end
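import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

// Multi-level cache
@Component
public class ShortUrlCache {
    // L1: local in-process cache (hot data).
    // expireAfterWrite bounds staleness: with expireAfterAccess, an entry that is
    // hit constantly would never be evicted, even after the link has expired.
    private final Cache<String, String> localCache = Caffeine.newBuilder()
        .maximumSize(100_000)
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .build();

    @Autowired
    private RedisTemplate<String, String> redis;

    public String getLongUrl(String shortCode) {
        // L1: local cache
        String url = localCache.getIfPresent(shortCode);
        if (url != null) return url;

        // L2: Redis; promote hits into L1
        url = redis.opsForValue().get("short:" + shortCode);
        if (url != null) {
            localCache.put(shortCode, url);
            return url;
        }

        return null; // miss in both tiers: the caller must query the database
    }
}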

Database sharding

graph TB
    Router[Shard router] --> S0[Shard 0<br/>short_code hash % 4 = 0]
    Router --> S1[Shard 1<br/>short_code hash % 4 = 1]
    Router --> S2[Shard 2<br/>short_code hash % 4 = 2]
    Router --> S3[Shard 3<br/>short_code hash % 4 = 3]

    S0 --> S0M[Master]
    S0 --> S0S[Slave]
    S1 --> S1M[Master]
    S1 --> S1S[Slave]
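
The routing rule in the diagram, hash of `short_code` modulo the shard count, might look like this. The diagram does not say which hash function is used; CRC32 is an assumption made here because it is stable across JVMs and always yields a non-negative value. The class name `ShardRouter` is also made up for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    /** Maps a short code to a shard index in [0, shardCount). */
    int shardFor(String shortCode) {
        CRC32 crc = new CRC32();
        crc.update(shortCode.getBytes(StandardCharsets.UTF_8));
        // getValue() is an unsigned 32-bit value held in a long, so the modulo
        // is never negative (unlike String.hashCode() % n).
        return (int) (crc.getValue() % shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        for (String code : new String[] {"PNFQ", "15FTGg", "abc123"}) {
            System.out.println(code + " -> shard " + router.shardFor(code));
        }
    }
}
```

A known limitation of plain modulo routing is that changing the shard count remaps most keys, so growing from 4 shards usually involves a planned data migration or a scheme such as consistent hashing.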

Complete architecture diagram

graph TB
    DNS[DNS] --> CDN[CDN / Edge nodes]
    CDN --> LB[Load Balancer L7]

    subgraph API Layer
        LB --> API1[API Server]
        LB --> API2[API Server]
        LB --> APIN[API Server]
    end

    subgraph Cache Layer
        API1 --> LocalCache1[Caffeine L1]
        LocalCache1 --> RedisCluster[Redis Cluster L2]
    end

    subgraph Storage Layer
        RedisCluster --> Shard1[MySQL Shard 1<br/>Master + Slave]
        RedisCluster --> Shard2[MySQL Shard 2<br/>Master + Slave]
    end

    subgraph ID Generation
        API1 --> Leaf[Leaf ID Generator<br/>Segment mode]
    end

    subgraph Analytics Pipeline
        API1 -->|Async| Kafka[Kafka]
        Kafka --> Flink[Flink]
        Flink --> ClickHouse[(ClickHouse)]
        Flink --> RedisCounter[Redis counters]
    end

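Security considerations

  1. Malicious URLs: when creating a short link, check the long URL against a blacklist (malware and phishing sites)
  2. Rate limiting: apply per-user rate limits to the creation endpoint
  3. Abuse prevention: cap how often and how many links a single user can create
  4. Privacy: anonymize IP addresses in access logs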
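
For items 2 and 3, a per-user limit can be sketched with a fixed-window counter. This is an in-memory illustration only: the class name, the injected clock, and the window parameters are assumptions for the example. With several API servers the counters would need to live in a shared store such as Redis, and idle users' entries would need eviction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class PerUserRateLimiter {
    private static final class Window {
        long start; // when the current window began (ms)
        int count;  // requests admitted in the current window
    }

    private final int limit;
    private final long windowMs;
    private final LongSupplier clock; // injected so the limiter can be tested deterministically
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    PerUserRateLimiter(int limit, long windowMs, LongSupplier clock) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.clock = clock;
    }

    /** Returns true if the user may create another short link in the current window. */
    boolean tryAcquire(String userId) {
        long now = clock.getAsLong();
        boolean[] allowed = new boolean[1];
        // compute() runs atomically per key, so concurrent requests from the
        // same user cannot both take the last remaining slot.
        windows.compute(userId, (id, w) -> {
            if (w == null || now - w.start >= windowMs) {
                w = new Window();
                w.start = now;
            }
            if (w.count < limit) {
                w.count++;
                allowed[0] = true;
            }
            return w;
        });
        return allowed[0];
    }

    public static void main(String[] args) {
        PerUserRateLimiter limiter = new PerUserRateLimiter(2, 1_000, System::currentTimeMillis);
        for (int i = 1; i <= 3; i++) {
            System.out.println("request " + i + " allowed: " + limiter.tryAcquire("user-1"));
        }
    }
}
```

In the creation endpoint, a rejected call would typically be answered with HTTP 429 before any ID is consumed.
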
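import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Set;

// URL safety check
public class UrlSafetyChecker {
    private final Set<String> blacklistedDomains;

    public UrlSafetyChecker(Set<String> blacklistedDomains) {
        this.blacklistedDomains = blacklistedDomains;
    }

    public boolean isSafe(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI
        }
        if (host == null) {
            return false; // relative or malformed URL without a host
        }
        host = host.toLowerCase(Locale.ROOT);

        // Blacklist check
        if (blacklistedDomains.contains(host)) {
            return false;
        }

        // Reject hosts that resolve to internal network addresses
        try {
            InetAddress addr = InetAddress.getByName(host);
            return !(addr.isLoopbackAddress() || addr.isSiteLocalAddress()
                    || addr.isLinkLocalAddress() || addr.isAnyLocalAddress());
        } catch (UnknownHostException e) {
            return false; // host does not resolve
        }
    }
}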
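
For item 4 in the list above, one common way to anonymize an IP address is to zero out its low-order bits before it is logged. The sketch below keeps the first 24 bits of an IPv4 address and the first 48 bits of an IPv6 address. Those cut-offs and the class name are assumptions for illustration; the right granularity depends on the applicable privacy requirements.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class IpAnonymizer {
    /** Masks the host part of an IP literal, e.g. 203.0.113.77 becomes 203.0.113.0. */
    static String anonymize(String ipLiteral) {
        try {
            // For an IP literal, getByName only parses the text; no DNS lookup happens.
            byte[] bytes = InetAddress.getByName(ipLiteral).getAddress();
            int keep = (bytes.length == 4) ? 3 : 6; // IPv4: keep /24, IPv6: keep /48
            for (int i = keep; i < bytes.length; i++) {
                bytes[i] = 0;
            }
            return InetAddress.getByAddress(bytes).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(anonymize("203.0.113.77"));
        System.out.println(anonymize("2001:db8:abcd:12:34:56:78:9a"));
    }
}
```

Masking would be applied before the click event is published, so the full address never reaches Kafka or the click log. Note that unique-visitor counts keyed by IP then become per-network rather than per-address estimates.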

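Summary

A URL shortener looks simple, yet it touches nearly every aspect of system design: distributed ID generation guarantees uniqueness, Base62 encoding keeps codes short, multi-level caching keeps latency low, database sharding scales storage, and a message queue decouples analytics from the request path. The key design decisions are: use 302 rather than 301 so that click statistics stay accurate, generate distributed IDs in segment mode, and combine Caffeine with Redis so that lookups for hot keys are served from process memory, typically within microseconds and without a network round trip.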

Author · zt
Published · 2024-04-03
Length · 2.9k characters · 7 min
License · CC BY-SA 4.0