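Caching Strategies in Depth: Defending Against Cache Penetration, Breakdown, and Avalanche

Preface

Caching is one of the most direct and effective ways to improve system performance. Used well, a cache can cut response time from milliseconds to microseconds (for an in-process cache; a Redis round trip is typically sub-millisecond) and reduce database QPS by orders of magnitude. But using a cache involves far more than "read the cache, fall back to the database on a miss, then write the result back". Cache penetration, breakdown, and avalanche are common production problems, and mishandling them can take down the entire system. This article walks through the main caching strategies and the solutions to these common problems.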

Cache Read/Write Strategies

Cache-Aside

The classic caching pattern: application code manages both the cache and the database.

sequenceDiagram
    participant App as Application
    participant Cache as Redis
    participant DB as Database

    Note over App: Read path
    App->>Cache: 1. GET key
    alt Cache hit
        Cache-->>App: Return data
    else Cache miss
        Cache-->>App: nil
        App->>DB: 2. Query database
        DB-->>App: Return data
        App->>Cache: 3. SET key value
    end

    Note over App: Write path
    App->>DB: 1. Update database
    App->>Cache: 2. Delete cache entry

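import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;

@Service
public class ProductService {
    @Autowired
    private RedisTemplate<String, Product> redis;
    @Autowired
    private ProductMapper productMapper;

    // Read path
    public Product getProduct(String productId) {
        String cacheKey = "product:" + productId;

        // 1. Check the cache
        Product product = redis.opsForValue().get(cacheKey);
        if (product != null) {
            return product;
        }

        // 2. Cache miss: query the database
        product = productMapper.selectById(productId);
        if (product != null) {
            // 3. Populate the cache with a randomized TTL (3600 to 4199 seconds)
            long ttl = 3600 + ThreadLocalRandom.current().nextInt(600);
            redis.opsForValue().set(cacheKey, product, ttl, TimeUnit.SECONDS);
        }

        return product;
    }

    // Write path
    @Transactional
    public void updateProduct(Product product) {
        // 1. Update the database
        productMapper.updateById(product);
        // 2. Delete the cache entry only after the transaction commits. Deleting
        //    before commit lets a concurrent reader re-cache the old row.
        //    (Assumes Spring 5.3+, where TransactionSynchronization has default methods.)
        String cacheKey = "product:" + product.getId();
        TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
            @Override
            public void afterCommit() {
                redis.delete(cacheKey);
            }
        });
    }
}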

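Why "update the database first, then delete the cache" rather than "delete the cache first, then update the database"? With delete-first, a read that arrives between the delete and the database update reloads the old value and writes it back to the cache, leaving the cache stale until the entry expires. Update-first can still race, but only if a reader that missed the cache reads the old row before the update and writes it back after the writer's delete, which is a much narrower window.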

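sequenceDiagram
    participant A as Request A (write)
    participant B as Request B (read)
    participant Cache as Cache
    participant DB as Database

    Note over A,DB: Scenario: delete cache, then update DB (problematic!)

    A->>Cache: 1. Delete cache entry
    B->>Cache: 2. Read cache (miss)
    B->>DB: 3. Read database (old value V1)
    A->>DB: 4. Update database (V1→V2)
    B->>Cache: 5. Write cache (old value V1)

    Note over Cache: Cache holds old value V1<br/>Database holds new value V2<br/>Data is inconsistent!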
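
The interleaving above can be replayed deterministically in plain Java. This is an illustrative sketch only: the two maps are hypothetical in-memory stand-ins for Redis and the database, and the class and method names are not part of the original service code.

```java
import java.util.HashMap;
import java.util.Map;

/** Replays the two write orderings step by step with in-memory stand-ins. */
final class WriteOrderingDemo {
    private static final String KEY = "product:1";

    /** Delete the cache, then update the DB, with a reader in between. Returns {cache, db}. */
    static String[] deleteThenUpdate() {
        Map<String, String> cache = new HashMap<>(); // stand-in for Redis
        Map<String, String> db = new HashMap<>();    // stand-in for the database
        db.put(KEY, "V1");
        cache.put(KEY, "V1");

        cache.remove(KEY);                 // 1. A deletes the cache entry
        String readByB = cache.get(KEY);   // 2. B reads the cache and misses
        if (readByB == null) {
            readByB = db.get(KEY);         // 3. B reads the old value V1
        }
        db.put(KEY, "V2");                 // 4. A updates the database
        cache.put(KEY, readByB);           // 5. B writes the stale V1 back
        return new String[] {cache.get(KEY), db.get(KEY)};
    }

    /** Update the DB, then delete the cache; the next miss reloads V2. Returns {cache, db}. */
    static String[] updateThenDelete() {
        Map<String, String> cache = new HashMap<>();
        Map<String, String> db = new HashMap<>();
        db.put(KEY, "V1");
        cache.put(KEY, "V1");

        db.put(KEY, "V2");                 // 1. A updates the database
        // 2. A reader arriving here still hits V1 in the cache (briefly stale)
        cache.remove(KEY);                 // 3. A deletes the cache entry
        String readByC = cache.get(KEY);   // 4. C reads the cache and misses
        if (readByC == null) {
            readByC = db.get(KEY);         // 5. C reads V2 from the database
            cache.put(KEY, readByC);       //    and caches it
        }
        return new String[] {cache.get(KEY), db.get(KEY)};
    }

    public static void main(String[] args) {
        System.out.println("delete-then-update: " + String.join(" / ", deleteThenUpdate()));
        System.out.println("update-then-delete: " + String.join(" / ", updateThenDelete()));
    }
}
```

In the first ordering the cache ends up holding V1 while the database holds V2; in the second, both end up at V2 once the next miss reloads the entry.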

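Read-Through / Write-Through

The application talks only to the cache; the cache layer is responsible for keeping reads and writes in sync with the database.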

graph LR
    subgraph Read-Through
        App1[Application] -->|read| Cache1[Cache layer]
        Cache1 -->|load automatically on miss| DB1[Database]
    end

    subgraph Write-Through
        App2[Application] -->|write| Cache2[Cache layer]
        Cache2 -->|synchronous write| DB2[Database]
    end

// Approximating Read-Through with Spring Cache annotations
@Service
public class ProductService {
    @Autowired
    private ProductMapper productMapper;

    @Cacheable(value = "products", key = "#productId",
               unless = "#result == null")
    public Product getProduct(String productId) {
        // Invoked only on a cache miss; the return value is then cached
        return productMapper.selectById(productId);
    }

    @CachePut(value = "products", key = "#product.id")
    @Transactional
    public Product updateProduct(Product product) {
        productMapper.updateById(product);
        return product; // the return value is written to the cache
    }

    @CacheEvict(value = "products", key = "#productId")
    @Transactional
    public void deleteProduct(String productId) {
        productMapper.deleteById(productId);
    }
}

// Spring Cache configuration
@Configuration
@EnableCaching
public class CacheConfig {
    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory factory) {
        RedisCacheConfiguration config = RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(60))
            .serializeKeysWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new StringRedisSerializer()))
            .serializeValuesWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new GenericJackson2JsonRedisSerializer()));

        return RedisCacheManager.builder(factory)
            .cacheDefaults(config)
            .withCacheConfiguration("products",
                config.entryTtl(Duration.ofMinutes(30)))
            .build();
    }
}
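
To make the division of responsibility concrete without any framework, here is a minimal in-memory sketch. It assumes a hypothetical ThroughCache class whose loader and writer callbacks stand in for database access; none of these names come from Spring or from the code above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** A toy cache layer that owns all access to the backing store. */
final class ThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // reads one value from the backing store
    private final BiConsumer<K, V> writer; // writes one value to the backing store

    ThroughCache(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    /** Read-through: on a miss the cache itself invokes the loader and keeps the result. */
    V get(K key) {
        // computeIfAbsent stores nothing when the loader returns null
        return store.computeIfAbsent(key, loader);
    }

    /** Write-through: write the backing store synchronously, then the cache. */
    void put(K key, V value) {
        writer.accept(key, value);
        store.put(key, value);
    }
}
```

The caller never touches the database directly: it only calls get and put, which is the property that distinguishes this pattern from Cache-Aside.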

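Write-Behind (Asynchronous Write-Back)

Writes update the cache first and are flushed to the database asynchronously in batches. This suits write-heavy workloads, at the cost of possibly losing writes that have not yet been flushed if the process or the cache fails.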

sequenceDiagram
    participant App as Application
    participant Cache as Cache
    participant Queue as Async queue
    participant DB as Database

    App->>Cache: 1. Update cache
    Cache-->>App: OK (returns immediately)
    Cache->>Queue: 2. Enqueue change

    loop Periodic batch flush
        Queue->>DB: 3. Batch write to database
    end

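// Write-Behind implementation sketch
@Component
public class WriteBehindCache {
    private static final Logger log = LoggerFactory.getLogger(WriteBehindCache.class);

    private final RedisTemplate<String, Object> redis;
    // Assumed mapper type exposing batchUpsert(Collection<WriteTask>)
    private final DataMapper dataMapper;
    private final BlockingQueue<WriteTask> writeQueue = new LinkedBlockingQueue<>(10000);
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

    public WriteBehindCache(RedisTemplate<String, Object> redis, DataMapper dataMapper) {
        this.redis = redis;
        this.dataMapper = dataMapper;
    }

    @PostConstruct
    public void init() {
        // Flush in batches every 500 ms
        scheduler.scheduleWithFixedDelay(this::flushWriteQueue,
            500, 500, TimeUnit.MILLISECONDS);
    }

    public void put(String key, Object value) {
        // Enqueue first: offer() returns false when the queue is full, and a write
        // that cannot be queued would otherwise never reach the database
        if (!writeQueue.offer(new WriteTask(key, value, WriteType.UPDATE))) {
            throw new IllegalStateException("Write-behind queue is full, rejecting write for " + key);
        }
        redis.opsForValue().set(key, value);
    }

    private void flushWriteQueue() {
        List<WriteTask> batch = new ArrayList<>();
        writeQueue.drainTo(batch, 200); // at most 200 tasks per flush

        if (batch.isEmpty()) return;

        // Coalesce operations on the same key, keeping only the latest
        Map<String, WriteTask> merged = new LinkedHashMap<>();
        for (WriteTask task : batch) {
            merged.put(task.getKey(), task);
        }

        // Batch write to the database
        try {
            dataMapper.batchUpsert(merged.values());
        } catch (Exception e) {
            // Re-enqueued tasks go to the tail, so a retried value can overwrite a newer
            // one queued in the meantime; production code needs versioning or per-key ordering
            log.error("Batch write failed, re-enqueue", e);
            for (WriteTask task : merged.values()) {
                if (!writeQueue.offer(task)) {
                    log.error("Queue full, dropping write for key {}", task.getKey());
                }
            }
        }
    }
}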

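Cache Penetration

Problem

A query asks for data that does not exist: it is not in the cache and not in the database either. Every such request passes through the cache and hits the database.

graph LR
    Attacker[Attacker] -->|Flood of requests<br/>for nonexistent IDs| Cache[Cache<br/>all MISS]
    Cache -->|All requests| DB[Database<br/>nothing found]
    DB -->|Load spikes| Crash[Database crash]

Solution 1: Cache Null Values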

public Product getProduct(String productId) {
    String cacheKey = "product:" + productId;

    // Look up the cache, including the null marker.
    // Here redis is a RedisTemplate<String, Object>.
    Object cached = redis.opsForValue().get(cacheKey);
    if (cached != null) {
        if ("NULL".equals(cached)) {
            return null; // cached "not found" marker
        }
        return (Product) cached;
    }

    Product product = productMapper.selectById(productId);
    if (product != null) {
        redis.opsForValue().set(cacheKey, product, 3600, TimeUnit.SECONDS);
    } else {
        // Cache the null marker with a shorter TTL
        redis.opsForValue().set(cacheKey, "NULL", 300, TimeUnit.SECONDS);
    }
    return product;
}

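Solution 2: Bloom Filter

A Bloom filter can quickly tell that an element is definitely absent. A positive answer only means the element may exist, because false positives are possible.

graph LR
    Request[Request] --> BF{Bloom filter<br/>present?}
    BF -->|Definitely absent| Reject[Return null immediately]
    BF -->|Possibly present| Cache[Check cache]
    Cache -->|Hit| Return[Return data]
    Cache -->|Miss| DB[Query database]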

@Component
public class ProductCacheWithBloomFilter {
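    @Autowired
    private RedisTemplate<String, Product> redis;
    @Autowired
    private ProductMapper productMapper;

    private BloomFilter<String> bloomFilter; // Guava BloomFilter

    @PostConstruct
    public void initBloomFilter() {
        // Sized for 1 million entries with a 0.01% false-positive rate
        bloomFilter = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8),
            1_000_000, 0.0001);

        // Load the IDs of all existing products
        List<String> allProductIds = productMapper.selectAllIds();
        allProductIds.forEach(bloomFilter::put);
    }

    public Product getProduct(String productId) {
        // Fast rejection through the Bloom filter
        if (!bloomFilter.mightContain(productId)) {
            return null; // definitely absent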
        }

        String cacheKey = "product:" + productId;
        Product product = redis.opsForValue().get(cacheKey);
        if (product != null) {
            return product;
        }

        product = productMapper.selectById(productId);
        if (product != null) {
            redis.opsForValue().set(cacheKey, product, 3600, TimeUnit.SECONDS);
        }
        return product;
    }

    // Keep the Bloom filter in sync when a product is added
    public void addProduct(Product product) {
        productMapper.insert(product);
        bloomFilter.put(product.getId());
    }
}

Redis also offers a Bloom filter module (RedisBloom):

# Redis Bloom Filter commands
BF.ADD product_bloom product_001
BF.ADD product_bloom product_002
BF.EXISTS product_bloom product_001  # returns 1
BF.EXISTS product_bloom product_999  # returns 0 (definitely absent)
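
To show why a negative answer is definitive while a positive one is not, here is a minimal from-scratch Bloom filter. It is a teaching sketch under stated assumptions: the TinyBloomFilter class, the FNV-1a based double hashing, and the sizes are illustrative choices, not how Guava or RedisBloom are implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** A tiny Bloom filter: k hash positions per item in a fixed-size bit array. */
final class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    TinyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets every bit position derived from the item. */
    void put(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(item, i));
        }
    }

    /**
     * If any derived bit is clear, the item was never added (no false negatives).
     * If all are set, the item may have been added, or other items set those bits.
     */
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: position_i = h1 + i * h2 (mod size)
    private int index(String item, int i) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // FNV-1a style hash with a caller-supplied starting value
    private static int fnv1a(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }
}
```

Because bits are only ever set and never cleared, an item that was added always reports true, which is also why a plain Bloom filter cannot support deletion.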

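Cache Breakdown

Problem

A hot key expires right when traffic is high, and a large number of requests miss the cache and hit the database at the same moment.

graph LR
    subgraph "Cache breakdown"
        R1[Request 1] --> Cache[Cache<br/>hot key just expired]
        R2[Request 2] --> Cache
        R3[Request 3] --> Cache
        RN[Request N...] --> Cache
        Cache -->|All MISS| DB[Database<br/>burst of queries]
    end

Solution 1: Mutex Lock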

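// Compare-and-delete in Lua so that only the lock owner can release the lock
private static final RedisScript<Long> UNLOCK_SCRIPT = new DefaultRedisScript<>(
    "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
    Long.class);

// Assumes two templates: redis (RedisTemplate<String, Product>) for data and
// stringRedis (StringRedisTemplate) for lock keys
public Product getProductWithMutex(String productId) {
    String cacheKey = "product:" + productId;
    String lockKey = "lock:product:" + productId;
    String token = UUID.randomUUID().toString(); // identifies this lock holder

    // Bounded retries instead of unbounded recursion
    for (int attempt = 0; attempt < 20; attempt++) {
        Product product = redis.opsForValue().get(cacheKey);
        if (product != null) {
            return product;
        }

        // Try to acquire the distributed lock; setIfAbsent returns a nullable Boolean
        Boolean locked = stringRedis.opsForValue()
            .setIfAbsent(lockKey, token, 10, TimeUnit.SECONDS);

        if (Boolean.TRUE.equals(locked)) {
            try {
                // Double-check: another caller may have rebuilt the entry already
                product = redis.opsForValue().get(cacheKey);
                if (product != null) {
                    return product;
                }

                // Query the database
                product = productMapper.selectById(productId);
                if (product != null) {
                    redis.opsForValue().set(cacheKey, product, 3600, TimeUnit.SECONDS);
                }
                return product;
            } finally {
                // Release only if this caller still owns the lock
                stringRedis.execute(UNLOCK_SCRIPT, Collections.singletonList(lockKey), token);
            }
        }

        // Lock not acquired: wait briefly, then retry
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for cache rebuild", e);
        }
    }
    throw new IllegalStateException("Timed out waiting for cache rebuild: " + cacheKey);
}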
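
The distributed lock above coordinates rebuilds across instances. Within a single JVM, concurrent misses on the same key can also be collapsed before they reach Redis or the database. A minimal sketch of that idea (often called single-flight), assuming a hypothetical SingleFlight helper that is not part of the original code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Ensures that at most one loader call per key runs at a time in this JVM. */
final class SingleFlight<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(K key, Function<K, V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Follower: another thread is already loading this key, wait for its result
            return existing.join();
        }
        try {
            // Leader: perform the load and publish the result to any followers
            V value = loader.apply(key);
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);
            throw e;
        } finally {
            // Remove the entry so later calls trigger a fresh load
            inFlight.remove(key, mine);
        }
    }
}
```

A caller would wrap the database lookup, for example `singleFlight.load(productId, id -> productMapper.selectById(id))`, where `singleFlight` is a shared instance. This complements the distributed lock rather than replacing it, since it only deduplicates within one process.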

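Solution 2: Logical Expiration

The cache entry never expires in Redis; instead, a logical expiration time is stored inside the value. When a reader finds the entry logically expired, it triggers an asynchronous refresh and returns the old value in the meantime.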

@Data
public class CacheData<T> {
    private T data;
    private long expireAt; // logical expiration timestamp (epoch millis)

    public boolean isExpired() {
        return System.currentTimeMillis() > expireAt;
    }
}

// Assumes redis is a RedisTemplate<String, CacheData<Product>> and
// stringRedis is a StringRedisTemplate used for lock keys
public Product getProductWithLogicalExpire(String productId) {
    String cacheKey = "product:" + productId;

    CacheData<Product> cached = redis.opsForValue().get(cacheKey);
    if (cached == null) {
        // Nothing in the cache: the data does not exist or was never pre-warmed
        return null;
    }

    // Not expired: return directly
    if (!cached.isExpired()) {
        return cached.getData();
    }

    // Logically expired: try to refresh asynchronously
    String lockKey = "lock:product:" + productId;
    Boolean locked = stringRedis.opsForValue()
        .setIfAbsent(lockKey, "1", 10, TimeUnit.SECONDS);

    if (Boolean.TRUE.equals(locked)) {
        // Lock acquired: refresh the cache in the background
        CompletableFuture.runAsync(() -> {
            try {
                Product fresh = productMapper.selectById(productId);
                CacheData<Product> newCache = new CacheData<>();
                newCache.setData(fresh);
                newCache.setExpireAt(System.currentTimeMillis() + 3600_000);
                redis.opsForValue().set(cacheKey, newCache);
            } finally {
                // Production code should verify lock ownership before deleting
                stringRedis.delete(lockKey);
            }
        });
    }

    // Return the old data (possibly slightly stale, but never blocking)
    return cached.getData();
}

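Cache Avalanche

Problem

A large number of cache entries expire at the same time, or the cache service goes down, and all requests flood the database.

graph TB
    subgraph "Cache avalanche scenario"
        Normal[Normal<br/>cache absorbs 99% of requests]
        Avalanche[Avalanche<br/>cache fails on a large scale]
        Normal --> |Many keys expire together<br/>or Redis goes down| Avalanche
        Avalanche --> |100% of traffic| DB[Database crash]
    end

Solutions

graph TB
    subgraph Protections
        A[Randomized TTL<br/>avoid simultaneous expiry] --> Goal[Avalanche protection]
        B[Multi-level cache<br/>L1 local + L2 Redis] --> Goal
        C[Clustered deployment<br/>Redis Cluster/Sentinel] --> Goal
        D[Rate limiting and degradation<br/>protect the database] --> Goal
        E[Cache warm-up<br/>load at startup] --> Goal
    end

1. Randomized TTL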

// Add a random offset on top of the base TTL
public void cacheProduct(String key, Product product) {
    long baseTtl = 3600; // base TTL: 1 hour
    long randomOffset = ThreadLocalRandom.current().nextLong(0, 600); // random offset in [0, 600) seconds, up to 10 minutes
    redis.opsForValue().set(key, product, baseTtl + randomOffset, TimeUnit.SECONDS);
}
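
To see why the offset matters, the sketch below counts how many keys expire within the same second when they are all written at once, with and without jitter. The class name, the seed, and the key count are assumptions chosen purely for illustration.

```java
import java.util.Random;

/** Compares the worst-case number of simultaneous expirations with and without TTL jitter. */
final class TtlJitterDemo {

    /**
     * Simulates keyCount keys written at the same instant and returns the largest
     * number of keys that expire within the same second.
     * A jitterBound of 0 means a fixed TTL.
     */
    static int peakExpiriesPerSecond(int keyCount, int baseTtl, int jitterBound, long seed) {
        Random random = new Random(seed);
        int[] perSecond = new int[baseTtl + Math.max(jitterBound, 1)];
        int peak = 0;
        for (int i = 0; i < keyCount; i++) {
            int ttl = baseTtl + (jitterBound > 0 ? random.nextInt(jitterBound) : 0);
            perSecond[ttl]++;
            peak = Math.max(peak, perSecond[ttl]);
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("fixed TTL peak:    " + peakExpiriesPerSecond(10_000, 3600, 0, 42L));
        System.out.println("jittered TTL peak: " + peakExpiriesPerSecond(10_000, 3600, 600, 42L));
    }
}
```

With a fixed TTL every key lands in the same second, so the peak equals the key count. With a 600-second jitter window the expirations spread out, and the per-second peak drops to a small fraction of the total.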

2. Multi-Level Cache

@Component
public class MultiLevelCache {
    // L1: local cache (Caffeine)
    private final Cache<String, Object> localCache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build();

    // L2: distributed cache (Redis)
    @Autowired
    private RedisTemplate<String, Object> redis;

    public Object get(String key, Function<String, Object> loader) {
        // L1 lookup
        Object value = localCache.getIfPresent(key);
        if (value != null) {
            return value;
        }

        // L2 lookup
        value = redis.opsForValue().get(key);
        if (value != null) {
            localCache.put(key, value);
            return value;
        }

        // Database lookup
        value = loader.apply(key);
        if (value != null) {
            redis.opsForValue().set(key, value, randomTtl(), TimeUnit.SECONDS);
            localCache.put(key, value);
        }
        return value;
    }

    private long randomTtl() {
        return 3600 + ThreadLocalRandom.current().nextLong(600);
    }
}

graph LR
    Request[Request] --> L1[L1: local cache<br/>Caffeine<br/>5min TTL]
    L1 -->|MISS| L2[L2: distributed cache<br/>Redis<br/>1h TTL]
    L2 -->|MISS| DB[Database]
    DB -->|backfill| L2
    L2 -->|backfill| L1

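3. Cache Warm-Up

@Component
public class CacheWarmer implements ApplicationRunner {
    private static final Logger log = LoggerFactory.getLogger(CacheWarmer.class);

    @Autowired
    private RedisTemplate<String, Object> redis;
    @Autowired
    private ProductMapper productMapper;
    @Autowired
    private ConfigMapper configMapper; // assumed mapper type for the config table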

    @Override
    public void run(ApplicationArguments args) {
        log.info("Starting cache warm-up...");

        // Load hot products
        List<Product> hotProducts = productMapper.selectHotProducts(1000);
        for (Product product : hotProducts) {
            String key = "product:" + product.getId();
            long ttl = 3600 + ThreadLocalRandom.current().nextLong(600);
            redis.opsForValue().set(key, product, ttl, TimeUnit.SECONDS);
        }

        // Load configuration data
        List<Config> configs = configMapper.selectAll();
        for (Config config : configs) {
            redis.opsForValue().set("config:" + config.getKey(),
                config.getValue(), 86400, TimeUnit.SECONDS);
        }

        log.info("Cache warm-up completed. Products={}, Configs={}",
            hotProducts.size(), configs.size());
    }
}
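
4. Rate Limiting and Degradation

Rate limiting and degradation appear in the list of protections above but have no example of their own. A minimal sketch of the idea, assuming a hypothetical DbGuard class: a semaphore caps the number of concurrent database queries, and callers that cannot get a permit receive a fallback immediately instead of piling onto the database.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Caps concurrent database queries and degrades to a fallback when saturated. */
final class DbGuard {
    private final Semaphore permits;

    DbGuard(int maxConcurrentQueries) {
        this.permits = new Semaphore(maxConcurrentQueries);
    }

    /** Runs the query if a permit is free; otherwise returns the fallback at once. */
    <T> T queryOrFallback(Supplier<T> query, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            // Degrade: do not queue up behind an overloaded database
            return fallback.get();
        }
        try {
            return query.get();
        } finally {
            permits.release();
        }
    }
}
```

The fallback might be a default value, a stale copy from a local cache, or a "try again later" response, depending on what the business can tolerate.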

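Cache Consistency Strategies

Delayed Double Delete

Delete the cache, update the database, then delete the cache again after a short delay, so that a stale value written back by a concurrent reader in the meantime is removed.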

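public void updateProductWithDoubleDelete(Product product) {
    String cacheKey = "product:" + product.getId();

    // 1. Delete the cache entry
    redis.delete(cacheKey);

    // 2. Update the database
    productMapper.updateById(product);

    // 3. Delete the cache entry again after 500 ms (asynchronously), leaving time
    //    for replica sync and in-flight reads. delayedExecutor requires Java 9+
    //    and avoids parking a pool thread in Thread.sleep.
    CompletableFuture.runAsync(
        () -> redis.delete(cacheKey),
        CompletableFuture.delayedExecutor(500, TimeUnit.MILLISECONDS));
}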

Binlog-Based Cache Updates

graph LR
    App[Application] -->|update| DB[MySQL]
    DB -->|Binlog| Canal[Canal]
    Canal -->|parse changes| MQ[Message queue]
    MQ -->|consume| CacheUpdater[Cache update service]
    CacheUpdater -->|delete/update| Redis[Redis]

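// Canal listener for binlog changes
@Component
public class CanalEventHandler {

    // The listener annotation depends on the Canal client integration in use;
    // adjust it to match your library
    @CanalEventListener
    public void onEvent(CanalEntry.Entry entry) {
        if (entry.getEntryType() != CanalEntry.EntryType.ROWDATA) {
            return;
        }

        CanalEntry.RowChange rowChange;
        try {
            rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
        } catch (InvalidProtocolBufferException e) {
            log.error("Failed to parse binlog entry", e);
            return;
        }

        String tableName = entry.getHeader().getTableName();
        if (!"product".equals(tableName)) {
            return;
        }

        for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
            String productId = getColumnValue(rowData, "id");
            if (productId != null) {
                // Invalidate the corresponding cache entry
                redis.delete("product:" + productId);
                log.info("Cache invalidated for product: {}", productId);
            }
        }
    }

    // DELETE events carry only "before" columns; INSERT and UPDATE carry "after" columns
    private String getColumnValue(CanalEntry.RowData rowData, String name) {
        List<CanalEntry.Column> columns = rowData.getAfterColumnsCount() > 0
            ? rowData.getAfterColumnsList() : rowData.getBeforeColumnsList();
        for (CanalEntry.Column column : columns) {
            if (name.equals(column.getName())) {
                return column.getValue();
            }
        }
        return null;
    }
}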

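Choosing a Caching Strategy

Scenario	Recommended strategy	Reason
Read-heavy	Cache-Aside	Simple and reliable
Balanced reads and writes	Read/Write-Through	Concise code
Write-heavy	Write-Behind	Reduces database load
High consistency requirements (eventual)	Binlog + cache invalidation	Driven by database changes
Penetration protection	Bloom filter + null-value caching	Two layers of defense
Breakdown protection	Mutex lock / logical expiration	Choose per scenario
Avalanche protection	Randomized TTL + multi-level cache	Defense in depth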

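Summary

The right caching strategy depends on the read/write ratio, consistency requirements, and performance goals of the business scenario. Real projects usually combine several strategies: Cache-Aside as the base pattern, a Bloom filter against penetration, a mutex lock against breakdown, and randomized TTL plus multi-level caching against avalanche. Where consistency requirements are high, listening to the binlog through Canal and updating the cache asynchronously is the recommended approach. Most important of all is monitoring: track cache hit rate, penetration rate, and Redis memory usage so that alerts fire before a problem develops.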
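
As one way to obtain the hit rate and penetration rate mentioned above, the application can count outcomes on the read path. A minimal sketch, assuming a hypothetical CacheStats class; here "penetration rate" is taken to mean the share of lookups that missed the cache and also found nothing in the database, which is an assumption about the metric's definition.

```java
import java.util.concurrent.atomic.LongAdder;

/** Thread-safe counters for cache hits, misses, and misses that found no data. */
final class CacheStats {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();
    private final LongAdder emptyLoads = new LongAdder(); // misses where the database had nothing

    void recordHit() {
        hits.increment();
    }

    void recordMiss(boolean foundInDb) {
        misses.increment();
        if (!foundInDb) {
            emptyLoads.increment();
        }
    }

    /** Hits divided by all lookups; 0.0 when nothing has been recorded. */
    double hitRate() {
        long total = hits.sum() + misses.sum();
        return total == 0 ? 0.0 : (double) hits.sum() / total;
    }

    /** Lookups that reached the database and found nothing, divided by all lookups. */
    double penetrationRate() {
        long total = hits.sum() + misses.sum();
        return total == 0 ? 0.0 : (double) emptyLoads.sum() / total;
    }
}
```

In practice these values would be exported to whatever metrics system is already in place and alerted on when the hit rate drops or the penetration rate climbs.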

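Lessons from Production

During a Spring Festival sale that started at midnight, Redis entries expired en masse and the database was knocked over.

The cause: we had set a uniform TTL of 3600s on the cache entries for sale items, and all items were bulk-loaded into Redis at the same moment. Exactly one hour after the sale started at midnight, every entry expired together and 2,000 concurrent requests went straight through to MySQL. Database CPU shot to 100%, the order service began timing out heavily, and it took 6 minutes of rate limiting to recover.

Post-mortem: a textbook cache avalanche, and entirely preventable. The fix: TTL = 3600 + random(0, 600), a mutex lock to limit concurrent cache rebuilds (only one thread goes back to the source while the others wait or return the old value), and permanent pre-warming for the top 100 hot items. Through the three sales since then, peak DB QPS has stayed within 1.5 times the normal level.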

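Measured Results

Phase	Peak DB QPS	DB CPU	API P99
During the avalanche	18,000 (normally 800)	Sustained 100% for about 6 minutes	8,000ms+
First sale after the fix	1,200	Peak 35%	45ms
Third sale after the fix	980	Peak 28%	38ms

The cache hit rate rose from 92% before the avalanche to 99.3% after the fix.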

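My Take

I have seen many teams defend only against penetration, because Bloom filter write-ups are the most plentiful online and the implementation feels the most "technical". But what actually brings systems down is usually the avalanche. Bulk expiration of cache entries is far too easy to overlook, and any system that refreshes its cache in one pass from a scheduled job must add random jitter.

Avalanche protection costs almost nothing (one line of random code) and pays off enormously. Work with the best return on investment should come first, rather than jumping straight to something as complex as multi-level caching.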