AI · #embedding#vector-database#milvus

Vector Database Comparison: Milvus vs Pinecone vs Weaviate

2025.09.17 6 min 2.5k

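Introduction

Vector databases are a core component of AI application infrastructure, underpinning semantic search, recommendation systems, RAG, and similar workloads. With the surge of LLMs, the vector database market has grown quickly, and Milvus, Pinecone, and Weaviate have become three of the most closely watched options. This article compares them across ANN algorithm fundamentals, architecture, indexing strategy, performance benchmarks, and use cases to help you make a technology choice.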

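ANN Algorithm Basics

At the core of a vector database is Approximate Nearest Neighbor (ANN) search. Exact search must compare the query against every stored vector, so its cost grows as O(n) in the number of vectors (O(n·d) once the dimension d is counted), which is too slow for large-scale, real-time workloads. ANN algorithms give up a small amount of accuracy in exchange for order-of-magnitude speedups.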
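To make the trade-off concrete, here is a minimal sketch of the exact baseline and of recall@k, the usual way to quantify how much accuracy an ANN index gives up. The function names `exact_knn` and `recall_at_k` and the random toy data are assumptions for illustration, not part of any of the three databases.

```python
import math
import random

def exact_knn(query, vectors, k):
    """Brute-force k-NN: one distance computation per stored vector."""
    scored = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in scored[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
query = [rng.random() for _ in range(8)]

truth = exact_knn(query, vectors, k=10)
# Pretend an ANN index returned 9 of the true neighbours plus one miss.
approx = truth[:9] + [next(i for i in range(1000) if i not in truth)]
print(recall_at_k(approx, truth))  # 0.9
```

In practice the ground truth is computed once on a sample of queries, and index parameters are tuned until recall meets the target.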

graph TB
    A[ANN Algorithm Families] --> B[Graph-Based]
    A --> C[Quantization-Based]
    A --> D[Hash-Based]
    A --> E[Tree-Based]

    B --> B1[HNSW<br/>Hierarchical Navigable<br/>Small World]
    C --> C1[IVF<br/>Inverted File Index]
    C --> C2[PQ<br/>Product Quantization]
    C --> C3[ScaNN]
    D --> D1[LSH<br/>Locality Sensitive Hashing]
    E --> E1[Annoy<br/>Approximate Nearest<br/>Neighbors Oh Yeah]

    style B1 fill:#2ecc71,color:#fff
    style C1 fill:#3498db,color:#fff
    style C2 fill:#e74c3c,color:#fff

HNSW (Hierarchical Navigable Small World)

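HNSW is one of the most popular ANN algorithms today. It builds a multi-layer graph so that a search can start in the sparse upper layers and finish in the dense bottom layer: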

# HNSW key parameters
hnsw_params = {
    "M": 16,                 # Max connections per node (affects graph connectivity)
    "ef_construction": 200,  # Construction-time search width (affects build quality)
    "ef_search": 100,        # Query-time search width (affects recall vs speed)
}

# Higher M: better recall, more memory, slower build
# Higher ef_construction: better graph quality, slower build
# Higher ef_search: better recall, slower search

graph TB
    subgraph "HNSW Multi-Layer Structure"
        subgraph "Layer 2 (Sparse)"
            L2A((A)) --- L2B((B))
        end

        subgraph "Layer 1 (Medium)"
            L1A((A)) --- L1B((B))
            L1B --- L1C((C))
            L1A --- L1D((D))
        end

        subgraph "Layer 0 (Dense)"
            L0A((A)) --- L0B((B))
            L0B --- L0C((C))
            L0C --- L0D((D))
            L0D --- L0E((E))
            L0A --- L0D
            L0B --- L0E
            L0C --- L0F((F))
        end
    end

    L2A -.-> L1A
    L2B -.-> L1B
    L1A -.-> L0A
    L1B -.-> L0B
    L1C -.-> L0C
    L1D -.-> L0D
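The layered descent can be sketched in a few lines. This is a simplified illustration of the search side only (no graph construction), on a toy 1-D dataset; `search_layer`, `hnsw_search`, and the hand-built graphs are assumptions for this example rather than code from any library.

```python
import heapq
import math

def search_layer(query, entry, ef, graph, vectors):
    """Best-first search in one layer, keeping the ef closest nodes seen so far."""
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap (negated): worst kept node first
    while candidates:
        dist_c, node = heapq.heappop(candidates)
        if dist_c > -results[0][0]:
            break  # nothing left that can improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(query, vectors[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-neg, node) for neg, node in results)

def hnsw_search(query, layers, entry, ef_search, k, vectors):
    """layers is ordered from the sparse top layer down to the dense layer 0."""
    for graph in layers[:-1]:
        entry = search_layer(query, entry, 1, graph, vectors)[0][1]
    found = search_layer(query, entry, ef_search, layers[-1], vectors)
    return [node for _, node in found[:k]]

# Toy data: ten points on a line; layer 0 is a chain, layer 1 keeps three hubs.
vectors = {i: (float(i),) for i in range(10)}
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
layer1 = {0: [5], 5: [0, 9], 9: [5]}

print(hnsw_search((7.2,), [layer1, layer0], entry=0, ef_search=3, k=2, vectors=vectors))  # [7, 8]
```

The upper layer moves the entry point from node 0 to node 9 in two hops, so the dense layer only has to explore the neighbourhood of the answer. A larger `ef_search` widens that final exploration, which is where the recall-versus-speed trade-off comes from.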

IVF (Inverted File Index)

IVF partitions the vector space into Voronoi cells (clusters); at query time only the few cells nearest to the query are searched:

# IVF parameters
ivf_params = {
    "nlist": 1024,  # Number of clusters (partitions)
    "nprobe": 16,   # Number of clusters to search at query time
}

# nlist: more clusters = finer partitions, slower build
# nprobe: more probes = better recall, slower search
# Rule of thumb: start near nprobe = sqrt(nlist) (32 for nlist=1024), then tune;
# the 16 above trades some recall for speed
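A small pure-Python sketch shows how `nlist` and `nprobe` interact. It assumes the centroids are simply sampled from the data (a stand-in for the k-means training a real index performs), and `build_ivf` and `ivf_search` are illustrative names.

```python
import math
import random

def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    candidates.sort(key=lambda vid: (math.dist(query, vectors[vid]), vid))
    return candidates[:k]

rng = random.Random(42)
vectors = [(rng.random(), rng.random()) for _ in range(2000)]
centroids = rng.sample(vectors, 16)  # nlist = 16, sampled instead of trained
lists = build_ivf(vectors, centroids)

query = (0.5, 0.5)
fast = ivf_search(query, vectors, centroids, lists, nprobe=4, k=5)   # scans a quarter of the lists
full = ivf_search(query, vectors, centroids, lists, nprobe=16, k=5)  # scans everything: exact result
```

With `nprobe` equal to `nlist` the search degenerates into a full scan and returns the exact answer; lowering `nprobe` skips cells and risks missing neighbours that sit just across a cell boundary.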

PQ (Product Quantization)

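PQ splits each high-dimensional vector into sub-vectors and encodes each one as the ID of its nearest cluster centroid, which cuts memory usage substantially: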

# PQ compresses 768-dim float32 vector:
# Original: 768 * 4 bytes = 3072 bytes
# With PQ (m=96, nbits=8): 96 * 1 byte = 96 bytes
# Compression ratio: 32x

pq_params = {
    "m": 96,     # Number of sub-quantizers (must divide dimension)
    "nbits": 8,  # Bits per sub-quantizer (256 centroids)
}
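The arithmetic in the comments above generalizes to any dimension and PQ setting. The helper below is a sketch (`pq_footprint` is a made-up name) that also accounts for the shared codebooks, a fixed cost that is easy to overlook.

```python
def pq_footprint(dim, m, nbits, bytes_per_float=4):
    """Per-vector and codebook sizes for product quantization."""
    if dim % m != 0:
        raise ValueError("m must divide the vector dimension")
    raw_bytes = dim * bytes_per_float
    code_bytes = (m * nbits + 7) // 8  # one nbits-wide centroid ID per sub-vector
    # Codebooks are shared by all vectors:
    # m sub-quantizers x 2**nbits centroids x (dim / m) floats each.
    codebook_bytes = m * (2 ** nbits) * (dim // m) * bytes_per_float
    return {
        "raw_bytes": raw_bytes,
        "code_bytes": code_bytes,
        "ratio": raw_bytes / code_bytes,
        "codebook_bytes": codebook_bytes,
    }

print(pq_footprint(768, 96, 8))
# {'raw_bytes': 3072, 'code_bytes': 96, 'ratio': 32.0, 'codebook_bytes': 786432}
```

For 10 million such vectors, the codes take roughly 0.96 GB against 30.72 GB raw, while the codebooks add under 1 MB in total.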

Architecture Comparison

Milvus Architecture

graph TB
    subgraph "Milvus Distributed Architecture"
        A[SDK Client] --> B[Proxy / Load Balancer]
        B --> C[Coord Services]
        C --> C1[Root Coord]
        C --> C2[Query Coord]
        C --> C3[Data Coord]
        C --> C4[Index Coord]

        D[Worker Nodes]
        D --> D1[Query Node<br/>Executes searches]
        D --> D2[Data Node<br/>Handles writes]
        D --> D3[Index Node<br/>Builds indexes]

        E[Storage Layer]
        E --> E1[etcd<br/>Metadata]
        E --> E2[MinIO/S3<br/>Data storage]
        E --> E3[Pulsar/Kafka<br/>Log stream]
    end

    C2 --> D1
    C3 --> D2
    C4 --> D3
    D1 --> E2
    D2 --> E2
    D2 --> E3

    style B fill:#00a1ea,color:#fff
    style D1 fill:#2ecc71,color:#fff
    style E2 fill:#e74c3c,color:#fff

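Milvus highlights:

  • Separates storage from compute, so each can scale independently
  • Supports multiple index types (HNSW, IVF_FLAT, IVF_PQ, DiskANN)
  • Native scalar filtering combined with vector search
  • Open source, can be self-hosted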

Pinecone Architecture

graph TB
    subgraph "Pinecone Fully Managed Architecture"
        A[API Client] --> B[API Gateway]
        B --> C[Index Service]
        C --> D1[Pod 1<br/>Shard]
        C --> D2[Pod 2<br/>Shard]
        C --> D3[Pod 3<br/>Shard]

        D1 --> E[Replicas]
        D2 --> E
        D3 --> E
    end

    style B fill:#000,color:#fff
    style C fill:#6366f1,color:#fff

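Pinecone highlights:

  • Fully managed SaaS, no operations work
  • Serverless mode with usage-based pricing
  • Proprietary indexing algorithm (reportedly based on PQ + graph)
  • Cloud-only, not open source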
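A sharded index like the one in the diagram typically answers a query by scatter-gather: every shard computes its own top-k and a coordinator merges the partial lists. The following is a generic sketch of that pattern under assumed names (`shard_top_k`, `scatter_gather`); Pinecone's actual implementation is not public and may differ.

```python
import heapq

def shard_top_k(shard, score, k):
    """Each shard returns its k best (score, id) pairs, highest score first."""
    return heapq.nlargest(k, ((score(vec), vid) for vid, vec in shard.items()))

def scatter_gather(shards, score, k):
    """Query every shard, then merge the partial results into a global top-k."""
    partials = [shard_top_k(shard, score, k) for shard in shards]
    return heapq.nlargest(k, (hit for part in partials for hit in part))

shards = [
    {"a": (0.9, 0.1), "b": (0.2, 0.8)},
    {"c": (0.7, 0.3), "d": (0.95, 0.05)},
    {"e": (0.1, 0.9)},
]
query = (1.0, 0.0)

def dot(vec):
    """Dot-product similarity against the fixed query."""
    return sum(q * x for q, x in zip(query, vec))

print([vid for _, vid in scatter_gather(shards, dot, k=3)])  # ['d', 'a', 'c']
```

Each shard must return a full k results, not k divided by the shard count, because all of the global winners may live on the same shard.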

Weaviate Architecture

graph TB
    subgraph "Weaviate Architecture"
        A[REST/GraphQL API] --> B[Weaviate Core]
        B --> C[Schema Manager]
        B --> D[Vector Index<br/>HNSW]
        B --> E[Object Store<br/>LSM Tree]
        B --> F[Module System]

        F --> F1[text2vec-openai]
        F --> F2[text2vec-transformers]
        F --> F3[generative-openai]
        F --> F4[reranker-cohere]
    end

    style B fill:#3cbe8e,color:#fff
    style F fill:#f39c12,color:#000

Weaviate highlights:

  • Native multimodal support (text, images, audio)
  • Modular vectorizers (built-in embedding generation)
  • GraphQL API
  • Open source, with both self-hosted and cloud options

Usage Examples

Milvus

import random

from pymilvus import MilvusClient

# Placeholder 768-dim vectors; in practice these come from your embedding model
embedding1, embedding2, query_embedding = (
    [random.random() for _ in range(768)] for _ in range(3)
)

# Connect
client = MilvusClient(uri="http://localhost:19530")

# Create collection
client.create_collection(
    collection_name="articles",
    dimension=768,
    metric_type="COSINE",
    auto_id=True,
)

# Insert vectors
data = [
    {"vector": embedding1, "title": "RAG Architecture Design", "category": "AI"},
    {"vector": embedding2, "title": "Microservices Best Practices", "category": "Architecture"},
]
client.insert(collection_name="articles", data=data)

# Search with metadata filter
results = client.search(
    collection_name="articles",
    data=[query_embedding],
    limit=5,
    output_fields=["title", "category"],
    filter='category == "AI"',
)

# One result list per query vector, each holding the matching hits
for hits in results:
    for hit in hits:
        print(f"Score: {hit['distance']:.4f}, Title: {hit['entity']['title']}")

Pinecone

import random

from pinecone import Pinecone, ServerlessSpec

# Placeholder 768-dim vectors; in practice these come from your embedding model
embedding1, embedding2, query_embedding = (
    [random.random() for _ in range(768)] for _ in range(3)
)

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="articles",
    dimension=768,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("articles")

# Upsert vectors
index.upsert(vectors=[
    {
        "id": "article-1",
        "values": embedding1,
        "metadata": {"title": "RAG Architecture Design", "category": "AI"},
    },
    {
        "id": "article-2",
        "values": embedding2,
        "metadata": {"title": "Microservices Best Practices", "category": "Architecture"},
    },
])

# Query with metadata filter
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"category": {"$eq": "AI"}},
)

for match in results["matches"]:
    print(f"Score: {match['score']:.4f}, Title: {match['metadata']['title']}")

Weaviate

import os

import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import Filter

# Connect; the text2vec-openai module needs an OpenAI key passed as a header
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)

# Create collection with built-in vectorizer
collection = client.collections.create(
    name="Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ],
)

# Insert (auto-vectorization!)
articles = client.collections.get("Article")
articles.data.insert({
    "title": "RAG Architecture Design",
    "content": "RAG stands for retrieval-augmented generation...",
    "category": "AI",
})

# Hybrid search (vector + keyword)
results = articles.query.hybrid(
    query="choosing a vector database",
    limit=5,
    alpha=0.7,  # 0=pure keyword, 1=pure vector
    filters=Filter.by_property("category").equal("AI"),
)

for obj in results.objects:
    print(f"Title: {obj.properties['title']}")

# Release the connection when done
client.close()
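The `alpha` parameter is easier to reason about with a toy fusion function. The sketch below min-max normalizes each score set and blends them linearly. It is a simplified illustration of score fusion in general, with made-up names and scores, and Weaviate's own fusion algorithms differ in detail.

```python
def normalize(scores):
    """Min-max scale raw scores into [0, 1] so the two rankings are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(vector_scores, keyword_scores, alpha):
    """alpha=1 ranks by vector score only, alpha=0 by keyword score only."""
    v, kw = normalize(vector_scores), normalize(keyword_scores)
    fused = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(v) | set(kw)
    }
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}     # e.g. cosine similarity
keyword_scores = {"b": 12.0, "c": 3.0, "d": 7.5}   # e.g. BM25

print(hybrid_rank(vector_scores, keyword_scores, alpha=0.7))  # ['a', 'b', 'd', 'c']
print(hybrid_rank(vector_scores, keyword_scores, alpha=0.0))  # ['b', 'd', 'a', 'c']
```

Document "d" has no vector score at all, yet it still surfaces at `alpha=0.7` because of its keyword match. This is the main reason to prefer hybrid search when queries contain rare terms or exact identifiers.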

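Performance Comparison

| Metric | Milvus | Pinecone | Weaviate |
|---|---|---|---|
| Search latency at 1M vectors (p99) | ~5ms | ~10ms | ~8ms |
| Search latency at 10M vectors (p99) | ~15ms | ~20ms | ~20ms |
| 100M-vector support | Distributed scaling | Serverless auto-scaling | Requires sharding |
| Write throughput | | | |
| Memory efficiency | High (supports DiskANN) | | |
| Index build speed | Fast (GPU acceleration) | N/A (automatic) | |

Note: the figures above are reference values for typical scenarios; actual performance depends on hardware, data distribution, and parameter settings.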
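Because such figures vary so much with the setup, it is worth measuring p99 on your own data. The sketch below times an arbitrary search callable and computes a nearest-rank percentile. Here `search_fn` stands for whatever client call you are benchmarking, and both helper names are assumptions.

```python
import math
import time

def measure_latencies(search_fn, queries):
    """Wall-clock latency in milliseconds for each query."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic example: most queries are fast, a few are slow, one is an outlier.
latencies_ms = [4.0] * 95 + [9.0] * 4 + [40.0]
print(percentile(latencies_ms, 50))   # 4.0
print(percentile(latencies_ms, 99))   # 9.0
print(percentile(latencies_ms, 100))  # 40.0
```

The median hides the slow tail entirely, which is why latency comparisons are usually quoted at p99.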

Selection Decision Matrix

flowchart TD
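    A[Choosing a vector database] --> B{Deployment model?}
    B -->|Fully managed SaaS| C{Budget?}
    C -->|Ample| D[Pinecone<br/>Least effort]
    C -->|Limited| E[Weaviate Cloud<br/>Free tier]

    B -->|Self-hosted| F{Data scale?}
    F -->|< 10M vectors| G[Weaviate<br/>Single node]
    F -->|> 10M vectors| H{Tech stack?}
    H -->|Needs high performance + GPU| I[Milvus<br/>Distributed]
    H -->|Needs built-in vectorization| J[Weaviate<br/>Modular]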

    style D fill:#6366f1,color:#fff
    style I fill:#00a1ea,color:#fff
    style J fill:#3cbe8e,color:#fff
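The same decision flow can be written as a small function, which is handy for recording the choice next to the inputs that led to it. The argument names and string values below are assumptions chosen for this sketch.

```python
def recommend(deployment, budget=None, n_vectors=0, need=None):
    """Walk the decision flow; argument names and values are illustrative."""
    if deployment == "managed":
        if budget == "ample":
            return "Pinecone"
        if budget == "limited":
            return "Weaviate Cloud"
        raise ValueError("budget must be 'ample' or 'limited'")
    if deployment == "self-hosted":
        # The flowchart leaves exactly 10M undefined; this sketch treats it as large.
        if n_vectors < 10_000_000:
            return "Weaviate (single node)"
        if need == "gpu-performance":
            return "Milvus (distributed)"
        if need == "built-in-vectorization":
            return "Weaviate (modular)"
        raise ValueError("need must be 'gpu-performance' or 'built-in-vectorization'")
    raise ValueError("deployment must be 'managed' or 'self-hosted'")

print(recommend("managed", budget="limited"))  # Weaviate Cloud
print(recommend("self-hosted", n_vectors=50_000_000, need="gpu-performance"))  # Milvus (distributed)
```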

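Index Strategy Recommendations

| Scenario | Recommended index | Rationale |
|---|---|---|
| Fewer than 100K vectors | FLAT / brute force | Exact search is fast enough at small scale |
| 100K-10M, high recall required | HNSW | Best balance of query speed and recall |
| 10M-100M, memory constrained | IVF_PQ | Compressed storage, moderate recall |
| More than 100M, SSD storage | DiskANN (Milvus) | Disk-based index with a small memory footprint |
| Frequent real-time updates | HNSW | Supports incremental inserts without rebuilding the index |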
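As a rough aid, the table can be encoded as a function alongside a raw-memory estimate, since "memory constrained" is usually judged against the uncompressed vector size. The precedence between rows (for example, frequent updates winning over scale) is an assumption of this sketch and is not stated in the table.

```python
def raw_vector_gib(n_vectors, dim, bytes_per_float=4):
    """Memory needed to hold uncompressed float32 vectors, in GiB."""
    return n_vectors * dim * bytes_per_float / 2**30

def suggest_index(n_vectors, memory_constrained=False, frequent_updates=False):
    """Map a workload onto the table's rows; the row precedence is assumed."""
    if n_vectors < 100_000:
        return "FLAT"
    if frequent_updates:
        return "HNSW"
    if n_vectors > 100_000_000:
        return "DISKANN"
    if n_vectors > 10_000_000 and memory_constrained:
        return "IVF_PQ"
    return "HNSW"

print(round(raw_vector_gib(10_000_000, 768), 2))             # 28.61
print(suggest_index(50_000, memory_constrained=True))        # FLAT
print(suggest_index(50_000_000, memory_constrained=True))    # IVF_PQ
```

At 10 million 768-dimensional vectors the raw data alone approaches 29 GiB before any index overhead, which is the point where compression or disk-based indexes start to matter.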

Production Deployment Recommendations

Milvus Deployment

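# docker-compose.yml for Milvus standalone
version: '3.5'
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
    volumes:
      - etcd_data:/etcd
    # Listen on all interfaces so the milvus container can reach etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    image: minio/minio:latest
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - minio_data:/minio_data
    command: minio server /minio_data

  milvus:
    image: milvusdb/milvus:v2.4-latest
    # Start in standalone mode and point Milvus at the two dependencies
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    depends_on:
      - etcd
      - minio
    ports:
      - "19530:19530"
      - "9091:9091"
    volumes:
      - milvus_data:/var/lib/milvus

volumes:
  etcd_data:
  minio_data:
  milvus_data: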

Monitoring Metrics

# Key metrics to monitor
monitoring_metrics = {
    "search_latency_p99": "P99 search latency, threshold < 50ms",
    "insert_rate": "Insert rate (vectors/sec)",
    "memory_usage": "Memory utilization, alert threshold 80%",
    "index_build_time": "Index build duration",
    "recall_rate": "Recall, target > 95%",
    "query_qps": "Query QPS",
    "collection_row_count": "Collection row-count growth trend",
}
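The three metrics that carry explicit thresholds can be turned into a simple alert check. This is a sketch with assumed metric keys and units; in a real deployment the values would come from whatever monitoring stack you already run.

```python
# Thresholds taken from the list above; key names and units are assumptions.
THRESHOLDS = {
    "search_latency_p99_ms": ("max", 50.0),  # should stay below 50 ms
    "memory_usage_pct": ("max", 80.0),       # warn at 80%
    "recall_rate_pct": ("min", 95.0),        # should stay above 95%
}

def check_metrics(sample):
    """Return one alert string per metric that breaches its threshold."""
    alerts = []
    for name, (kind, limit) in THRESHOLDS.items():
        if name not in sample:
            continue
        value = sample[name]
        if kind == "max" and value >= limit:
            alerts.append(f"{name}={value} is at or above {limit}")
        elif kind == "min" and value <= limit:
            alerts.append(f"{name}={value} is at or below {limit}")
    return alerts

sample = {"search_latency_p99_ms": 62.0, "memory_usage_pct": 71.5, "recall_rate_pct": 93.0}
for alert in check_metrics(sample):
    print(alert)
# search_latency_p99_ms=62.0 is at or above 50.0
# recall_rate_pct=93.0 is at or below 95.0
```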

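Summary

Each of the three vector databases has its own niche:

  • Milvus: best suited to large-scale, high-performance self-hosted deployments; rich choice of index types and GPU acceleration
  • Pinecone: best suited to rapid prototyping and small-to-medium production applications; fully managed with zero operations work
  • Weaviate: best suited to scenarios that need built-in vectorization and multimodal support; strong developer experience

When choosing, weigh these first: data scale, deployment requirements (cloud vs self-hosted), whether hybrid search is needed, and the team's operational capacity. For most early-stage projects, Pinecone's serverless mode or Weaviate Cloud is the fastest way to get started; once data grows beyond tens of millions of vectors, Milvus's distributed architecture shows a greater advantage.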

Author · zt
Published · 2025-09-17
Length · 2.5k characters · 6 min
License · CC BY-SA 4.0