AI · #ai-agent#llm#planning

AI Agent Architecture Design and Implementation Patterns

2025.10.15 8 min 3.2k

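Introduction

AI agents are one of the most exciting directions in LLM applications: they move the model from passively answering questions to actively planning, using tools, and learning continuously as an autonomous actor. From ChatGPT Plugins to AutoGPT, and from LangGraph to CrewAI, the agent ecosystem is evolving quickly. This article walks through the core of agent architecture: the ReAct pattern, planning strategies, tool use, memory management, and the design and implementation of multi-agent systems.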

Core Agent Architecture

graph TB
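    A[AI Agent] --> B[Perception]
    A --> C[Reasoning]
    A --> D[Action]
    A --> E[Memory]

    B --> B1[User input]
    B --> B2[Environment observations]
    B --> B3[Tool return values]

    C --> C1[Task decomposition]
    C --> C2[Plan generation]
    C --> C3[Decision making]

    D --> D1[Call tools]
    D --> D2[Generate replies]
    D --> D3[Modify the environment]

    E --> E1[Short-term memory<br/>Conversation context]
    E --> E2[Working memory<br/>Current task state]
    E --> E3[Long-term memory<br/>Knowledge base / experience]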

    style A fill:#2c3e50,color:#fff
    style C fill:#e74c3c,color:#fff
    style D fill:#3498db,color:#fff
    style E fill:#2ecc71,color:#fff

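The ReAct Pattern

ReAct (Reasoning + Acting) is the most basic and most practical agent pattern: the model alternates between reasoning (Thought) and acting (Action), and each tool result is fed back to it as an Observation: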

sequenceDiagram
    participant U as User
    participant A as Agent (LLM)
    participant T as Tools

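    U->>A: "Which city is hotter today, Beijing or Shanghai?"

    Note over A: Thought: I need the temperature of both cities
    A->>T: Action: get_weather("Beijing")
    T-->>A: Observation: Beijing 32°C

    Note over A: Thought: I have Beijing's temperature, still need Shanghai's
    A->>T: Action: get_weather("Shanghai")
    T-->>A: Observation: Shanghai 35°C

    Note over A: Thought: Shanghai 35°C > Beijing 32°C
    A->>U: "Shanghai is hotter today (35°C > 32°C)"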
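
The exchange in the diagram can be reproduced without any framework. The following is a minimal sketch, not a production loop: `scripted_llm` is a hypothetical stand-in for a real model call, `get_weather` reads from a hard-coded table instead of a weather API, and the `Action:` / `Action Input:` text format is an assumption chosen to match the diagram.

```python
import re
from typing import Callable

# Hypothetical tool: a lookup table standing in for a real weather API.
def get_weather(city: str) -> str:
    temperatures = {"Beijing": 32, "Shanghai": 35}
    return f"{city} {temperatures[city]}°C"

TOOLS: dict[str, Callable[[str], str]] = {"get_weather": get_weather}

def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: picks the next step from how many observations exist."""
    observations = transcript.count("Observation:")
    if observations == 0:
        return "Thought: I need both temperatures.\nAction: get_weather\nAction Input: Beijing"
    if observations == 1:
        return "Thought: I still need Shanghai.\nAction: get_weather\nAction Input: Shanghai"
    return "Thought: I now know the final answer\nFinal Answer: Shanghai is hotter today (35°C > 32°C)"

ACTION_RE = re.compile(r"Action: (\w+)\nAction Input: (.+)")

def react_loop(question: str, llm: Callable[[str], str], max_iterations: int = 5) -> str:
    """Alternate model output and tool observations until a final answer or the cap."""
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        output = llm(transcript)
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            # Feed the parse failure back so the model can correct its format.
            transcript += "Observation: could not parse an action\n"
            continue
        name, arg = match.group(1), match.group(2).strip()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: iteration limit reached"

answer = react_loop("Which city is hotter today, Beijing or Shanghai?", scripted_llm)
```

The transcript grows by one Thought/Action/Observation triple per iteration, and `max_iterations` bounds the loop even when the model never emits a final answer.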

ReAct Implementation

from langchain.agents import create_react_agent, AgentExecutor
from langchain_core.prompts import PromptTemplate

REACT_PROMPT = PromptTemplate.from_template("""Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}""")

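# `llm` (a chat model) and `tools` (a list of tools) are assumed to be defined elsewhere.
agent = create_react_agent(llm, tools, REACT_PROMPT)
# max_iterations caps the Thought/Action loop so the agent cannot run forever.
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=10)

result = executor.invoke({"input": "Which city is hotter today, Beijing or Shanghai?"})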

Planning Strategies

Plan-and-Execute

Draft a complete plan first, then execute it step by step. This suits complex multi-step tasks:

# Requires the langchain-experimental package.
from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner

# `llm` and `tools` are assumed to be defined elsewhere.
# Planner: generates a step-by-step plan
planner = load_chat_planner(llm)

# Executor: executes each step
executor_agent = load_agent_executor(llm, tools, verbose=True)

# Combined
agent = PlanAndExecute(planner=planner, executor=executor_agent, verbose=True)

result = agent.invoke({
    "input": "Research the mainstream vector databases, compare their performance, then write a recommendation report"
})
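
graph TD
    A[Complex task] --> B[Planner<br/>LLM generates the plan]
    B --> C["Step 1: List the mainstream vector databases"]
    C --> D["Step 2: Search for each database's performance data"]
    D --> E["Step 3: Build a comparison table"]
    E --> F["Step 4: Write the recommendation report"]

    C --> G[Executor<br/>Execute and observe]
    D --> G
    E --> G
    F --> G

    G --> H{Replan needed?}
    H -->|Yes| B
    H -->|No| I[Final result]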

    style B fill:#e74c3c,color:#fff
    style G fill:#3498db,color:#fff
    style I fill:#2ecc71,color:#fff
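
The replanning branch in the diagram can be sketched in plain Python. This is an illustration under stated assumptions rather than how `PlanAndExecute` works internally: `plan_fn`, `execute_fn`, and `needs_replan` are hypothetical callables standing in for the planner LLM, the executor agent, and the "replan needed?" check, and the demo planner and executor are hard-coded stubs.

```python
from typing import Callable

def plan_and_execute(
    task: str,
    plan_fn: Callable[[str, dict[str, str]], list[str]],
    execute_fn: Callable[[str], str],
    needs_replan: Callable[[str, str], bool],
    max_replans: int = 2,
) -> dict[str, str]:
    """Run a plan step by step, asking the planner for a new plan when a step fails."""
    results: dict[str, str] = {}
    replans = 0
    plan = plan_fn(task, results)
    index = 0
    while index < len(plan):
        step = plan[index]
        outcome = execute_fn(step)
        if needs_replan(step, outcome) and replans < max_replans:
            replans += 1
            plan = plan_fn(task, results)  # the planner sees what is already done
            index = 0
            continue
        # Once the replan budget is spent, a failed outcome is recorded and the run moves on.
        results[step] = outcome
        index += 1
    return results

def demo_planner(task: str, done: dict[str, str]) -> list[str]:
    if not done:
        return ["list databases", "search benchmarks", "write report"]
    # Revised plan: skip finished steps and swap the failing step for an alternative.
    return ["search vendor docs", "write report"]

def demo_executor(step: str) -> str:
    return "FAILED" if step == "search benchmarks" else f"done: {step}"

results = plan_and_execute(
    "compare vector databases",
    demo_planner,
    demo_executor,
    needs_replan=lambda step, outcome: outcome == "FAILED",
)
```

Capping replans with `max_replans` keeps a persistently failing step from sending the loop back to the planner forever.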

Adaptive Planning (LangGraph Implementation)

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Sequence
import operator

# `llm` and `agent_executor` (for example, the ReAct AgentExecutor) are assumed to be defined elsewhere.

class AgentState(TypedDict):
    messages: Annotated[Sequence[dict], operator.add]
    plan: list[str]
    current_step: int
    results: dict
    should_replan: bool

def planner_node(state: AgentState) -> dict:
    """Generate or revise the plan."""
    messages = state["messages"]
    plan_prompt = f"""Based on the task and current progress, create a plan.

Task: {messages[0]['content']}
Previous results: {state.get('results', {})}

Output a numbered list of steps."""

    plan = llm.invoke(plan_prompt).content
    steps = [s.strip() for s in plan.split("\n") if s.strip()]
    # Reset the cursor and clear the replan flag so a revised plan starts fresh.
    return {"plan": steps, "current_step": 0, "should_replan": False}

def executor_node(state: AgentState) -> dict:
    """Execute the current step."""
    step = state["plan"][state["current_step"]]
    result = agent_executor.invoke({"input": step})
    return {
        "results": {**state.get("results", {}), step: result["output"]},
        "current_step": state["current_step"] + 1,
    }

def evaluator_node(state: AgentState) -> dict:
    """Placeholder evaluator (missing from the original listing): never requests a replan.

    A real evaluator would inspect state["results"] and set should_replan accordingly.
    """
    return {"should_replan": False}

def should_continue(state: AgentState) -> str:
    """Decide whether to continue, replan, or finish."""
    # Check the replan flag first so a failed final step is not reported as finished.
    if state.get("should_replan"):
        return "replan"
    if state["current_step"] >= len(state["plan"]):
        return "finish"
    return "execute"

# Build graph
workflow = StateGraph(AgentState)
workflow.add_node("planner", planner_node)
workflow.add_node("executor", executor_node)
workflow.add_node("evaluator", evaluator_node)

workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_edge("executor", "evaluator")
workflow.add_conditional_edges("evaluator", should_continue, {
    "execute": "executor",
    "replan": "planner",
    "finish": END,
})

app = workflow.compile()
result = app.invoke({"messages": [{"role": "user", "content": "Research the latest advances in RAG"}]})
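
Splitting the planner output on newlines keeps the list markers ("1.", "2)") inside each step and also treats any preamble sentence as a step. A stricter parser could look like the sketch below; `parse_plan` and its regular expression are hypothetical helpers, not part of LangGraph, and they assume the model writes one step per line.

```python
import re

# Matches "1. step", "2) step", "- step" or "* step" and captures the step text.
STEP_RE = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$")

def parse_plan(text: str) -> list[str]:
    """Keep only lines that look like list items and strip their markers."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(1))
    return steps

raw = """Here is the plan:
1. List the mainstream vector databases
2) Search for benchmark data

- Write the report
"""
steps = parse_plan(raw)
```

Here `steps` holds the three step texts without their markers, and the "Here is the plan:" line is dropped.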

Tool Use

Tool Design Principles

import json

from langchain_core.tools import tool
from pydantic import BaseModel, Field

# `db`, `llm`, and the fetch_* helpers below are assumed to be defined elsewhere.

# 1. Clear schema with descriptions
class SearchInput(BaseModel):
    query: str = Field(description="Search keywords")
    max_results: int = Field(default=5, description="Maximum number of results to return")
    date_range: str = Field(default="all", description="Time range: today/week/month/year/all")

@tool(args_schema=SearchInput)
def web_search(query: str, max_results: int = 5, date_range: str = "all") -> str:
    """Search the web for up-to-date information. Use when real-time data or recent news is needed."""
    # Implementation omitted in the original
    raise NotImplementedError("search backend not shown")

# 2. Error handling in tools
@tool
def execute_sql(query: str) -> str:
    """Run a SQL query against a read-only database. Only SELECT statements are supported."""
    # A prefix check is only a first line of defense; the connection itself should also be read-only.
    if not query.strip().upper().startswith("SELECT"):
        return "Error: Only SELECT statements are allowed for security."

    try:
        results = db.execute(query)
        return json.dumps(results, ensure_ascii=False, indent=2)
    except Exception as e:
        # Return the error as text so the agent can read it and retry.
        return f"SQL Error: {str(e)}. Please check your query syntax."

# 3. Compound tools (tool that uses other tools)
@tool
def analyze_repository(repo_url: str) -> str:
    """Analyze a GitHub repository's code quality, tech stack, and activity."""
    # Uses multiple sub-tools internally
    readme = fetch_github_file(repo_url, "README.md")
    languages = fetch_github_languages(repo_url)
    commits = fetch_recent_commits(repo_url, days=30)

    analysis = llm.invoke(f"""Analyze the following repository information:
README: {readme}
Languages: {languages}
Commits in the last 30 days: {len(commits)}
""")

    return analysis.content
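
A `startswith("SELECT")` check alone still lets stacked statements such as `SELECT 1; DROP TABLE ...` through to the driver. The sketch below shows one way to tighten a read-only SQL tool, assuming SQLite as the backing database (the original does not say which database `db` is). `make_readonly_sql_tool` is a hypothetical helper; `PRAGMA query_only` is a real SQLite setting that makes the connection reject writes.

```python
import json
import sqlite3

def make_readonly_sql_tool(conn: sqlite3.Connection):
    """Return a query function that only runs single SELECT statements."""
    conn.execute("PRAGMA query_only = ON")  # SQLite now rejects writes on this connection

    def run_select(query: str) -> str:
        statement = query.strip().rstrip(";").strip()
        # Crude guard: also rejects a ';' inside a string literal, which is acceptable for a sketch.
        if ";" in statement:
            return "Error: only one statement per call is allowed."
        if not statement.upper().startswith("SELECT"):
            return "Error: Only SELECT statements are allowed for security."
        try:
            cursor = conn.execute(statement)
            columns = [c[0] for c in cursor.description]
            rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
            return json.dumps(rows, ensure_ascii=False)
        except sqlite3.Error as exc:
            # Errors come back as text so the agent can read them and retry.
            return f"SQL Error: {exc}. Please check your query syntax."

    return run_select

# Demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, temp_c INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)", [("Beijing", 32), ("Shanghai", 35)])
conn.commit()
run_select = make_readonly_sql_tool(conn)

ok = run_select("SELECT name FROM cities WHERE temp_c > 33")
blocked = run_select("DELETE FROM cities")
stacked = run_select("SELECT 1; DROP TABLE cities")
```

The string checks give the agent a readable error message, while the read-only connection is what actually enforces the boundary.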

Memory Management

graph TB
    A[Agent Memory Architecture] --> B[Sensory Memory<br/>Current input]
    A --> C[Short-term Memory<br/>Conversation window]
    A --> D[Long-term Memory<br/>Persistent storage]

    C --> C1[Last N conversation turns]
    C --> C2[Current task context]

    D --> D1[Episodic<br/>Memory of past events]
    D --> D2[Semantic<br/>Knowledge base / vector store]
    D --> D3[Procedural<br/>Skills / tool-use experience]

    style B fill:#ffd700,color:#000
    style C fill:#3498db,color:#fff
    style D fill:#2ecc71,color:#fff

Implementing Multi-Layer Memory

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from datetime import datetime
import json

# `llm` is assumed to be defined elsewhere.

class AgentMemory:
    def __init__(self):
        # Short-term: conversation buffer
        self.conversation_history = []
        self.max_history = 20

        # Long-term: vector store for episodic memory
        self.episodic_memory = Chroma(
            collection_name="episodic",
            embedding_function=OpenAIEmbeddings(),
            persist_directory="./memory/episodic",
        )

        # Semantic memory: knowledge base
        self.semantic_memory = Chroma(
            collection_name="semantic",
            embedding_function=OpenAIEmbeddings(),
            persist_directory="./memory/semantic",
        )

    def add_conversation(self, role: str, content: str):
        """Add to short-term memory."""
        self.conversation_history.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
        })
        if len(self.conversation_history) > self.max_history:
            # Summarize and archive old conversations
            self._archive_old_conversations()

    def _summarize(self, turns: list[dict]) -> str:
        """Summarize old turns with the LLM (not in the original listing; a minimal assumption)."""
        transcript = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
        return llm.invoke(f"Summarize this conversation briefly:\n{transcript}").content

    def _archive_old_conversations(self):
        """Move old conversations to long-term memory."""
        # Archive the 10 oldest turns as a single summary and keep the rest in the buffer.
        old = self.conversation_history[:10]
        summary = self._summarize(old)
        self.episodic_memory.add_texts(
            texts=[summary],
            metadatas=[{"type": "conversation_summary", "date": datetime.now().isoformat()}],
        )
        self.conversation_history = self.conversation_history[10:]

    def recall(self, query: str, k: int = 3) -> str:
        """Retrieve relevant memories."""
        episodic = self.episodic_memory.similarity_search(query, k=k)
        semantic = self.semantic_memory.similarity_search(query, k=k)

        memories = []
        if episodic:
            memories.append("Relevant past memories:\n" + "\n".join([d.page_content for d in episodic]))
        if semantic:
            memories.append("Relevant knowledge:\n" + "\n".join([d.page_content for d in semantic]))

        return "\n\n".join(memories)

    def learn(self, knowledge: str, metadata: dict | None = None):
        """Store new knowledge in semantic memory."""
        self.semantic_memory.add_texts(
            texts=[knowledge],
            metadatas=[metadata or {}],
        )
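
The overflow-and-archive behaviour of the short-term buffer can be exercised without a vector store or an embedding API. In this sketch a plain list stands in for the episodic store and `naive_summary` stands in for an LLM summarizer; `WindowMemory` and both stand-ins are hypothetical names used only for illustration.

```python
from typing import Callable

class WindowMemory:
    """Short-term window that archives the oldest turns as a summary when it overflows."""

    def __init__(self, max_history: int, archive_batch: int,
                 summarize: Callable[[list[dict]], str]):
        self.max_history = max_history
        self.archive_batch = archive_batch
        self.summarize = summarize
        self.history: list[dict] = []
        self.archive: list[str] = []  # stands in for the episodic vector store

    def add(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})
        if len(self.history) > self.max_history:
            # Fold the oldest batch into one summary and keep the newer turns verbatim.
            old = self.history[:self.archive_batch]
            self.history = self.history[self.archive_batch:]
            self.archive.append(self.summarize(old))

def naive_summary(turns: list[dict]) -> str:
    # Placeholder for an LLM call: just records how many turns were folded away.
    return f"{len(turns)} turns, first: {turns[0]['content']}"

memory = WindowMemory(max_history=4, archive_batch=2, summarize=naive_summary)
for i in range(6):
    memory.add("user", f"message {i}")
```

After six messages the window holds the four most recent turns and the archive holds one summary covering the first two, which is the trade-off between context length and retention that the buffer is meant to manage.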

Multi-Agent Systems

The Supervisor Pattern

graph TB
    A[User Query] --> B[Supervisor Agent]
    B --> C{Task assignment}
    C --> D[Research Agent<br/>Information gathering]
    C --> E[Code Agent<br/>Code writing]
    C --> F[Review Agent<br/>Quality review]

    D --> G[Report results]
    E --> G
    F --> G
    G --> B
    B --> H[Combined answer]

    style B fill:#e74c3c,color:#fff
    style D fill:#3498db,color:#fff
    style E fill:#f39c12,color:#000
    style F fill:#2ecc71,color:#fff
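
import json
from typing import TypedDict

from langgraph.graph import StateGraph, END

# `llm`, `research_executor`, `code_executor`, and `review_executor` are assumed to be
# defined elsewhere (for example, as AgentExecutor instances).

VALID_ROUTES = {"researcher", "coder", "reviewer", "FINISH"}

class MultiAgentState(TypedDict):
    messages: list
    next_agent: str
    results: dict

def supervisor_node(state: MultiAgentState) -> dict:
    """Supervisor decides which agent to invoke next."""
    supervisor_prompt = """You are a project manager. Based on the user's request and the current progress, decide which team member should take the next step.

Available team members:
- researcher: gathers information and does research
- coder: writes code
- reviewer: reviews code and document quality
- FINISH: all work is done

User request:
{request}

Current state:
{state_summary}

Answer only with who should go next (researcher/coder/reviewer/FINISH)."""

    response = llm.invoke(supervisor_prompt.format(
        request=state["messages"][0]["content"],
        state_summary=json.dumps(state.get("results", {}), ensure_ascii=False),
    ))

    choice = response.content.strip()
    # Fall back to FINISH on an unexpected answer instead of failing at the routing step.
    return {"next_agent": choice if choice in VALID_ROUTES else "FINISH"}

def researcher_node(state: MultiAgentState) -> dict:
    """Research agent gathers information."""
    # Use the original user request; the last message may be an earlier agent's output.
    task = state["messages"][0]["content"]
    result = research_executor.invoke({"input": f"Research the following topic: {task}"})
    return {
        "results": {**state.get("results", {}), "research": result["output"]},
        "messages": state["messages"] + [{"role": "assistant", "content": f"Research complete: {result['output']}"}],
    }

def coder_node(state: MultiAgentState) -> dict:
    """Coder agent writes code."""
    research = state.get("results", {}).get("research", "")
    result = code_executor.invoke({
        "input": f"Write code based on the following research:\n{research}"
    })
    return {
        "results": {**state.get("results", {}), "code": result["output"]},
    }

def reviewer_node(state: MultiAgentState) -> dict:
    """Reviewer agent checks the code (missing from the original listing; a minimal assumption)."""
    code = state.get("results", {}).get("code", "")
    result = review_executor.invoke({"input": f"Review the following code:\n{code}"})
    return {
        "results": {**state.get("results", {}), "review": result["output"]},
    }

# Build multi-agent graph
workflow = StateGraph(MultiAgentState)
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)
workflow.add_node("reviewer", reviewer_node)

workflow.set_entry_point("supervisor")
workflow.add_conditional_edges("supervisor", lambda s: s["next_agent"], {
    "researcher": "researcher",
    "coder": "coder",
    "reviewer": "reviewer",
    "FINISH": END,
})
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("coder", "supervisor")
workflow.add_edge("reviewer", "supervisor")

multi_agent = workflow.compile()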
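
The conditional edge routes on the supervisor's raw text, so an answer such as "Researcher." or "Next: coder" would not match any key in the routing map. One option is to normalize the model output before routing. `parse_route` below is a hypothetical helper, not a LangGraph API, and stopping on an ambiguous answer is a design assumption.

```python
import re

ROUTES = ("researcher", "coder", "reviewer", "FINISH")

def parse_route(raw: str, default: str = "FINISH") -> str:
    """Map free-form model output to one of the known routes."""
    found = [r for r in ROUTES if re.search(rf"\b{r}\b", raw, flags=re.IGNORECASE)]
    # Only accept an unambiguous answer; otherwise stop rather than guess.
    return found[0] if len(found) == 1 else default

examples = {
    "Researcher.": parse_route("Researcher."),
    "Next: coder\n": parse_route("Next: coder\n"),
    "finish": parse_route("finish"),
    "researcher or coder": parse_route("researcher or coder"),
}
```

Defaulting to `FINISH` ends the run on an unreadable answer; a stricter design might re-prompt the supervisor instead.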

Agent Evaluation

# Agent evaluation dimensions
evaluation_framework = {
    "task_completion": {
        "description": "Whether the task requested by the user was completed",
        "metric": "success_rate",
        "target": 0.9,
    },
    "efficiency": {
        "description": "Number of steps needed to complete the task",
        "metric": "avg_steps",
        "target": "< 5 steps",
    },
    "tool_selection": {
        "description": "Whether the correct tool was chosen",
        "metric": "tool_accuracy",
        "target": 0.95,
    },
    "error_recovery": {
        "description": "Whether the agent can recover after hitting an error",
        "metric": "recovery_rate",
        "target": 0.8,
    },
    "hallucination": {
        "description": "Whether the output contains content not grounded in tool return values",
        "metric": "faithfulness_score",
        "target": 0.95,
    },
}
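
Several of these metrics can be computed directly from per-run logs. The sketch below assumes a hypothetical `RunRecord` log format and made-up example runs; `faithfulness_score` is left out because it needs a separate judge of the generated text.

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    """One evaluated agent run (hypothetical log format)."""
    success: bool
    steps: int
    tool_calls: int
    correct_tool_calls: int
    errors: int
    recovered_errors: int

def summarize_runs(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-run logs into the metrics named in the framework above."""
    if not runs:
        raise ValueError("at least one run is required")
    total_calls = sum(r.tool_calls for r in runs)
    total_errors = sum(r.errors for r in runs)
    return {
        "success_rate": sum(r.success for r in runs) / len(runs),
        "avg_steps": sum(r.steps for r in runs) / len(runs),
        # With no tool calls or no errors, the rate is vacuously perfect.
        "tool_accuracy": sum(r.correct_tool_calls for r in runs) / total_calls if total_calls else 1.0,
        "recovery_rate": sum(r.recovered_errors for r in runs) / total_errors if total_errors else 1.0,
    }

# Illustrative data only, not measured results.
runs = [
    RunRecord(success=True, steps=3, tool_calls=2, correct_tool_calls=2, errors=0, recovered_errors=0),
    RunRecord(success=True, steps=5, tool_calls=4, correct_tool_calls=3, errors=1, recovered_errors=1),
    RunRecord(success=False, steps=10, tool_calls=6, correct_tool_calls=4, errors=3, recovered_errors=1),
    RunRecord(success=True, steps=2, tool_calls=4, correct_tool_calls=3, errors=0, recovered_errors=0),
]
metrics = summarize_runs(runs)
```

Comparing `metrics` against the `target` values then becomes a simple threshold check per dimension.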

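Agent Framework Comparison

| Framework | Characteristics | Best suited for |
|---|---|---|
| LangGraph | State-graph based, fine-grained control | Complex workflows that need precise control |
| CrewAI | Role-playing, team collaboration | Multi-agent collaborative tasks |
| AutoGen | Conversation-driven multi-agent | Research and experimentation |
| Haystack | Pipeline-oriented design | RAG-centric applications |
| Semantic Kernel | Microsoft ecosystem, enterprise-grade | Enterprise AI integration |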

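Summary

The core challenge for AI agents is getting an LLM to plan and execute multi-step tasks reliably. ReAct is the most basic and practical pattern; Plan-and-Execute suits complex tasks; multi-agent systems suit scenarios that call for specialized division of labor.

Key design principles:

  1. Define clear tool boundaries: tool descriptions must be precise, and inputs and outputs must be unambiguous
  2. Cap the number of iterations: keep the agent from falling into an infinite loop
  3. Handle errors: return meaningful error messages when a tool call fails
  4. Manage memory: balance context length against information retention
  5. Keep a human in the loop: add manual approval at key decision points
  6. Build in observability: log the Thought/Action/Observation of every step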
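
Principles 5 and 6 can be combined in a thin wrapper around action execution. The following is a sketch under stated assumptions: `SENSITIVE_ACTIONS`, `run_action`, and the `approve` callback are hypothetical names, and `deny_all` stands in for an interactive approval prompt.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Trace:
    """Collects one record per step so a run can be inspected afterwards."""
    steps: list[dict] = field(default_factory=list)

    def log(self, thought: str, action: str, observation: str) -> None:
        self.steps.append({"thought": thought, "action": action, "observation": observation})

SENSITIVE_ACTIONS = {"send_email", "delete_file"}  # hypothetical action names

def run_action(action: str, execute: Callable[[str], str],
               approve: Callable[[str], bool], trace: Trace, thought: str = "") -> str:
    """Execute an action, asking a human first when it is on the sensitive list."""
    if action in SENSITIVE_ACTIONS and not approve(action):
        observation = f"Rejected by reviewer: {action}"
    else:
        observation = execute(action)
    # Every step is logged, including rejected ones.
    trace.log(thought, action, observation)
    return observation

def execute(action: str) -> str:
    return f"ran {action}"

def deny_all(action: str) -> bool:
    return False  # stands in for an interactive approval prompt

trace = Trace()
first = run_action("search_docs", execute, deny_all, trace, thought="look up the API")
second = run_action("delete_file", execute, deny_all, trace, thought="clean up")
```

Returning the rejection as an observation lets the agent see why the action did not run and choose a different step.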

Author: zt
Published: 2025-10-15
Length: 3.2k characters · 8 min
License: CC BY-SA 4.0