AI Agent Architecture: Design and Implementation Patterns

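Introduction

AI agents are among the most exciting directions in LLM applications today: the model moves from passively answering questions to acting as an autonomous agent that plans, uses tools, and keeps learning. From ChatGPT Plugins to AutoGPT, and from LangGraph to CrewAI, the agent ecosystem is evolving quickly. This article walks through the core pieces of agent architecture (the ReAct pattern, planning strategies, tool use, memory management, and multi-agent collaboration) and how to design and implement them.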

Agent Core Architecture

graph TB
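    A[AI Agent] --> B[Perception]
    A --> C[Reasoning]
    A --> D[Action]
    A --> E[Memory]

    B --> B1[User input]
    B --> B2[Environment observations]
    B --> B3[Tool return values]

    C --> C1[Task decomposition]
    C --> C2[Plan generation]
    C --> C3[Decision making]

    D --> D1[Call tools]
    D --> D2[Generate replies]
    D --> D3[Modify the environment]

    E --> E1[Short-term memory<br/>conversation context]
    E --> E2[Working memory<br/>current task state]
    E --> E3[Long-term memory<br/>knowledge base / experience]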

    style A fill:#2c3e50,color:#fff
    style C fill:#e74c3c,color:#fff
    style D fill:#3498db,color:#fff
    style E fill:#2ecc71,color:#fff

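The ReAct Pattern

ReAct (Reasoning + Acting) is the most basic and most practical agent pattern. The model alternates between reasoning (Thought) and acting (Action), and each tool result comes back as an Observation that feeds the next Thought: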

sequenceDiagram
    participant U as User
    participant A as Agent (LLM)
    participant T as Tools

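    U->>A: "Which city is hotter today, Beijing or Shanghai?"

    Note over A: Thought: need the temperature of both cities
    A->>T: Action: get_weather("Beijing")
    T-->>A: Observation: Beijing 32°C

    Note over A: Thought: have Beijing's temperature, still need Shanghai's
    A->>T: Action: get_weather("Shanghai")
    T-->>A: Observation: Shanghai 35°C

    Note over A: Thought: Shanghai 35°C > Beijing 32°C
    A->>U: "Shanghai is hotter today (35°C > 32°C)"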
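
The Thought/Action/Observation exchange can also be sketched without any framework. The version below is a minimal illustration under stated assumptions, not how any particular library implements ReAct: `react_loop`, `scripted_llm`, and the hard-coded `get_weather` are hypothetical names, and the "model" is a scripted function so the example runs without an API key.

```python
import re
from typing import Callable

# Hypothetical stand-in tool; a real agent would call a weather API here.
def get_weather(city: str) -> str:
    return {"Beijing": "32°C", "Shanghai": "35°C"}.get(city, "unknown")

TOOLS: dict[str, Callable[[str], str]] = {"get_weather": get_weather}

ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)

def react_loop(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate Thought/Action/Observation until a final answer or the step cap."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(scratchpad)
        scratchpad += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(output)
        if not action:
            # Feed the parse failure back so the model can correct its format.
            scratchpad += "Observation: could not parse an action, use the required format.\n"
            continue
        name, arg = action.group(1), action.group(2).strip()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        scratchpad += f"Observation: {observation}\n"
    return "Stopped: step limit reached without a final answer."

# Scripted fake model: picks its reply from the number of observations so far.
def scripted_llm(prompt: str) -> str:
    seen = prompt.count("Observation:")
    if seen == 0:
        return "Thought: I need Beijing's temperature.\nAction: get_weather\nAction Input: Beijing"
    if seen == 1:
        return "Thought: Now Shanghai.\nAction: get_weather\nAction Input: Shanghai"
    return "Thought: 35°C > 32°C.\nFinal Answer: Shanghai is hotter today (35°C > 32°C)."

answer = react_loop(scripted_llm, "Which city is hotter today, Beijing or Shanghai?")
print(answer)
```

The loop owns the scratchpad and the step cap, while the model only ever sees text. That separation is what lets a framework swap in a real LLM without changing the control flow.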

ReAct Implementation

from langchain.agents import create_react_agent, AgentExecutor
from langchain_core.prompts import PromptTemplate

REACT_PROMPT = PromptTemplate.from_template("""Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}""")

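# `llm` (a chat model) and `tools` (a list of tools) are assumed to be defined elsewhere.
agent = create_react_agent(llm, tools, REACT_PROMPT)
# max_iterations caps the Thought/Action loop so the agent cannot run forever.
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=10)

result = executor.invoke({"input": "Which city is hotter today, Beijing or Shanghai?"})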

Planning Strategies

Plan-and-Execute

Draft a complete plan first, then execute it step by step. This suits complex multi-step tasks:

# PlanAndExecute lives in the separate langchain_experimental package.
from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner

# Planner: generates step-by-step plan
planner = load_chat_planner(llm)

# Executor: executes each step
executor_agent = load_agent_executor(llm, tools, verbose=True)

# Combined
agent = PlanAndExecute(planner=planner, executor=executor_agent, verbose=True)

result = agent.invoke({
    "input": "Survey the mainstream vector databases, compare their performance, and write a recommendation report"
})

graph TD
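    A[Complex task] --> B[Planner<br/>LLM generates the plan]
    B --> C["Step 1: List mainstream vector databases"]
    C --> D["Step 2: Search for each database's performance data"]
    D --> E["Step 3: Build a comparison table"]
    E --> F["Step 4: Write the recommendation report"]

    C --> G[Executor<br/>execute and observe]
    D --> G
    E --> G
    F --> G

    G --> H{Replan needed?}
    H -->|Yes| B
    H -->|No| I[Final result]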

    style B fill:#e74c3c,color:#fff
    style G fill:#3498db,color:#fff
    style I fill:#2ecc71,color:#fff

Adaptive Planning (LangGraph Implementation)

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Sequence
import operator

class AgentState(TypedDict):
    messages: Annotated[Sequence[dict], operator.add]
    plan: list[str]
    current_step: int
    results: dict
    should_replan: bool

def planner_node(state: AgentState) -> AgentState:
    """Generate or revise the plan."""
    messages = state["messages"]
    plan_prompt = f"""Based on the task and current progress, create a plan.

Task: {messages[0]['content']}
Previous results: {state.get('results', {})}

Output a numbered list of steps."""

    plan = llm.invoke(plan_prompt).content
    steps = [s.strip() for s in plan.split("\n") if s.strip()]
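    # Reset should_replan so a fresh plan is executed instead of triggering another replan.
    return {"plan": steps, "current_step": 0, "should_replan": False}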

def executor_node(state: AgentState) -> AgentState:
    """Execute the current step."""
    step = state["plan"][state["current_step"]]
    result = agent_executor.invoke({"input": step})
    return {
        "results": {**state.get("results", {}), step: result["output"]},
        "current_step": state["current_step"] + 1,
    }

def should_continue(state: AgentState) -> str:
    """Decide whether to continue, replan, or finish."""
    if state["current_step"] >= len(state["plan"]):
        return "finish"
    if state.get("should_replan"):
        return "replan"
    return "execute"

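def evaluator_node(state: AgentState) -> AgentState:
    """Flag a replan when the latest step result looks like a failure.

    Placeholder heuristic (an assumption for illustration); an LLM-based
    check would be a more realistic evaluator.
    """
    results = state.get("results") or {}
    last_result = list(results.values())[-1] if results else ""
    return {"should_replan": "error" in str(last_result).lower()}

# Build graph
# `llm` and `agent_executor` are assumed to be defined elsewhere.
workflow = StateGraph(AgentState)
workflow.add_node("planner", planner_node)
workflow.add_node("executor", executor_node)
workflow.add_node("evaluator", evaluator_node)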

workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_edge("executor", "evaluator")
workflow.add_conditional_edges("evaluator", should_continue, {
    "execute": "executor",
    "replan": "planner",
    "finish": END,
})

app = workflow.compile()
result = app.invoke({"messages": [{"role": "user", "content": "Research the latest advances in RAG"}]})
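
Splitting the planner output on newlines keeps list markers and any preamble such as "Here is the plan:" inside the steps. A slightly stricter parser might look like the sketch below; `parse_plan` and `STEP_RE` are hypothetical helpers, and the accepted markers (`1.`, `1)`, `-`, `*`) are an assumption about what the model tends to emit.

```python
import re

# A line counts as a step only if it starts with "1.", "1)", "-" or "*".
STEP_RE = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$")

def parse_plan(text: str) -> list[str]:
    """Keep only lines that look like list items and strip their markers."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(1))
    return steps

raw = """Here is the plan:
1. List the mainstream vector databases
2) Search for performance data
- Build a comparison table

4. Write the recommendation report"""

steps = parse_plan(raw)
print(steps)
```

Dropping the preamble matters because every element of `plan` is later sent to the executor as if it were a task.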

Tool Use

Tool Design Principles

import json

from langchain_core.tools import tool
from pydantic import BaseModel, Field

# 1. Clear schema with descriptions
class SearchInput(BaseModel):
    query: str = Field(description="Search keywords")
    max_results: int = Field(default=5, description="Maximum number of results to return")
    date_range: str = Field(default="all", description="Time range: today/week/month/year/all")

@tool(args_schema=SearchInput)
def web_search(query: str, max_results: int = 5, date_range: str = "all") -> str:
    """Search the web for up-to-date information. Use when the task needs real-time data or recent news."""
    # Implementation goes here
    pass

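# 2. Error handling in tools
# `db` is assumed to be an existing database handle whose execute() returns JSON-serializable rows.
@tool
def execute_sql(query: str) -> str:
    """Run a SQL query against a read-only database. Only SELECT statements are supported."""
    if not query.strip().upper().startswith("SELECT"):
        return "Error: Only SELECT statements are allowed for security."

    try:
        results = db.execute(query)
        return json.dumps(results, ensure_ascii=False, indent=2)
    except Exception as e:
        # Return the error as text so the agent can read it and retry.
        return f"SQL Error: {str(e)}. Please check your query syntax."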

# 3. Compound tools (tool that uses other tools)
# The fetch_* helpers and `llm` are assumed to be defined elsewhere.
@tool
def analyze_repository(repo_url: str) -> str:
    """Analyze a GitHub repository's code quality, tech stack, and activity."""
    # Uses multiple sub-tools internally
    readme = fetch_github_file(repo_url, "README.md")
    languages = fetch_github_languages(repo_url)
    commits = fetch_recent_commits(repo_url, days=30)

    analysis = llm.invoke(f"""Analyze the following repository information:
    README: {readme}
    Languages: {languages}
    Commits in the last 30 days: {len(commits)}
    """)

    return analysis.content
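
A prefix check on the query string is a thin guard: it rejects valid read-only queries that begin with `WITH`, and it depends on string matching rather than on the database itself. One alternative is to enforce read-only access at the connection level. The sketch below assumes SQLite (the original `db` handle is unspecified) and uses `PRAGMA query_only`. `make_readonly_connection` and `run_readonly_sql` are hypothetical names, and the demo table is made up.

```python
import json
import sqlite3

def make_readonly_connection() -> sqlite3.Connection:
    """Build a demo in-memory database, then switch the connection to query-only."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE cities (name TEXT, temp_c INTEGER)")
    conn.executemany("INSERT INTO cities VALUES (?, ?)", [("Beijing", 32), ("Shanghai", 35)])
    conn.execute("PRAGMA query_only = ON")  # SQLite now rejects writes on this connection
    return conn

def run_readonly_sql(conn: sqlite3.Connection, query: str) -> str:
    """Run a query and return rows as JSON, or a readable error string for the agent."""
    try:
        cursor = conn.execute(query)
        columns = [col[0] for col in cursor.description or []]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return json.dumps(rows, ensure_ascii=False)
    except sqlite3.Error as e:
        return f"SQL Error: {e}. Please check your query."

conn = make_readonly_connection()
print(run_readonly_sql(conn, "SELECT name FROM cities WHERE temp_c > 33"))
print(run_readonly_sql(conn, "DELETE FROM cities"))
```

With other databases the same idea is usually expressed through a read-only role or user, so the tool does not have to parse SQL at all.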

Memory Management

graph TB
    A[Agent Memory Architecture] --> B[Sensory Memory<br/>current input]
    A --> C[Short-term Memory<br/>conversation window]
    A --> D[Long-term Memory<br/>persistent storage]

    C --> C1[Last N conversation turns]
    C --> C2[Current task context]

    D --> D1[Episodic<br/>memory of past events]
    D --> D2[Semantic<br/>knowledge base / vector store]
    D --> D3[Procedural<br/>skills / tool-use experience]

    style B fill:#ffd700,color:#000
    style C fill:#3498db,color:#fff
    style D fill:#2ecc71,color:#fff

Implementing Multi-layer Memory

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from datetime import datetime
import json

class AgentMemory:
    def __init__(self):
        # Short-term: conversation buffer
        self.conversation_history = []
        self.max_history = 20

        # Long-term: vector store for episodic memory
        self.episodic_memory = Chroma(
            collection_name="episodic",
            embedding_function=OpenAIEmbeddings(),
            persist_directory="./memory/episodic",
        )

        # Semantic memory: knowledge base
        self.semantic_memory = Chroma(
            collection_name="semantic",
            embedding_function=OpenAIEmbeddings(),
            persist_directory="./memory/semantic",
        )

    def add_conversation(self, role: str, content: str):
        """Add to short-term memory."""
        self.conversation_history.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
        })
        if len(self.conversation_history) > self.max_history:
            # Summarize and archive old conversations
            self._archive_old_conversations()

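    def _summarize(self, messages: list) -> str:
        """Naive placeholder summary that concatenates messages; an LLM call would fit here."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

    def _archive_old_conversations(self):
        """Move the 10 oldest messages to long-term memory as one summary."""
        old = self.conversation_history[:10]
        summary = self._summarize(old)
        self.episodic_memory.add_texts(
            texts=[summary],
            metadatas=[{"type": "conversation_summary", "date": datetime.now().isoformat()}],
        )
        self.conversation_history = self.conversation_history[10:]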

    def recall(self, query: str, k: int = 3) -> str:
        """Retrieve relevant memories."""
        episodic = self.episodic_memory.similarity_search(query, k=k)
        semantic = self.semantic_memory.similarity_search(query, k=k)

        memories = []
        if episodic:
            memories.append("Relevant past memories:\n" + "\n".join([d.page_content for d in episodic]))
        if semantic:
            memories.append("Relevant knowledge:\n" + "\n".join([d.page_content for d in semantic]))

        return "\n\n".join(memories)

    def learn(self, knowledge: str, metadata: dict = None):
        """Store new knowledge in semantic memory."""
        self.semantic_memory.add_texts(
            texts=[knowledge],
            metadatas=[metadata or {}],
        )
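
The overflow-and-archive flow can be tried without a vector store or an embedding API. The sketch below is a simplified stand-in, not a replacement for the class above: `TinyMemory` is a hypothetical name, summaries are plain concatenations, and recall ranks by word overlap instead of embedding similarity.

```python
from dataclasses import dataclass, field

@dataclass
class TinyMemory:
    max_history: int = 4      # short-term window size
    archive_batch: int = 2    # how many old messages to archive at once
    history: list[dict] = field(default_factory=list)
    archive: list[str] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        """Append to short-term memory and archive the oldest batch on overflow."""
        self.history.append({"role": role, "content": content})
        if len(self.history) > self.max_history:
            old = self.history[:self.archive_batch]
            self.history = self.history[self.archive_batch:]
            self.archive.append(" | ".join(f"{m['role']}: {m['content']}" for m in old))

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Rank archived summaries by word overlap with the query (stand-in for embeddings)."""
        words = set(query.lower().split())
        scored = [(len(words & set(s.lower().split())), s) for s in self.archive]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
        return [summary for score, summary in ranked if score > 0]

memory = TinyMemory()
turns = ["my dog is named Rex", "noted", "what is RAG", "retrieval augmented generation", "thanks"]
for i, text in enumerate(turns):
    memory.add("user" if i % 2 == 0 else "assistant", text)

print(memory.recall("dog"))
```

Archiving in batches rather than one message at a time keeps each summary large enough to be useful when it is retrieved later.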

Multi-Agent Systems

Supervisor Pattern

graph TB
    A[User Query] --> B[Supervisor Agent]
    B --> C{Task assignment}
    C --> D[Research Agent<br/>information gathering]
    C --> E[Code Agent<br/>writing code]
    C --> F[Review Agent<br/>quality review]

    D --> G[Report results]
    E --> G
    F --> G
    G --> B
    B --> H[Combined answer]

    style B fill:#e74c3c,color:#fff
    style D fill:#3498db,color:#fff
    style E fill:#f39c12,color:#000
    style F fill:#2ecc71,color:#fff

import json
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage, SystemMessage

class MultiAgentState(TypedDict):
    messages: list
    next_agent: str
    results: dict

def supervisor_node(state: MultiAgentState) -> MultiAgentState:
    """Supervisor decides which agent to invoke next."""
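    supervisor_prompt = """You are a project manager. Based on the user's request and the current progress, decide which team member should take the next step.

Available team members:
- researcher: gathers information and does research
- coder: writes code
- reviewer: reviews the quality of code and documents
- FINISH: all work is complete

Current state:
{state_summary}

Reply with only the name of who should go next (researcher/coder/reviewer/FINISH)."""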

    response = llm.invoke(supervisor_prompt.format(
        state_summary=json.dumps(state.get("results", {}), ensure_ascii=False)
    ))

    return {"next_agent": response.content.strip()}

def researcher_node(state: MultiAgentState) -> MultiAgentState:
    """Research agent gathers information."""
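    # Use the first message (the original user request) so repeat visits do not research the agent's own output.
    task = state["messages"][0]["content"]
    result = research_executor.invoke({"input": f"Research the following topic: {task}"})
    return {
        "results": {**state.get("results", {}), "research": result["output"]},
        "messages": state["messages"] + [{"role": "assistant", "content": f"Research complete: {result['output']}"}],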
    }

def coder_node(state: MultiAgentState) -> MultiAgentState:
    """Coder agent writes code."""
    research = state.get("results", {}).get("research", "")
    result = code_executor.invoke({
        "input": f"Write code based on the following research findings:\n{research}"
    })
    return {
        "results": {**state.get("results", {}), "code": result["output"]},
    }

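def reviewer_node(state: MultiAgentState) -> MultiAgentState:
    """Review agent checks the research and code produced so far."""
    # `review_executor` is assumed to be defined like the other executors.
    results = state.get("results", {})
    result = review_executor.invoke({
        "input": f"Review the following work for quality:\n{json.dumps(results, ensure_ascii=False)}"
    })
    return {
        "results": {**results, "review": result["output"]},
    }

# Build multi-agent graph
workflow = StateGraph(MultiAgentState)
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)
workflow.add_node("reviewer", reviewer_node)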

workflow.set_entry_point("supervisor")
workflow.add_conditional_edges("supervisor", lambda s: s["next_agent"], {
    "researcher": "researcher",
    "coder": "coder",
    "reviewer": "reviewer",
    "FINISH": END,
})
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("coder", "supervisor")
workflow.add_edge("reviewer", "supervisor")

multi_agent = workflow.compile()
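
The conditional edge looks up `next_agent` in a fixed mapping, so a supervisor reply such as "Researcher." or "I think the coder should go next" would not match any key. A small normalization step is one way to guard against that. `parse_route` is a hypothetical helper, and falling back to `FINISH` on ambiguous output is an assumption; retrying the supervisor is an equally reasonable choice.

```python
import re

VALID_ROUTES = ("researcher", "coder", "reviewer", "FINISH")

def parse_route(raw: str, default: str = "FINISH") -> str:
    """Map free-form supervisor output onto exactly one of the graph's route keys."""
    tokens = set(re.findall(r"[a-z]+", raw.lower()))
    matches = [route for route in VALID_ROUTES if route.lower() in tokens]
    # Accept only an unambiguous answer; anything else falls back to the default.
    return matches[0] if len(matches) == 1 else default

print(parse_route("Researcher."))
print(parse_route("coder or reviewer?"))
```

In the graph above, the supervisor node could return `{"next_agent": parse_route(response.content)}` so the lookup in `add_conditional_edges` always receives a known key.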

Agent Evaluation

# Agent evaluation dimensions
evaluation_framework = {
    "task_completion": {
        "description": "Whether the task the user asked for was completed",
        "metric": "success_rate",
        "target": 0.9,
    },
    "efficiency": {
        "description": "Number of steps needed to complete the task",
        "metric": "avg_steps",
        "target": "< 5 steps",
    },
    "tool_selection": {
        "description": "Whether the correct tool was chosen",
        "metric": "tool_accuracy",
        "target": 0.95,
    },
    "error_recovery": {
        "description": "Whether the agent recovers after hitting an error",
        "metric": "recovery_rate",
        "target": 0.8,
    },
    "hallucination": {
        "description": "Whether the output contains content not grounded in tool return values",
        "metric": "faithfulness_score",
        "target": 0.95,
    },
}
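
Four of these metrics can be computed directly from run logs. The sketch below assumes a hypothetical `RunRecord` shape and made-up sample runs. `faithfulness_score` is left out because it normally needs a separate judge (a human or a model) rather than simple counting.

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    succeeded: bool          # did the run complete the user's task?
    steps: int               # Thought/Action iterations used
    tool_calls: int          # total tool invocations
    correct_tool_calls: int  # invocations labeled as the right tool
    errors: int              # tool errors encountered
    recovered_errors: int    # errors the agent went on to recover from

def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-run records into the count-based metrics."""
    if not runs:
        raise ValueError("need at least one run")
    total_calls = sum(r.tool_calls for r in runs)
    total_errors = sum(r.errors for r in runs)
    return {
        "success_rate": sum(r.succeeded for r in runs) / len(runs),
        "avg_steps": sum(r.steps for r in runs) / len(runs),
        "tool_accuracy": sum(r.correct_tool_calls for r in runs) / total_calls if total_calls else 1.0,
        "recovery_rate": sum(r.recovered_errors for r in runs) / total_errors if total_errors else 1.0,
    }

# Made-up sample data for illustration only.
runs = [
    RunRecord(True, 3, 2, 2, 0, 0),
    RunRecord(True, 5, 4, 3, 1, 1),
    RunRecord(False, 8, 6, 4, 2, 0),
    RunRecord(True, 4, 4, 4, 1, 1),
]
metrics = summarize(runs)
print(metrics)
```

Each value can then be compared against the matching `target` in `evaluation_framework`.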

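Agent Framework Comparison

Framework	Strengths	Best suited for
LangGraph	State-graph based, fine-grained control	Complex workflows that need precise control
CrewAI	Role-playing agents, team collaboration	Multi-agent collaborative tasks
AutoGen	Conversation-driven multi-agent	Research and experimentation
Haystack	Pipeline-based design	RAG-centric applications
Semantic Kernel	Microsoft ecosystem, enterprise-grade	Enterprise AI integration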

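Summary

The core challenge for AI agents is getting an LLM to plan and execute multi-step tasks reliably. ReAct is the most basic and practical pattern, Plan-and-Execute suits complex tasks, and multi-agent systems suit work that benefits from specialized roles.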

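Key design principles:

Define clear tool boundaries: tool descriptions must be precise, with clear inputs and outputs
Cap iterations: keep the agent from looping forever
Handle errors: return meaningful error messages when a tool call fails
Manage memory: balance context length against information retention
Keep a human in the loop: add manual approval at key decision points
Ensure observability: log the Thought/Action/Observation of every step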
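
Several of these principles can live in one thin wrapper around tool calls. The sketch below is an illustration under stated assumptions: `guarded_call` and the `approve` callback are hypothetical, and `read_file` and `delete_file` are demo stubs that only return strings and never touch the file system.

```python
from typing import Callable

def guarded_call(
    tool: Callable[[str], str],
    arg: str,
    *,
    needs_approval: bool,
    approve: Callable[[str], bool],
    trace: list[dict],
) -> str:
    """Run a tool, asking a human first when the action is marked sensitive."""
    name = getattr(tool, "__name__", repr(tool))
    if needs_approval and not approve(f"{name}({arg!r})"):
        observation = "Denied by human reviewer."
    else:
        try:
            observation = tool(arg)
        except Exception as e:
            # Meaningful error text lets the agent adjust instead of crashing.
            observation = f"Tool error: {e}"
    # Observability: every call is recorded, including denied ones.
    trace.append({"action": name, "input": arg, "observation": observation})
    return observation

# Demo stubs standing in for real tools.
def read_file(path: str) -> str:
    return f"contents of {path}"

def delete_file(path: str) -> str:
    return f"deleted {path}"

trace: list[dict] = []
deny_all = lambda request: False  # stand-in for a real approval prompt

r1 = guarded_call(read_file, "notes.txt", needs_approval=False, approve=deny_all, trace=trace)
r2 = guarded_call(delete_file, "notes.txt", needs_approval=True, approve=deny_all, trace=trace)
print(r1, "|", r2)
```

Passing the approver in as a callback keeps the wrapper testable: production code can prompt a person, while tests inject a function that always approves or always denies.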