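The Complete Hands-On Guide to AI Agent Development: Building an Intelligent Automation System from Scratch
AI agents are a core form of large-model applications. They turn AI from a question-answering tool into an autonomous actor that can plan tasks, call tools, and make its own decisions. Starting from scratch, this article walks through the design principles of AI agents and how to build one in practice.
1. AI Agent Core Concepts
1.1 What Is an AI Agent?
An AI agent is an AI system that can perceive its environment, make decisions on its own, and carry out actions. Unlike a traditional question-and-answer conversation, an agent has the following core capabilities:
- Autonomous planning: breaks a complex task down into a sequence of executable steps
- Tool calling: dynamically selects and uses external tools (APIs, databases, search engines, and so on)
- Memory management: maintains context, with both long-term and short-term memory
- Reflection and refinement: corrects and improves itself based on execution results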
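Reflection is the one capability in this list that the later components do not implement, so here is a minimal sketch of the idea: retry a task and feed the checker's complaint back into the next attempt. The names `run_with_reflection`, `fake_llm`, and `must_be_digits` are hypothetical stand-ins chosen for illustration; a real agent would call a model where `fake_llm` is used.

```python
from typing import Callable, Optional, Tuple


def run_with_reflection(
    attempt: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Retry a task, feeding the checker's complaint into the next attempt.

    `attempt` stands in for an LLM call. `check` returns None when the
    result is acceptable, or a short description of the problem.
    Returns the last result and the number of rounds used.
    """
    prompt = task
    result = ""
    for round_no in range(1, max_rounds + 1):
        result = attempt(prompt)
        problem = check(result)
        if problem is None:
            return result, round_no
        # Reflection step: tell the next attempt what went wrong
        prompt = f"{task}\nPrevious answer: {result}\nProblem: {problem}"
    return result, max_rounds


# Hypothetical stubs so the sketch runs without a model
def fake_llm(prompt: str) -> str:
    return "42" if "Problem:" in prompt else "forty-two"


def must_be_digits(answer: str) -> Optional[str]:
    return None if answer.isdigit() else "answer must be digits only"


if __name__ == "__main__":
    print(run_with_reflection(fake_llm, must_be_digits, "What is 6*7?"))
```

The round limit matters: without it, a checker the model can never satisfy would loop forever.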
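1.2 Agent Architecture Patterns
The mainstream agent architectures today include:
- ReAct: reasoning and acting alternate, one step at a time
- Plan-and-Execute: draw up a complete plan first, then execute it step by step
- Multi-agent collaboration: several agents divide the work to complete a complex task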
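Plan-and-Execute is the one pattern in this list that the article does not implement in full, so a rough sketch may help show how it differs from ReAct: the plan is fixed up front and the loop only executes it. Everything here is an illustrative assumption: `make_plan` returns a hard-coded plan where a real system would ask a model, and the `TOOLS` table holds two toy functions.

```python
from typing import Callable, Dict, List


def make_plan(goal: str) -> List[Dict]:
    """Hypothetical planner: a fixed plan, so the sketch runs without an LLM."""
    return [
        {"step_id": 1, "tool": "upper", "input": goal},
        {"step_id": 2, "tool": "count", "input": goal},
    ]


# Hypothetical tools, keyed by name
TOOLS: Dict[str, Callable[[str], object]] = {
    "upper": lambda text: text.upper(),
    "count": lambda text: len(text.split()),
}


def plan_and_execute(goal: str) -> List[Dict]:
    """Plan once up front, then run every step in order."""
    results = []
    for step in make_plan(goal):
        tool = TOOLS.get(step["tool"])
        if tool is None:
            results.append({"step_id": step["step_id"], "error": "unknown tool"})
            continue
        results.append({"step_id": step["step_id"], "output": tool(step["input"])})
    return results


if __name__ == "__main__":
    print(plan_and_execute("summarize agent patterns"))
```

The trade-off is that the plan cannot react to what a step returns, whereas a ReAct loop decides the next action only after seeing each observation.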
2. Designing the Core Agent Components
2.1 Planning Module (Planner)
The planning module breaks the user's goal down into concrete execution steps:
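from typing import Dict, List
import json


class AgentPlanner:
    """Agent planning module: decomposes a task and generates steps."""

    def __init__(self, llm_client):
        # llm_client is any object exposing generate(prompt: str) -> str
        self.llm = llm_client

    def decompose_task(self, user_goal: str) -> List[Dict]:
        """Decompose the user's goal into execution steps."""
        planning_prompt = f"""
        You are a task-planning expert. Break the following goal down into concrete execution steps.
        Goal: {user_goal}
        Requirements:
        1. Each step must be an atomic operation
        2. Dependencies between steps must be clear
        3. Return a JSON object of the form {{"steps": [...]}}
        """
        response = self.llm.generate(planning_prompt)
        fallback = [{"step_id": 1, "action": user_goal, "tool": "llm"}]
        try:
            plan = json.loads(response)
        except json.JSONDecodeError:
            # The model did not return valid JSON: fall back to a single LLM step
            return fallback
        # Accept either {"steps": [...]} or a bare list of steps
        if isinstance(plan, dict):
            return plan.get("steps", [])
        return plan if isinstance(plan, list) else fallback

2.2 Tool Execution Module (Tool Executor)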
The tool module is the agent's "hands": it lets the agent interact with the outside world:
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

import requests


class BaseTool(ABC):
    """Tool base class: defines the standard tool interface."""

    @property
    @abstractmethod
    def name(self) -> str:
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        pass

    @abstractmethod
    def execute(self, **kwargs) -> Any:
        pass


class SearchTool(BaseTool):
    """Search tool: looks up information on the web."""

    @property
    def name(self) -> str:
        return "web_search"

    @property
    def description(self) -> str:
        return "Search the internet for real-time information. Parameter: query (search keywords)"

    def execute(self, query: str, **kwargs) -> Dict:
        try:
            # Placeholder endpoint: replace with a real search API
            response = requests.get(
                "https://api.search.example.com/search",
                params={"q": query},
                timeout=10,
            )
            if response.status_code != 200:
                return {"success": False, "error": f"HTTP {response.status_code}"}
            return {"success": True, "results": response.json().get("items", [])}
        except (requests.RequestException, ValueError) as e:
            # Network failure or a response body that is not JSON
            return {"success": False, "error": str(e)}


class CalculatorTool(BaseTool):
    """Calculator tool: evaluates arithmetic expressions."""

    @property
    def name(self) -> str:
        return "calculator"

    @property
    def description(self) -> str:
        return "Perform a calculation. Parameter: expression (arithmetic expression)"

    def execute(self, expression: str, **kwargs) -> Dict:
        allowed_chars = set("0123456789+-*/().% ")
        if not all(c in allowed_chars for c in expression):
            return {"success": False, "error": "illegal character"}
        try:
            # The whitelist blocks names and calls; builtins are removed as an extra guard.
            # "**" still passes the whitelist, so very large exponents can be slow.
            result = eval(expression, {"__builtins__": {}}, {})
            return {"success": True, "result": result}
        except Exception as e:
            return {"success": False, "error": str(e)}


class ToolRegistry:
    """Tool registry: manages all available tools."""

    def __init__(self):
        self._tools: Dict[str, BaseTool] = {}

    def register(self, tool: BaseTool):
        self._tools[tool.name] = tool

    def get_tool(self, name: str) -> Optional[BaseTool]:
        # Returns None when no tool with this name is registered
        return self._tools.get(name)

    def list_tools(self) -> str:
        descriptions = []
        for name, tool in self._tools.items():
            descriptions.append(f"- {name}: {tool.description}")
        return "\n".join(descriptions)

2.3 Memory Module (Memory)
The memory module lets the agent remember past interactions and important information:
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class MemoryItem:
    """A single memory entry."""
    content: str
    timestamp: datetime
    importance: float
    memory_type: str


class AgentMemory:
    """Agent memory system."""

    def __init__(self, max_short_term: int = 10, max_long_term: int = 100):
        self.short_term: List[MemoryItem] = []
        self.long_term: List[MemoryItem] = []
        self.max_short_term = max_short_term
        self.max_long_term = max_long_term

    def add_memory(self, content: str, importance: float = 0.5, memory_type: str = "short_term"):
        """Add a new memory."""
        item = MemoryItem(
            content=content,
            timestamp=datetime.now(),
            importance=importance,
            memory_type=memory_type,
        )
        if memory_type == "short_term":
            self.short_term.append(item)
            if len(self.short_term) > self.max_short_term:
                # Evict the oldest entry; promote it if it was important
                oldest = self.short_term.pop(0)
                if oldest.importance > 0.7:
                    self._add_to_long_term(oldest)
        else:
            self._add_to_long_term(item)

    def _add_to_long_term(self, item: MemoryItem):
        self.long_term.append(item)
        if len(self.long_term) > self.max_long_term:
            # Over capacity: keep only the most important entries
            self.long_term.sort(key=lambda x: x.importance, reverse=True)
            self.long_term = self.long_term[:self.max_long_term]

    def get_relevant_memories(self, query: str, top_k: int = 5) -> List[str]:
        # Simple case-insensitive substring match over both stores
        relevant = []
        query_lower = query.lower()
        for item in self.short_term + self.long_term:
            if query_lower in item.content.lower():
                relevant.append(item.content)
        return relevant[:top_k]

    def get_context_summary(self) -> str:
        if not self.short_term:
            return "No memories yet"
        summary_parts = ["[Recent interactions]"]
        for item in self.short_term[-5:]:
            summary_parts.append(f"- {item.content}")
        return "\n".join(summary_parts)

3. A Complete Agent Implementation
3.1 Core Implementation of the ReAct Agent
from typing import Dict, List
import json


class ReActAgent:
    """
    ReAct Agent: an agent built on a reason-act loop.

    Workflow:
    1. Thought: analyze the current state and the goal
    2. Action: choose and run a tool
    3. Observation: collect the result
    4. Repeat until the task is done
    """

    def __init__(self, llm_client, tool_registry: ToolRegistry):
        self.llm = llm_client
        self.tools = tool_registry
        self.memory = AgentMemory()
        self.max_iterations = 10

    def build_prompt(self, task: str, history: List[Dict]) -> str:
        tool_descriptions = self.tools.list_tools()
        memory_context = self.memory.get_context_summary()
        # Include earlier steps so the model can see its own observations
        steps = "\n".join(
            f"Thought: {h['thought']}\nAction: {h['action']}\nObservation: {h['result']}"
            for h in history
        ) or "(none)"
        prompt = f"""You are an intelligent agent that solves problems using the ReAct pattern.

Available tools:
{tool_descriptions}

Memory:
{memory_context}

Current task: {task}

Steps so far:
{steps}

Think and act in the following format:
Thought: your reasoning
Action: tool name
Action Input: tool arguments (JSON)

Or, when the task is complete, output:
Final Answer: the final answer"""
        return prompt

    def parse_response(self, response: str) -> Dict:
        result = {"thought": "", "action": None, "action_input": {}, "final_answer": None}
        for raw_line in response.strip().split("\n"):
            # Tolerate leading whitespace or a "- " list marker
            line = raw_line.strip().lstrip("-").strip()
            if line.startswith("Thought:"):
                result["thought"] = line[len("Thought:"):].strip()
            elif line.startswith("Action:"):
                result["action"] = line[len("Action:"):].strip()
            elif line.startswith("Action Input:"):
                raw = line[len("Action Input:"):].strip()
                try:
                    parsed = json.loads(raw)
                except json.JSONDecodeError:
                    parsed = raw
                # Tools are called with **kwargs, so non-dict input is wrapped
                result["action_input"] = parsed if isinstance(parsed, dict) else {"input": raw}
            elif line.startswith("Final Answer:"):
                result["final_answer"] = line[len("Final Answer:"):].strip()
        return result

    def run(self, task: str) -> str:
        history = []
        for iteration in range(self.max_iterations):
            prompt = self.build_prompt(task, history)
            response = self.llm.generate(prompt)
            parsed = self.parse_response(response)
            history.append({
                "thought": parsed["thought"],
                "action": parsed["action"],
                "result": None,
            })
            if parsed["final_answer"]:
                self.memory.add_memory(
                    f"Task '{task}' completed, answer: {parsed['final_answer']}",
                    importance=0.8,
                )
                return parsed["final_answer"]
            if parsed["action"]:
                tool = self.tools.get_tool(parsed["action"])
                if tool:
                    try:
                        observation = tool.execute(**parsed["action_input"])
                    except Exception as e:
                        # Bad arguments or a tool failure become an observation
                        observation = {"error": str(e)}
                    history[-1]["result"] = observation
                    self.memory.add_memory(
                        f"Ran {parsed['action']}: {str(observation)[:200]}",
                        importance=0.5,
                    )
                else:
                    history[-1]["result"] = {"error": f"unknown tool: {parsed['action']}"}
        return "Reached the maximum number of iterations; task not completed"


# Usage example
if __name__ == "__main__":
    llm = YourLLMClient()  # placeholder: any client exposing generate(prompt) -> str
    registry = ToolRegistry()
    registry.register(SearchTool())
    registry.register(CalculatorTool())
    agent = ReActAgent(llm, registry)
    result = agent.run("Search for the latest Python version and compute the sum 3.12 + 3.13")
    print(f"Result: {result}")

4. Multi-Agent Collaboration
4.1 Coordinator-Worker Architecture
A complex task can be split across several specialized agents that work together:
from datetime import datetime
from enum import Enum
from typing import Dict, List


class AgentRole(Enum):
    """Agent role types."""
    COORDINATOR = "coordinator"
    RESEARCHER = "researcher"
    CODER = "coder"
    REVIEWER = "reviewer"


class MultiAgentSystem:
    """Multi-agent collaboration system."""

    def __init__(self):
        self.agents: Dict[AgentRole, ReActAgent] = {}
        self.communication_bus: List[Dict] = []

    def register_agent(self, role: AgentRole, agent: ReActAgent):
        self.agents[role] = agent

    def broadcast(self, from_role: AgentRole, message: str, to_roles: List[AgentRole]):
        # Record the message on the shared bus
        self.communication_bus.append({
            "from": from_role.value,
            "to": [r.value for r in to_roles],
            "message": message,
            "timestamp": datetime.now(),
        })

    def run_collaborative_task(self, task: str) -> str:
        coordinator = self.agents.get(AgentRole.COORDINATOR)
        if not coordinator:
            return "Missing coordinator agent"
        decomposition = coordinator.run(f"Break the following task into subtasks: {task}")

        research_result = ""
        researcher = self.agents.get(AgentRole.RESEARCHER)
        if researcher:
            research_result = researcher.run(f"Research the background for: {decomposition}")
            self.broadcast(AgentRole.RESEARCHER, research_result, [AgentRole.CODER])

        code_result = None  # stays None when no coder is registered
        coder = self.agents.get(AgentRole.CODER)
        if coder:
            # Pass the research findings on so the coder can actually use them
            code_result = coder.run(
                f"Carry out these subtasks: {decomposition}\nResearch findings: {research_result}"
            )
            reviewer = self.agents.get(AgentRole.REVIEWER)
            if reviewer:
                review = reviewer.run(f"Review the result: {code_result}")
                return review
        return code_result or "Task completed"

5. Best Practices for Agent Development
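5.1 Designing for Security
- Tool permission control: restrict which sensitive operations can be called
- Input validation: check every tool input against a whitelist
- Sandboxed execution: run code in an isolated environment
- Human confirmation: require manual approval for critical operations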
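As one illustration of whitelist-style input validation, a calculator can parse the expression with Python's `ast` module and evaluate only the node types it explicitly allows, rather than passing a character-filtered string to `eval`. This is a sketch under stated assumptions, not a drop-in replacement for `CalculatorTool`: the function name `safe_eval` and the choice of allowed operators are illustrative, and exponentiation is left out on purpose.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelist of permitted operators; anything else is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> Number:
    """Evaluate basic arithmetic without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


def _eval_node(node: ast.AST) -> Number:
    # Numeric literals only (bool is excluded even though it subclasses int)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # Names, calls, attribute access, "**" and everything else land here
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


if __name__ == "__main__":
    print(safe_eval("2 * (3 + 4)"))
```

Because the check works on syntax nodes instead of characters, function calls and variable names are rejected by construction. Malformed input still raises `SyntaxError` from `ast.parse`, so a caller would want to catch that alongside `ValueError` and `ZeroDivisionError`.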
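5.2 Performance Optimization
- Tool caching: cache the results of frequently used tool calls
- Parallel execution: run independent tasks concurrently with multiple threads
- Early stopping: terminate early when an unproductive loop is detected
- Streaming output: show the reasoning process in real time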
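Two of these points, tool caching and early stopping, can be sketched in a few lines. The sketch assumes tool arguments are JSON-serializable and that "unproductive loop" means the same action repeated with the same input; `CachedTool` and `is_stuck` are hypothetical names, not part of the classes defined earlier.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


class CachedTool:
    """Wrap a tool function and reuse results for repeated arguments."""

    def __init__(self, func: Callable[..., Any]):
        self.func = func
        self.calls = 0  # number of real (uncached) executions
        self._cache: Dict[str, Any] = {}

    def execute(self, **kwargs: Any) -> Any:
        # Sorted keys give the same cache key regardless of argument order
        key = json.dumps(kwargs, sort_keys=True, default=str)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.func(**kwargs)
        return self._cache[key]


def is_stuck(history: List[Tuple[str, str]], window: int = 3) -> bool:
    """Return True when the last `window` (action, input) pairs are identical."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(step == recent[0] for step in recent)


if __name__ == "__main__":
    tool = CachedTool(lambda query: query.upper())
    tool.execute(query="python")
    tool.execute(query="python")
    print(tool.calls)
    print(is_stuck([("web_search", "python")] * 3))
```

Caching like this only suits tools whose results stay valid for the length of a run; a search tool that must return fresh data would need an expiry time. A loop such as `ReActAgent.run` could call `is_stuck` after each step and stop before reaching its iteration limit.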
5.3 Debugging and Monitoring
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class AgentTrace:
    """A trace record for one agent step."""
    step: int
    thought: str
    action: str
    observation: str
    timestamp: datetime


class AgentLogger:
    """Logger for agent execution."""

    def __init__(self, log_file: str = "agent_trace.log"):
        self.traces: List[AgentTrace] = []
        logging.basicConfig(filename=log_file, level=logging.INFO)

    def log_step(self, step: int, thought: str, action: str, observation: str):
        trace = AgentTrace(
            step=step,
            thought=thought,
            action=action,
            observation=observation,
            timestamp=datetime.now(),
        )
        self.traces.append(trace)
        log_msg = f"[Step {step}] Thought: {thought} | Action: {action}"
        logging.info(log_msg)

    def export_traces(self) -> List[Dict]:
        # Plain dicts with ISO timestamps, ready for JSON serialization
        return [
            {
                "step": t.step,
                "thought": t.thought,
                "action": t.action,
                "observation": t.observation,
                "timestamp": t.timestamp.isoformat(),
            }
            for t in self.traces
        ]

Summary
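AI agents are a core form of large-model applications: they turn AI from a passive question-answering tool into an active agent. This article covered:
- Core concepts: what autonomous planning, tool calling, and memory management mean for an agent
- Architecture: the ReAct, Plan-and-Execute, and multi-agent collaboration patterns
- Component implementation: working code for the planner, the tool system, and the memory module
- Best practices: engineering lessons on security, performance optimization, and debugging and monitoring
With these pieces in hand, you can build automation systems in which AI acts as a "digital employee" that completes complex tasks on its own. In a real project, start with a simple scenario and add capabilities and complexity to the agent step by step.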
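Article link: https://www.kkkliao.cn/?id=932 (reposting requires permission)
Copyright notice: this article was published by 廖万里的博客 (Liao Wanli's blog). Please credit the source when reposting.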



