Initial commit: LangChain demo project with RAG and Agent examples
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
commit 06e205111e
@@ -0,0 +1,11 @@
{
  "permissions": {
    "allow": [
      "Bash(git init:*)",
      "Bash(git remote add:*)",
      "Bash(git add:*)"
    ],
    "deny": [],
    "ask": []
  }
}
@@ -0,0 +1,16 @@
# deepseek
#OPENAI_BASE_URL=https://api.deepseek.com
#OPENAI_API_KEY1=sk-751cd94ff1ba44c38ae9f5f27f688ac0

#closeai
OPENAI_BASE_URL=https://api.openai-proxy.org/v1
OPENAI_API_KEY1=sk-QcFv70swj4t8PlUpC7lKHIcfH8URo8USzPjfZXTQsd1Bv8C6


# qianwen3
#TONGYI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
#TONGYI_API_KEY=sk-894a16620d32430e802f3f201627949f

# model
#LANGSMITH_MODEL=deepseek-chat
#LANGSMITH_PROJECT=your project name (default is fine)
@@ -0,0 +1,5 @@
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP client requests
/httpRequests/
@@ -0,0 +1,17 @@
<component name="InspectionProjectProfileManager">
  <profile version="1.0">
    <option name="myName" value="Project Default" />
    <inspection_tool class="PyPackageRequirementsInspection" enabled="true" level="WARNING" enabled_by_default="true">
      <option name="ignoredPackages">
        <list>
          <option value="pandas" />
          <option value="openpyxl" />
          <option value="xlrd" />
          <option value="pyyaml" />
          <option value="jsonschema" />
          <option value="python-dotenv" />
        </list>
      </option>
    </inspection_tool>
  </profile>
</component>
@@ -0,0 +1,6 @@
<component name="InspectionProjectProfileManager">
  <settings>
    <option name="USE_PROJECT_PROFILE" value="false" />
    <version value="1.0" />
  </settings>
</component>
@@ -0,0 +1,10 @@
<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
  <component name="NewModuleRootManager">
    <content url="file://$MODULE_DIR$">
      <excludeFolder url="file://$MODULE_DIR$/models" />
    </content>
    <orderEntry type="jdk" jdkName="pyth3-10" jdkType="Python SDK" />
    <orderEntry type="sourceFolder" forTests="false" />
  </component>
</module>
@@ -0,0 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="Black">
    <option name="sdkName" value="pyth3-10" />
  </component>
  <component name="ProjectRootManager" version="2" project-jdk-name="pyth3-10" project-jdk-type="Python SDK" />
</project>
@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="ProjectModuleManager">
    <modules>
      <module fileurl="file://$PROJECT_DIR$/.idea/lang-demo1.iml" filepath="$PROJECT_DIR$/.idea/lang-demo1.iml" />
    </modules>
  </component>
</project>
@@ -0,0 +1,118 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a LangChain-based learning and demo project that shows how to build applications on top of large language models, covering basic model calls, prompt templates, output parsing, vector stores, retrieval-augmented generation (RAG), and Agents.

## Environment Setup

### Dependency Management
- Python project; dependencies are listed in `requirements.txt`
- Install dependencies: `pip install -r requirements.txt`
- Key dependencies: langchain, langchain-openai, langchain-community, python-dotenv, faiss-cpu

### Environment Variables
The `.env` file in the project root holds the API configuration:
- `OPENAI_BASE_URL`: base URL for the OpenAI API
- `OPENAI_API_KEY1`: OpenAI API key
- `LANGSMITH_TRACING`: LangSmith tracing switch (must be set to "true")
- `LANGSMITH_API_KEY`: LangSmith API key
- `LANGSMITH_PROJECT`: LangSmith project name

Note: the code uses `OPENAI_API_KEY1` rather than the standard `OPENAI_API_KEY`.
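
Because `ChatOpenAI` only picks up the standard `OPENAI_API_KEY` variable automatically, a key stored under `OPENAI_API_KEY1` has to be passed in explicitly. A minimal sketch (the exact wiring in the notebooks may differ):

```python
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()

# The key lives under the non-standard name OPENAI_API_KEY1, so pass it
# (and the proxy base URL) to ChatOpenAI explicitly.
llm = ChatOpenAI(
    model="gpt-4.1-nano",
    api_key=os.environ["OPENAI_API_KEY1"],
    base_url=os.environ.get("OPENAI_BASE_URL"),
)
```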

## Code Structure

### Directory Layout
- `/app/demo/`: Jupyter notebooks containing the demo code
  - `demo1.ipynb`: main demo file with the 7 core examples
- `/models/`: model-related files (currently empty)
- `/data/`: data files (currently empty)
- `sample.ipynb`: sample notebook in the project root showing environment setup and a basic model call

### Core Modules

#### 1. Basic LLM Calls
Use `ChatOpenAI` to call the model:
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4.1-nano")
response = llm.invoke("your question")
```

#### 2. Prompt Templates
Use `ChatPromptTemplate` to build structured prompts:
```python
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "system role description"),
    ("user", "{input}")
])
chain = prompt | llm
```

#### 3. Output Parsing
Use `JsonOutputParser` to parse LLM output into JSON:
- Call `output_parser.get_format_instructions()` to get the format requirements
- Inject those format requirements into the prompt template (see the sketch below)
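
A minimal sketch of this pattern, assuming `llm` is the `ChatOpenAI` instance from section 1 (the prompt wording here is illustrative, not copied from the notebook):

```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate

output_parser = JsonOutputParser()

# Inject the parser's format instructions so the model knows to answer in JSON
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's question.\n{format_instructions}"),
    ("user", "{input}"),
]).partial(format_instructions=output_parser.get_format_instructions())

chain = prompt | llm | output_parser   # the chain returns a Python dict
result = chain.invoke({"input": "Describe LangChain as JSON with keys 'name' and 'summary'."})
```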

#### 4. Vector Store and Retrieval
Uses the FAISS vector store:
- Fetch web page content with `WebBaseLoader`
- Split documents with `RecursiveCharacterTextSplitter` (chunk_size=500, chunk_overlap=50)
- Generate embeddings with `OpenAIEmbeddings` (model="text-embedding-3-small")
- Build the vector store with `FAISS.from_documents()` (see the sketch below)
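
A sketch of that pipeline (the URL is a placeholder, and import paths can vary slightly across LangChain versions):

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Load the web page
loader = WebBaseLoader("https://example.com/some-article")
docs = loader.load()

# 2. Split into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 3. Embed the chunks and index them in an in-memory FAISS store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector = FAISS.from_documents(chunks, embeddings)
```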

#### 5. RAG (Retrieval-Augmented Generation)
Core flow (a sketch follows the list):
1. Create a retriever with `vector.as_retriever()` (k=3)
2. Retrieve the relevant document chunks
3. Inject the retrieved chunks into the prompt as context
4. The LLM answers based on that context
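
One way to wire the four steps together, assuming `vector` and `llm` from the earlier sections (the notebook may instead use LangChain's built-in retrieval chain helpers):

```python
from langchain_core.prompts import ChatPromptTemplate

# Step 1: retriever over the FAISS index, returning the top 3 chunks
retriever = vector.as_retriever(search_kwargs={"k": 3})

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the following context:\n\n{context}"),
    ("user", "{input}"),
])

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

question = "your question"
# Steps 2-3: retrieve chunks and inject them as context
context = format_docs(retriever.invoke(question))
# Step 4: the LLM answers from that context
answer = (rag_prompt | llm).invoke({"context": context, "input": question})
```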

#### 6. Agent System
Tool calling with a LangChain Agent (see the sketch below):
- Create a retrieval tool with `create_retriever_tool()`
- Create the Agent with `create_openai_functions_agent()`
- Run it with `AgentExecutor` (verbose=True shows the execution trace)
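
A sketch of that setup, reusing `retriever` and `llm` from above (the tool name, description, and prompt text are illustrative, not taken from the notebook):

```python
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.tools.retriever import create_retriever_tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Expose the retriever as a tool the agent can decide to call
retriever_tool = create_retriever_tool(
    retriever,
    "knowledge_search",
    "Search the indexed documents for relevant passages.",
)

# The agent prompt must include an agent_scratchpad placeholder
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_openai_functions_agent(llm, [retriever_tool], agent_prompt)
executor = AgentExecutor(agent=agent, tools=[retriever_tool], verbose=True)
result = executor.invoke({"input": "your question"})
```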

## Development Workflow

### Running the Jupyter Notebooks
Most development work happens in Jupyter notebooks:
- Start Jupyter: `jupyter notebook` or `jupyter lab`
- The main demo code is in `app/demo/demo1.ipynb`

### Environment Initialization Pattern
The code performs strict environment-variable checks:
```python
import dotenv
dotenv.load_dotenv()
# All required variables must be present in .env, otherwise an exception is raised
```

### Model Configuration
Models currently used by the project:
- LLM: `gpt-4.1-nano` (via an OpenAI-compatible endpoint)
- Embeddings: `text-embedding-3-small`

## Technical Notes

### Chained Calls (Chain)
Compose components with the pipe operator `|`:
```python
chain = prompt | llm | output_parser
result = chain.invoke({"input": "question"})
```

### Vector Retrieval Settings
- Retrieves the top k=3 most relevant documents by default
- Document splitting uses overlapping chunks to avoid cutting across semantic units
- Vector data is currently held in memory only, with no persistence (see the note below)
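
If persistence is needed later, FAISS stores can be written to disk and reloaded. A minimal sketch (the folder name is a placeholder; `allow_dangerous_deserialization` is required in recent langchain-community versions because the metadata is pickled):

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Save the in-memory index to a local folder
vector.save_local("faiss_index")

# Reload it in a later session
vector = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(model="text-embedding-3-small"),
    allow_dangerous_deserialization=True,
)
```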

### Web Data Loading
Use `WebBaseLoader` to scrape specific web pages:
- Supports `bs4.SoupStrainer` for targeting the exact content region to parse
- The example scrapes legal texts from a Chinese government website (see the sketch below)
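
A sketch of the loader setup; the URL and CSS class below are placeholders, not the ones used in the notebook:

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader

# parse_only restricts BeautifulSoup to the article body, dropping navigation
# and boilerplate before the text ever reaches the splitter
loader = WebBaseLoader(
    web_paths=("https://www.gov.cn/example-law-page",),
    bs_kwargs={"parse_only": bs4.SoupStrainer(class_="article-content")},
)
docs = loader.load()
```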

File diff suppressed because one or more lines are too long
@@ -0,0 +1,108 @@
{
 "cells": [
  {
   "metadata": {},
   "cell_type": "raw",
   "source": "1.开始",
   "id": "e69a816f955ad467"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-09-30T01:42:53.631350Z",
     "start_time": "2025-09-30T01:42:53.622345Z"
    }
   },
   "cell_type": "code",
   "source": [
    "import os\n",
    "from dotenv import load_dotenv  # hard dependency on python-dotenv\n",
    "\n",
    "# Load the settings from the .env file (errors if the file is missing or fails to load)\n",
    "load_dotenv(override=True)  # override=True overrides existing environment variables of the same name\n",
    "\n",
    "# Read the settings from .env; raise immediately if any are missing\n",
    "required_vars = [\n",
    "    \"LANGSMITH_TRACING\",\n",
    "    \"LANGSMITH_API_KEY\",\n",
    "    \"LANGSMITH_PROJECT\"\n",
    "]\n",
    "\n",
    "# Check that the required settings are present\n",
    "missing_vars = [var for var in required_vars if var not in os.environ]\n",
    "if missing_vars:\n",
    "    raise ValueError(\n",
    "        f\".env 文件中缺少必要的配置项:{', '.join(missing_vars)}\\n\"\n",
    "        \"请确保 .env 文件中包含以下配置:\\n\"\n",
    "        \"LANGSMITH_TRACING=true\\n\"\n",
    "        \"LANGSMITH_API_KEY=你的API密钥\\n\"\n",
    "        \"LANGSMITH_PROJECT=你的项目名(默认可填default)\"\n",
    "    )\n",
    "print(\"配置检查通过\")\n",
    "# Print for confirmation\n",
    "print(f\"LANGSMITH_TRACING: {os.environ['LANGSMITH_TRACING']}\")\n",
    "\n",
    "# LANGSMITH_TRACING must be true (tracing stays forced on, as in the original logic)\n",
    "if os.environ[\"LANGSMITH_TRACING\"].lower() != \"true\":\n",
    "    raise ValueError(\"LANGSMITH_TRACING 必须设置为 true(区分大小写,建议直接填写true)\")"
   ],
   "id": "f55903969ee7b754",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "配置检查通过\n",
      "LANGSMITH_TRACING: true\n"
     ]
    }
   ],
   "execution_count": 6
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "2.三四十",
   "id": "42e3303c7c697d34"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter API key for OpenAI: \")\n",
    "\n",
    "from langchain.chat_models import init_chat_model\n",
    "\n",
    "model = init_chat_model(\"gpt-4o-mini\", model_provider=\"openai\")"
   ],
   "id": "d9c653b7f628c02f"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}