# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

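## Project Overview

This is a LangChain-based learning demo project that shows how to build applications with large language models. It covers basic model calls, prompt templates, output parsing, vector stores, retrieval-augmented generation (RAG), and agents.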

## Environment Setup

### Dependency Management
- Python project; dependencies are listed in `requirements.txt`
- Install dependencies: `pip install -r requirements.txt`
- Key dependencies: langchain, langchain-openai, langchain-community, python-dotenv, faiss-cpu

### Environment Variables
The `.env` file in the project root holds the API configuration:
- `OPENAI_BASE_URL`: base URL of the OpenAI API
- `OPENAI_API_KEY1`: OpenAI API key
- `LANGSMITH_TRACING`: LangSmith tracing switch (must be set to "true")
- `LANGSMITH_API_KEY`: LangSmith API key
- `LANGSMITH_PROJECT`: LangSmith project name

Note: the code reads `OPENAI_API_KEY1`, not the standard `OPENAI_API_KEY`.
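
Because `ChatOpenAI` looks for `OPENAI_API_KEY` by default, a key stored under `OPENAI_API_KEY1` has to be passed in explicitly. The sketch below shows one way to do that; the helper names `openai_client_kwargs` and `make_llm` are hypothetical and the notebooks may wire this up differently.

```python
import os


def openai_client_kwargs(env=os.environ):
    """Map this project's variable names onto ChatOpenAI keyword arguments."""
    # OPENAI_API_KEY1 is the non-standard name this repository uses.
    kwargs = {"api_key": env["OPENAI_API_KEY1"]}
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


def make_llm(model="gpt-4.1-nano"):
    """Build a chat model from the variables in .env (hypothetical helper)."""
    # Imported lazily so the mapping helper above has no third-party dependency.
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(model=model, **openai_client_kwargs())
```

Call `dotenv.load_dotenv()` before `make_llm()` so that the values from `.env` are present in `os.environ`.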

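## Code Structure

### Directory Layout
```
lang-demo1/
├── notebooks/          # Study notes and examples
│   ├── demo1.ipynb    # Main demo: LLM calls, prompts, parsers, vectors, RAG, Agent
│   └── sample.ipynb   # Environment setup and basic call examples
├── src/               # Reusable source code
│   ├── utils/        # Utility functions
│   └── chains/       # Custom chains
├── tests/            # Unit tests
├── data/             # Data files
├── models/           # Model files
├── .env              # Environment variables (not committed to Git)
├── .gitignore        # Git ignore rules
├── requirements.txt  # Python dependencies
├── README.md         # Project description
└── CLAUDE.md         # This file
```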

### Core Modules

#### 1. Basic LLM Calls
Call the model through `ChatOpenAI`:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-nano")
response = llm.invoke("Your question")
```

#### 2. Prompt Templates
Build structured prompts with `ChatPromptTemplate`:
```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "Description of the system role"),
    ("user", "{input}")
])
chain = prompt | llm  # llm is the ChatOpenAI instance from section 1
```

#### 3. Output Parsing
Use `JsonOutputParser` to parse LLM output into JSON:
- Call `output_parser.get_format_instructions()` to obtain the format instructions
- Inject the format instructions into the prompt template
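
The two steps above can be sketched as follows. This is a minimal illustration rather than the notebook's exact code: `build_json_chain` and `SYSTEM_TEMPLATE` are hypothetical names, and `llm` is assumed to be a chat model such as the `ChatOpenAI` instance from section 1.

```python
# Hypothetical system prompt; the placeholder receives the parser's instructions.
SYSTEM_TEMPLATE = "Answer the user's question.\n{format_instructions}"


def build_json_chain(llm):
    """Return prompt | llm | parser; invoking it yields a parsed Python object."""
    # Imported lazily so this module loads even without LangChain installed.
    from langchain_core.output_parsers import JsonOutputParser
    from langchain_core.prompts import ChatPromptTemplate

    output_parser = JsonOutputParser()
    prompt = ChatPromptTemplate.from_messages(
        [("system", SYSTEM_TEMPLATE), ("user", "{input}")]
    ).partial(format_instructions=output_parser.get_format_instructions())
    return prompt | llm | output_parser
```

Usage would look like `build_json_chain(llm).invoke({"input": "..."})`. Filling the instructions in with `partial` means callers only have to supply `input`.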

#### 4. Vector Storage and Retrieval
Uses the FAISS vector store:
- Fetch web page content with `WebBaseLoader`
- Split documents with `RecursiveCharacterTextSplitter` (chunk_size=500, chunk_overlap=50)
- Generate embeddings with `OpenAIEmbeddings` (model="text-embedding-3-small")
- Create the vector store with `FAISS.from_documents()`
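
Put together, the pipeline above might look like the sketch below. The function name `build_vector_store` is hypothetical and `url` stands for whatever page is being indexed. The splitter import path assumes the `langchain-text-splitters` package that ships alongside `langchain`.

```python
SPLITTER_KWARGS = {"chunk_size": 500, "chunk_overlap": 50}
EMBEDDING_MODEL = "text-embedding-3-small"


def build_vector_store(url):
    """Load a page, split it into chunks, embed them, and index them in FAISS."""
    # Imported lazily so this module loads even without LangChain installed.
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    docs = WebBaseLoader(url).load()
    splitter = RecursiveCharacterTextSplitter(**SPLITTER_KWARGS)
    chunks = splitter.split_documents(docs)
    # OpenAIEmbeddings reads OPENAI_API_KEY by default; pass api_key=... here
    # if the key is stored under OPENAI_API_KEY1 as in this repository.
    embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL)
    return FAISS.from_documents(chunks, embeddings)
```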

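#### 5. RAG (Retrieval-Augmented Generation)
Core flow:
1. Create a retriever with `vector.as_retriever()` (k=3)
2. Retrieve the relevant document chunks
3. Inject the retrieved chunks into the prompt as context
4. The LLM generates an answer grounded in that context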
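
The four steps above can be expressed as a single chain. The sketch below is one common way to do it, not necessarily the notebook's: `format_docs` and `build_rag_chain` are hypothetical helpers, the prompt wording is invented, `vector` is assumed to be the FAISS store from section 4, and `llm` a chat model.

```python
def format_docs(docs):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(vector, llm):
    """Retrieve top-3 chunks, inject them as context, and generate an answer."""
    # Imported lazily so format_docs stays usable without LangChain installed.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough

    retriever = vector.as_retriever(search_kwargs={"k": 3})
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer using only the context below.\n\n{context}"),
        ("user", "{input}"),
    ])
    return (
        # The question goes to the retriever and is also passed through unchanged.
        {"context": retriever | format_docs, "input": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

The chain takes the question as a plain string, for example `build_rag_chain(vector, llm).invoke("...")`.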

#### 6. Agents
Use a LangChain agent for tool calling:
- Create a retrieval tool with `create_retriever_tool()`
- Create the agent with `create_openai_functions_agent()`
- Run the agent with `AgentExecutor` (set verbose=True to see the execution trace)
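
A rough sketch of how these three pieces fit together is shown below. The tool name, tool description, and prompt text are invented for illustration, and `build_agent_executor` is a hypothetical helper. The import paths are the ones used by langchain 0.x releases; other versions may place these functions elsewhere.

```python
# Hypothetical tool metadata; the model reads these to decide when to call the tool.
TOOL_NAME = "document_search"
TOOL_DESCRIPTION = "Search the indexed documents for passages relevant to a question."


def build_agent_executor(retriever, llm):
    """Wrap a retriever as a tool and return an executor that can call it."""
    # Imported lazily so this module loads even without LangChain installed.
    from langchain.agents import AgentExecutor, create_openai_functions_agent
    from langchain.tools.retriever import create_retriever_tool
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

    tool = create_retriever_tool(
        retriever, name=TOOL_NAME, description=TOOL_DESCRIPTION
    )
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Use the tools when they help."),
        ("user", "{input}"),
        # The agent stores its intermediate tool calls and results here.
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ])
    agent = create_openai_functions_agent(llm, [tool], prompt)
    return AgentExecutor(agent=agent, tools=[tool], verbose=True)
```

Invoking the executor with `{"input": "..."}` returns a dict whose `"output"` key holds the final answer.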

## Development Workflow

### Running Jupyter Notebooks
Most development happens in Jupyter notebooks:
- Start Jupyter: `jupyter notebook` or `jupyter lab`
- The main demo code is in `notebooks/demo1.ipynb`
- The basic examples are in `notebooks/sample.ipynb`

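### Environment Initialization Pattern
The code enforces strict environment variable checks:
```python
import dotenv

dotenv.load_dotenv()
# Every required variable must be set in .env; otherwise an exception is raised
```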
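
`load_dotenv()` does not raise when a variable is absent, so the strict check has to be a separate step. The following is a minimal sketch of such a check, assuming the five variables listed under the environment configuration are all required; `require_env` and `REQUIRED_VARS` are hypothetical names, not taken from the notebooks.

```python
import os

# Assumed to be required, based on the variables listed for the .env file.
REQUIRED_VARS = (
    "OPENAI_BASE_URL",
    "OPENAI_API_KEY1",
    "LANGSMITH_TRACING",
    "LANGSMITH_API_KEY",
    "LANGSMITH_PROJECT",
)


def require_env(env=os.environ, required=REQUIRED_VARS):
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    if "LANGSMITH_TRACING" in required and env["LANGSMITH_TRACING"] != "true":
        raise RuntimeError('LANGSMITH_TRACING must be set to "true"')
    return {name: env[name] for name in required}
```

Calling `require_env()` right after `dotenv.load_dotenv()` makes a misconfigured `.env` fail at startup instead of at the first API call.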

### Model Configuration
Models currently used by the project:
- LLM: `gpt-4.1-nano` (via an OpenAI-compatible API)
- Embeddings: `text-embedding-3-small`

## Technical Notes

### Chains
Compose components with the pipe operator `|`:
```python
chain = prompt | llm | output_parser
result = chain.invoke({"input": "Your question"})
```
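
The pipe syntax works because the components overload Python's `|` operator so that each step's output becomes the next step's input. The toy class below illustrates that idea with plain functions. It is a simplified stand-in, not LangChain's actual `Runnable` implementation, and `Step` is a hypothetical name.

```python
class Step:
    """A toy pipeline stage that wraps a single-argument function."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b produces a new Step that runs a first, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for the real components, for illustration only.
prompt = Step(lambda data: f"Question: {data['input']}")
llm = Step(str.upper)
output_parser = Step(lambda text: {"answer": text})

chain = prompt | llm | output_parser
result = chain.invoke({"input": "hi"})
```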

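### Vector Retrieval Settings
- Retrieves the top-k=3 most relevant documents by default
- Document splitting uses overlapping chunks so that meaning is not cut off at chunk boundaries
- Vector data is currently held in memory and is not persisted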
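
To see why overlap helps, consider the simplified fixed-width splitter below. It is a dependency-free sketch of the overlap idea only; `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraph and line boundaries. `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-width chunks whose edges share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

Here `chunks` is `["abcd", "defg", "ghij"]`: each chunk begins with the last character of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is long enough.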

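### Loading Web Data
Use `WebBaseLoader` to fetch the content of specific web pages:
- `bs4.SoupStrainer` can be used to target the exact content region
- The example scrapes legal provisions from a Chinese government website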
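
A loader restricted to one region of the page might be built as in the sketch below. The function name `make_loader` is hypothetical, and both the URL and the CSS class are placeholders to be replaced with the values for the page being scraped.

```python
def make_loader(url, css_class):
    """Build a WebBaseLoader that only parses elements with the given CSS class."""
    # Imported lazily so this module loads even without bs4 or LangChain installed.
    import bs4
    from langchain_community.document_loaders import WebBaseLoader

    return WebBaseLoader(
        web_paths=(url,),
        # parse_only tells BeautifulSoup to discard everything outside the match.
        bs_kwargs={"parse_only": bs4.SoupStrainer(class_=css_class)},
    )
```

Calling `.load()` on the returned loader yields the documents to split and index. Restricting parsing this way keeps navigation bars and footers out of the vector store.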