# LangChain Learning Kit - Architecture

## Overview

The LangChain Learning Kit is an enterprise-grade learning platform built with FastAPI, LangChain, MySQL, and FAISS. It provides:

- **Model Management**: Configure and manage multiple LLMs and embedding models
- **Knowledge Bases**: Vector-based document storage and retrieval using FAISS
- **Multi-turn Conversations**: Context-aware chat with RAG support
- **Agent Orchestration**: LangChain agents with custom tools
## Architecture Layers

### 1. API Layer (`app/api/`)

FastAPI routers handling HTTP requests:

- **models.py**: Model CRUD operations
- **kb.py**: Knowledge base management and querying
- **conv.py**: Conversation and message endpoints
- **agent.py**: Agent execution and logging

### 2. Service Layer (`app/services/`)

Business logic and orchestration:

- **ModelManager**: LLM and embedding model management with environment-variable substitution
- **KnowledgeBaseManager**: Document ingestion, chunking, and vector search
- **ConversationManager**: Multi-turn conversations with RAG integration
- **AgentOrchestrator**: Agent execution with tool calling
- **AsyncJobManager**: Background job processing using asyncio
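The `${VAR}` substitution that ModelManager performs on model configs can be sketched with the standard library alone. The function name `resolve_env_vars` and the config shape below are illustrative, not the project's actual API:

```python
import os
import re

def resolve_env_vars(config: dict) -> dict:
    """Recursively replace ${VAR} placeholders in string values with
    the corresponding environment variable (empty string if unset)."""
    pattern = re.compile(r"\$\{(\w+)\}")

    def substitute(value):
        if isinstance(value, str):
            return pattern.sub(lambda m: os.environ.get(m.group(1), ""), value)
        if isinstance(value, dict):
            return {k: substitute(v) for k, v in value.items()}
        if isinstance(value, list):
            return [substitute(v) for v in value]
        return value  # numbers, booleans, None pass through unchanged

    return substitute(config)

# Example: an LLM config whose API key comes from the environment
os.environ["OPENAI_API_KEY"] = "sk-demo"
cfg = resolve_env_vars({"model": "gpt-4", "api_key": "${OPENAI_API_KEY}"})
print(cfg["api_key"])  # sk-demo
```

Keeping secrets out of the stored JSON config and resolving them at load time is what allows the same config rows to work across environments.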
### 3. Tools Layer (`app/tools/`)

LangChain tools for agent use:

- **KnowledgeBaseRetriever**: Vector similarity search tool
- **CalculatorTool**: Mathematical expression evaluation
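A calculator tool must evaluate expressions without the security risks of raw `eval`. A minimal sketch of the safe-evaluation core (the actual CalculatorTool would wrap something like this in a LangChain tool; the name `safe_eval` is illustrative):

```python
import ast
import operator

# Whitelisted arithmetic operators; anything else is rejected
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate an arithmetic expression via the AST, rejecting
    names, calls, and any operator not in the whitelist."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("2 * (3 + 4)"))  # 14
```

Because only constants and whitelisted operators are walked, inputs like `__import__('os')` raise `ValueError` instead of executing.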
### 4. Data Layer (`app/db/`)

Database models and management:

- **SQLAlchemy ORM Models**: Models, KnowledgeBase, Document, Conversation, Message, ToolCall
- **Alembic Migrations**: Schema versioning and migration scripts
- **DatabaseManager**: Connection pooling and session management

### 5. Utilities (`app/utils/`)

Supporting utilities:

- **TextSplitter**: Document chunking with configurable size/overlap
- **FAISSHelper**: FAISS index creation, persistence, and querying
- **Logger**: Structured JSON logging with structlog
- **Exceptions**: Custom exception hierarchy
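Chunking with a configurable size/overlap is the core of TextSplitter. A simplified character-based sketch using the documented defaults (the real splitter may also respect sentence or paragraph boundaries):

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters; each
    chunk starts chunk_size - chunk_overlap after the previous one,
    so consecutive chunks share chunk_overlap characters."""
    if not text:
        return []
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    stop = max(len(text) - chunk_overlap, 1)  # avoid a redundant final chunk
    return [text[i:i + chunk_size] for i in range(0, stop, step)]

print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap ensures that a sentence cut at a chunk boundary still appears whole in at least one chunk, which improves retrieval recall.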
## Data Flow

### RAG-Enhanced Chat

```
User Request
    ↓
[API Layer] conv.py → chat()
    ↓
[Service] ConversationManager.chat()
    ↓
1. Save user message to DB
2. Retrieve conversation history
3. [If RAG enabled] Query KB → FAISS → Top-K docs
4. Build prompt: context + history + user input
5. LLM.predict()
6. Save assistant message
    ↓
Response with sources
```
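Steps 3 and 4 above, injecting retrieved context and conversation history into the prompt, can be sketched as follows; the template and the name `build_prompt` are illustrative, not the project's exact prompt format:

```python
def build_prompt(context_docs: list[str], history: list[dict], user_input: str) -> str:
    """Assemble the final LLM prompt: retrieved context first,
    then the conversation so far, then the new user message."""
    parts = []
    if context_docs:  # RAG enabled and documents were retrieved
        joined = "\n---\n".join(context_docs)
        parts.append(f"Answer using the following context:\n{joined}")
    for msg in history:  # e.g. {"role": "user", "content": "..."}
        parts.append(f"{msg['role']}: {msg['content']}")
    parts.append(f"user: {user_input}")
    return "\n\n".join(parts)

history = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]
print(build_prompt(["FAISS is a vector index."], history, "What is FAISS?"))
```

When RAG is disabled the context block is simply omitted, so the same function serves both plain and retrieval-augmented chats.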
### Document Ingestion

```
Document Upload
    ↓
[API Layer] kb.py → ingest_documents()
    ↓
[Service] KnowledgeBaseManager.ingest_documents()
    ↓
[Background Job] _ingest_documents_sync()
    ↓
1. Get embedding model
2. Chunk documents (TextSplitter)
3. Create/load FAISS index
4. Add document vectors
5. Save index to disk
6. Save metadata to DB
    ↓
Job Complete
```
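Steps 3 and 4 map onto FAISS index creation and vector insertion. To keep the sketch dependency-free, here is the same idea, storing vectors by ID and returning the top-K nearest by L2 distance, in plain Python; this is an illustrative stand-in, not the project's FAISSHelper:

```python
import math

class TinyVectorIndex:
    """In-memory stand-in for a FAISS index: stores (id, vector)
    pairs and returns the k nearest neighbors by L2 distance."""

    def __init__(self) -> None:
        self.vectors: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vec: list[float]) -> None:
        self.vectors.append((doc_id, vec))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        scored = sorted(
            ((doc_id, math.dist(query, v)) for doc_id, v in self.vectors),
            key=lambda t: t[1],  # smaller distance = more similar
        )
        return scored[:k]

index = TinyVectorIndex()
index.add("doc1", [0.0, 1.0])
index.add("doc2", [1.0, 0.0])
print(index.search([0.0, 0.9], k=1))  # doc1 is closest
```

FAISS does the same work with optimized, persistable index structures; step 5 corresponds to writing that index to `FAISS_BASE_PATH`.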
### Agent Execution

```
Agent Task
    ↓
[API Layer] agent.py → execute_agent()
    ↓
[Service] AgentOrchestrator.execute_agent()
    ↓
1. Create tools (Calculator, Retriever)
2. Create LangChain agent
3. Execute with AgentExecutor
4. Log tool calls to DB
    ↓
Response with tool call history
```
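Steps 3 and 4, executing tools and logging each invocation, reduce to a dispatch-and-record pattern. A dependency-free sketch (the real orchestrator delegates execution to LangChain's AgentExecutor; the names here are illustrative):

```python
from typing import Any, Callable

def run_tool(tools: dict[str, Callable[[str], Any]],
             name: str, tool_input: str, log: list[dict]) -> Any:
    """Invoke a named tool and append an input/output record,
    mirroring how each call is persisted to the tool-calls table."""
    if name not in tools:
        raise KeyError(f"Unknown tool: {name}")
    output = tools[name](tool_input)
    log.append({"tool": name, "input": tool_input, "output": output})
    return output

tool_log: list[dict] = []
tools = {"echo": lambda s: s.upper()}  # demo tool standing in for Calculator/Retriever
result = run_tool(tools, "echo", "hello", tool_log)
print(result, tool_log[0]["tool"])  # HELLO echo
```

Logging every input/output pair is what makes the "response with tool call history" at the end of the flow possible, and gives a debugging trail for agent runs.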
## Database Schema

### Models Table
- Stores LLM and embedding model configurations
- Config stored as JSON with `${VAR}` substitution support

### Knowledge Bases
- KnowledgeBase: Metadata (name, description)
- Document: Content, source, metadata, embedding_id

### Conversations
- Conversation: User ID, title
- Message: Role, content, metadata (sources, model)

### Tool Calls
- Logs all agent tool invocations
- Stores input/output for debugging
## Configuration

Environment-based configuration using Pydantic Settings:

```
DATABASE_URL     # MySQL connection string
OPENAI_API_KEY   # OpenAI API key
OPENAI_BASE_URL  # Optional proxy/custom endpoint
FAISS_BASE_PATH  # FAISS index storage directory
CHUNK_SIZE       # Document chunk size (default 1000)
CHUNK_OVERLAP    # Chunk overlap (default 200)
API_HOST         # Server host (default 0.0.0.0)
API_PORT         # Server port (default 8000)
ENVIRONMENT      # dev/staging/production
DEBUG            # Enable debug mode
```
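The environment-first lookup with defaults that Pydantic Settings provides can be illustrated with the standard library alone; the app itself uses a `BaseSettings` class, and the field names below simply mirror the variables listed above:

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    """Return the environment variable if set, else the default."""
    return os.environ.get(name, default)

@dataclass
class Settings:
    """Environment-first configuration mirroring the variables above."""
    api_host: str = field(default_factory=lambda: _env("API_HOST", "0.0.0.0"))
    api_port: int = field(default_factory=lambda: int(_env("API_PORT", "8000")))
    chunk_size: int = field(default_factory=lambda: int(_env("CHUNK_SIZE", "1000")))
    chunk_overlap: int = field(default_factory=lambda: int(_env("CHUNK_OVERLAP", "200")))
    environment: str = field(default_factory=lambda: _env("ENVIRONMENT", "dev"))

settings = Settings()
print(settings.api_port)  # 8000 unless API_PORT is set
```

Pydantic Settings adds validation and `.env`-file loading on top of this pattern, which is why the project uses it rather than raw `os.environ` lookups.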
## Key Design Decisions

### 1. Asyncio over Celery
- Uses Python's native asyncio for background jobs
- Simpler deployment; no external message broker required
- Suitable for medium-scale workloads
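A minimal sketch of this approach: background jobs are `asyncio.Task` objects tracked in a dict keyed by job ID, with no broker involved. The class and method names are illustrative, not the project's actual AsyncJobManager API:

```python
import asyncio
import uuid

class JobManager:
    """Track fire-and-forget background jobs by ID, broker-free."""

    def __init__(self) -> None:
        self.jobs: dict[str, asyncio.Task] = {}

    def submit(self, coro) -> str:
        """Schedule a coroutine as a background task; return its job ID."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = asyncio.create_task(coro)
        return job_id

    def status(self, job_id: str) -> str:
        return "done" if self.jobs[job_id].done() else "running"

async def ingest(n: int) -> int:
    await asyncio.sleep(0)  # stand-in for real chunk/embed/index work
    return n * 2

async def main():
    mgr = JobManager()
    job_id = mgr.submit(ingest(21))
    result = await mgr.jobs[job_id]
    print(mgr.status(job_id), result)  # done 42

asyncio.run(main())
```

The trade-off versus Celery: jobs die with the process and cannot be distributed across workers, which is acceptable at the scale this kit targets.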
### 2. FAISS CPU Version
- Uses faiss-cpu for broader compatibility
- Easier deployment without GPU requirements
- Sufficient for learning/development purposes

### 3. JSON Metadata Storage
- Uses MySQL JSON columns for flexible metadata
- Columns renamed from `metadata` to `doc_metadata`/`msg_metadata` to avoid SQLAlchemy's reserved `metadata` attribute

### 4. OpenAI Proxy Support
- Configurable `base_url` for API routing
- Supports both LLM and embedding endpoints
- Applied globally or per-model

### 5. Structured Logging
- JSON-formatted logs via structlog
- Easy parsing by log-aggregation systems
- Contextual information for debugging
## Deployment

### Development
```bash
# Local development
conda activate pyth-311
python scripts/init_db.py
python -m uvicorn app.main:app --reload
```

### Production with Docker
```bash
# Build and run with Docker Compose
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f api
```
## Scalability Considerations

### Current Architecture
- Single FastAPI instance
- MySQL for persistence
- FAISS indexes on local disk
- Asyncio for background jobs

### Future Enhancements
- **Horizontal Scaling**: Add a load balancer and shared FAISS storage (S3/NFS)
- **Distributed Jobs**: Replace asyncio with Celery + Redis/RabbitMQ
- **Vector Database**: Migrate from FAISS to Pinecone/Weaviate/Milvus
- **Caching**: Add Redis for conversation history and KB query results
- **Monitoring**: Prometheus metrics, Grafana dashboards
- **Authentication**: JWT-based auth with user management
## Security

The current implementation is designed for learning and development:

- ⚠️ No authentication/authorization
- ⚠️ CORS configured for `*` (all origins)
- ⚠️ Database credentials stored in a `.env` file

**Production recommendations:**
- Implement JWT authentication
- Add role-based access control (RBAC)
- Use secrets management (Vault, AWS Secrets Manager)
- Restrict CORS to specific origins
- Enable HTTPS/TLS
- Add rate limiting
- Implement input validation and sanitization