# LangChain Learning Kit - Architecture
## Overview
The LangChain Learning Kit is a full-stack learning platform built with FastAPI, LangChain, MySQL, and FAISS. It provides:
- **Model Management**: Configure and manage multiple LLMs and embedding models
- **Knowledge Bases**: Vector-based document storage and retrieval using FAISS
- **Multi-turn Conversations**: Context-aware chat with RAG support
- **Agent Orchestration**: LangChain agents with custom tools
## Architecture Layers
### 1. API Layer (`app/api/`)
FastAPI routers handling HTTP requests:
- **models.py**: Model CRUD operations
- **kb.py**: Knowledge base management and querying
- **conv.py**: Conversation and message endpoints
- **agent.py**: Agent execution and logging
### 2. Service Layer (`app/services/`)
Business logic and orchestration:
- **ModelManager**: LLM and embedding model management with environment variable substitution
- **KnowledgeBaseManager**: Document ingestion, chunking, and vector search
- **ConversationManager**: Multi-turn conversations with RAG integration
- **AgentOrchestrator**: Agent execution with tool calling
- **AsyncJobManager**: Background job processing using asyncio
### 3. Tools Layer (`app/tools/`)
LangChain tools for agent use:
- **KnowledgeBaseRetriever**: Vector similarity search tool
- **CalculatorTool**: Mathematical expression evaluation
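As a sketch of what a safe calculator tool can look like (the project's actual `CalculatorTool` implementation may differ), arithmetic expressions can be evaluated over a whitelisted AST rather than with `eval`:

```python
import ast
import operator

# Whitelist of permitted AST operator nodes → Python functions.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a pure-arithmetic expression; reject anything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))
```

Anything outside the whitelist (function calls, attribute access, names) raises `ValueError`, which keeps LLM-provided input from reaching arbitrary code.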
### 4. Data Layer (`app/db/`)
Database models and management:
- **SQLAlchemy ORM Models**: Models, KnowledgeBase, Document, Conversation, Message, ToolCall
- **Alembic Migrations**: Schema versioning and migration scripts
- **DatabaseManager**: Connection pooling and session management
### 5. Utilities (`app/utils/`)
Supporting utilities:
- **TextSplitter**: Document chunking with configurable size/overlap
- **FAISSHelper**: FAISS index creation, persistence, and querying
- **Logger**: Structured JSON logging with structlog
- **Exceptions**: Custom exception hierarchy
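The chunking behind `TextSplitter` can be sketched in a few lines. This is a minimal illustration of size/overlap chunking, not the project's actual implementation (which may split on sentence or token boundaries):

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - chunk_overlap  # step forward, keeping overlap
    return chunks
```

The overlap preserves context across chunk boundaries so a retrieval hit near the edge of one chunk still carries its surrounding sentences.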
## Data Flow
### RAG-Enhanced Chat
```
User Request
  ↓
[API Layer] conv.py → chat()
  ↓
[Service] ConversationManager.chat()
  1. Save user message to DB
  2. Retrieve conversation history
  3. [If RAG enabled] Query KB → FAISS → Top-K docs
  4. Build prompt: context + history + user input
  5. LLM.predict()
  6. Save assistant message
  ↓
Response with sources
```
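Step 4 of the flow above, prompt assembly, can be sketched as follows. The function name and prompt wording are illustrative, not the project's exact template:

```python
def build_rag_prompt(context_docs: list[str], history: list[dict], user_input: str) -> str:
    """Assemble a RAG prompt: retrieved context, then the conversation
    transcript, then the new user turn."""
    context = "\n\n".join(context_docs)
    turns = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return (
        "Answer using the context below when relevant.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation:\n{turns}\n"
        f"user: {user_input}\n"
        "assistant:"
    )
```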
### Document Ingestion
```
Document Upload
  ↓
[API Layer] kb.py → ingest_documents()
  ↓
[Service] KnowledgeBaseManager.ingest_documents()
  ↓
[Background Job] _ingest_documents_sync()
  1. Get embedding model
  2. Chunk documents (TextSplitter)
  3. Create/load FAISS index
  4. Add document vectors
  5. Save index to disk
  6. Save metadata to DB
  ↓
Job Complete
```
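Steps 3-4 boil down to adding vectors to an index and searching by distance. A FAISS `IndexFlatL2` does this brute-force; the NumPy stand-in below shows the same contract (add vectors, get top-k nearest by L2 distance) without the FAISS dependency:

```python
import numpy as np

class FlatIndex:
    """Minimal stand-in for a FAISS IndexFlatL2: brute-force L2 search."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype="float32")

    def add(self, vecs) -> None:
        """Append embedding vectors to the index."""
        self.vectors = np.vstack([self.vectors, np.asarray(vecs, dtype="float32")])

    def search(self, query, k: int):
        """Return (distances, indices) of the k nearest stored vectors."""
        dists = np.linalg.norm(self.vectors - np.asarray(query, dtype="float32"), axis=1)
        order = np.argsort(dists)[:k]
        return dists[order], order
```

In the real pipeline, the returned indices map back to `embedding_id` on the `Document` rows, which is how search results are joined with their source text.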
### Agent Execution
```
Agent Task
  ↓
[API Layer] agent.py → execute_agent()
  ↓
[Service] AgentOrchestrator.execute_agent()
  1. Create tools (Calculator, Retriever)
  2. Create LangChain agent
  3. Execute with AgentExecutor
  4. Log tool calls to DB
  ↓
Response with tool call history
```
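The core of steps 3-4 is a loop: ask the model for an action, run the chosen tool, record the observation, repeat until a final answer. The sketch below is a dependency-free caricature of what `AgentExecutor` does, with the tool-call log (`observations`) standing in for the DB writes:

```python
def run_agent(llm_step, tools: dict, task: str, max_steps: int = 5):
    """Tiny ReAct-style loop. `llm_step(task, observations)` must return
    either ("final", answer) or (tool_name, tool_input)."""
    observations = []
    for _ in range(max_steps):
        name, payload = llm_step(task, observations)
        if name == "final":
            return payload, observations
        result = tools[name](payload)
        # In the real service this record is persisted to the ToolCall table.
        observations.append({"tool": name, "input": payload, "output": result})
    raise RuntimeError("agent exceeded max_steps without a final answer")
```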
## Database Schema
### Models Table
- Stores LLM and embedding model configurations
- Config stored as JSON with ${VAR} substitution support
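The `${VAR}` substitution can be sketched as a recursive walk over the stored config. This is an illustrative implementation, not the project's exact code; unresolved variables are left as-is here, though the real service might raise instead:

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")

def substitute_env(config):
    """Recursively replace ${VAR} placeholders in string values with
    values from the process environment."""
    if isinstance(config, str):
        return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), config)
    if isinstance(config, dict):
        return {k: substitute_env(v) for k, v in config.items()}
    if isinstance(config, list):
        return [substitute_env(v) for v in config]
    return config  # numbers, booleans, None pass through unchanged
```

This keeps secrets out of the database: the Models table stores `"api_key": "${OPENAI_API_KEY}"` and the real value is injected only at load time.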
### Knowledge Bases
- KnowledgeBase: Metadata (name, description)
- Document: Content, source, metadata, embedding_id
### Conversations
- Conversation: User ID, title
- Message: Role, content, metadata (sources, model)
### Tool Calls
- Logs all agent tool invocations
- Stores input/output for debugging
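A minimal ORM model for this table might look like the following. Column names and types are hypothetical, inferred from the description above, not copied from the project:

```python
from sqlalchemy import JSON, Column, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class ToolCall(Base):
    """Hypothetical log row for one agent tool invocation."""
    __tablename__ = "tool_calls"

    id = Column(Integer, primary_key=True)
    tool_name = Column(String(100), nullable=False)
    tool_input = Column(JSON)    # arguments the agent passed to the tool
    tool_output = Column(Text)   # result returned to the agent, for debugging
```

MySQL's native JSON column type lets the input payload vary per tool without schema changes, mirroring the `doc_metadata`/`msg_metadata` pattern noted under Key Design Decisions.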
## Configuration
Environment-based configuration using Pydantic Settings:
```
DATABASE_URL # MySQL connection string
OPENAI_API_KEY # OpenAI API key
OPENAI_BASE_URL # Optional proxy/custom endpoint
FAISS_BASE_PATH # FAISS index storage directory
CHUNK_SIZE # Document chunk size (default 1000)
CHUNK_OVERLAP # Chunk overlap (default 200)
API_HOST # Server host (default 0.0.0.0)
API_PORT # Server port (default 8000)
ENVIRONMENT # dev/staging/production
DEBUG # Enable debug mode
```
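The project reads these via Pydantic Settings; the stdlib dataclass below approximates the same behavior (env var wins, otherwise the documented default) for a subset of the variables, so the pattern is visible without the `pydantic-settings` dependency:

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    return os.environ.get(name, default)

@dataclass
class Settings:
    """Stdlib sketch of env-driven configuration (the project uses Pydantic Settings)."""
    database_url: str = field(default_factory=lambda: _env("DATABASE_URL", "mysql://localhost/kit"))
    chunk_size: int = field(default_factory=lambda: int(_env("CHUNK_SIZE", "1000")))
    chunk_overlap: int = field(default_factory=lambda: int(_env("CHUNK_OVERLAP", "200")))
    api_port: int = field(default_factory=lambda: int(_env("API_PORT", "8000")))
    debug: bool = field(default_factory=lambda: _env("DEBUG", "false").lower() == "true")
```

Pydantic Settings adds validation and error reporting on top of this (e.g. rejecting a non-integer `CHUNK_SIZE` at startup rather than at first use).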
## Key Design Decisions
### 1. Asyncio over Celery
- Uses Python's native asyncio for background jobs
- Simpler deployment, no external message broker required
- Suitable for medium-scale workloads
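The shape of an asyncio-based job manager can be sketched as below. This is an illustrative minimum (job IDs plus a status dict), not the project's `AsyncJobManager`, which presumably also persists job state:

```python
import asyncio
import uuid

class JobManager:
    """Track fire-and-forget asyncio tasks by job ID."""

    def __init__(self):
        self.jobs: dict[str, str] = {}

    def submit(self, coro) -> str:
        """Schedule a coroutine on the running loop and return its job ID."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = "running"
        task = asyncio.ensure_future(coro)
        task.add_done_callback(
            lambda t: self.jobs.__setitem__(job_id, "failed" if t.exception() else "done")
        )
        return job_id
```

The trade-off named above is visible here: jobs live inside the API process, so a restart loses in-flight work; Celery would move them to a broker at the cost of extra infrastructure.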
### 2. FAISS CPU Version
- Uses faiss-cpu for broader compatibility
- Easier deployment without GPU requirements
- Sufficient for learning/development purposes
### 3. JSON Metadata Storage
- Uses MySQL JSON columns for flexible metadata
- Renamed from `metadata` to `doc_metadata`/`msg_metadata` to avoid SQLAlchemy reserved words
### 4. OpenAI Proxy Support
- Configurable `base_url` for API routing
- Supports both LLM and embedding endpoints
- Applied globally or per-model
### 5. Structured Logging
- JSON-formatted logs via structlog
- Easy parsing for log aggregation systems
- Contextual information for debugging
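What structlog produces can be approximated with a stdlib formatter, shown here only to make the "easy parsing" point concrete; the project's actual logger is configured through structlog, not like this:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record, with optional context fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Context attached via logger.info(..., extra={"ctx": {...}})
        payload.update(getattr(record, "ctx", {}))
        return json.dumps(payload)
```

Each line being a self-contained JSON object is what makes ingestion into aggregators (ELK, Loki, CloudWatch) a matter of pointing them at stdout.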
## Deployment
### Development
```bash
# Local development
conda activate pyth-311
python scripts/init_db.py
python -m uvicorn app.main:app --reload
```
### Production with Docker
```bash
# Build and run with Docker Compose
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f api
```
## Scalability Considerations
### Current Architecture
- Single FastAPI instance
- MySQL for persistence
- FAISS indexes on local disk
- Asyncio for background jobs
### Future Enhancements
- **Horizontal Scaling**: Add load balancer, shared FAISS storage (S3/NFS)
- **Distributed Jobs**: Replace asyncio with Celery + Redis/RabbitMQ
- **Vector Database**: Migrate from FAISS to Pinecone/Weaviate/Milvus
- **Caching**: Add Redis for conversation history and KB query results
- **Monitoring**: Prometheus metrics, Grafana dashboards
- **Authentication**: JWT-based auth with user management
## Security
Current implementation is designed for learning/development:
- ⚠️ No authentication/authorization
- ⚠️ CORS configured for `*` (all origins)
- ⚠️ Database credentials in .env file
**Production recommendations:**
- Implement JWT authentication
- Add role-based access control (RBAC)
- Use secrets management (Vault, AWS Secrets Manager)
- Restrict CORS to specific origins
- Enable HTTPS/TLS
- Add rate limiting
- Implement input validation and sanitization