v0.2.0

RAG System

Document upload and context-aware chat powered by vector embeddings and retrieval-augmented generation.

The boilerplate now ships with a fully integrated RAG pipeline. Users can upload PDF and plain-text documents, which are automatically chunked, embedded, and stored in Pinecone. When a user asks a question in the AI chat, the system performs a semantic search against the vector index, retrieves the most relevant document fragments, and injects them as context into the LLM prompt before generating a response.
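The context-injection step described above can be sketched as follows. This is an illustrative example, not the boilerplate's actual code; the `RetrievedChunk` shape and `buildAugmentedPrompt` function are assumptions about how retrieved fragments might be formatted into the LLM prompt.

```typescript
// Hypothetical sketch: retrieved fragments are formatted into a numbered
// context section that precedes the user's question in the final prompt.
interface RetrievedChunk {
  text: string;   // the document fragment returned by the vector search
  source: string; // original filename, kept so the answer can cite it
  score: number;  // similarity score from the retrieval step
}

function buildAugmentedPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "Cite sources by their bracketed number.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```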

Document processing uses a sliding-window chunking strategy with configurable overlap to preserve context across chunk boundaries. Each chunk is embedded via the configured provider and stored alongside metadata (source filename, page number, chunk index) so that responses can cite their origin. The retrieval step ranks results by cosine similarity and applies a relevance threshold to avoid injecting noise.
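A minimal sketch of the two steps described above, sliding-window chunking and threshold-filtered cosine ranking. The function names, default sizes, and threshold value here are assumptions for illustration, not the boilerplate's actual API.

```typescript
interface Chunk {
  text: string;
  chunkIndex: number; // stored as metadata alongside the embedding
}

// Sliding window over the text: each chunk starts `size - overlap`
// characters after the previous one, so adjacent chunks share context.
function slidingWindowChunks(text: string, size = 500, overlap = 100): Chunk[] {
  const step = size - overlap;
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < text.length; start += step, i++) {
    chunks.push({ text: text.slice(start, start + size), chunkIndex: i });
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only candidates above the relevance threshold, best first,
// so low-similarity noise never reaches the prompt.
function rankAboveThreshold(
  query: number[],
  candidates: { vector: number[]; text: string }[],
  threshold = 0.75,
): { text: string; score: number }[] {
  return candidates
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.vector) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

With size 500 and overlap 100, the last 100 characters of each chunk reappear at the start of the next, which is what preserves meaning for sentences that straddle a boundary.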

The minor version bump reflects the scale of the change: RAG is a major new capability surface, touching the upload API, background processing, vector storage, and the chat completion pipeline. The architecture is provider-agnostic: swapping Pinecone for another vector store requires changing only a single adapter.
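What a provider-agnostic adapter might look like, sketched below with a trivial in-memory implementation. The interface and class names are illustrative assumptions; swapping Pinecone for another store would mean implementing the same interface against that store's client.

```typescript
interface VectorRecord {
  id: string;
  vector: number[];
  metadata: Record<string, string | number>; // e.g. filename, page, chunk index
}

// The single seam between the pipeline and the vector store: any backend
// (Pinecone or otherwise) is wrapped behind this interface.
interface VectorStoreAdapter {
  upsert(records: VectorRecord[]): Promise<void>;
  query(vector: number[], topK: number): Promise<VectorRecord[]>;
}

// Trivial in-memory implementation, useful for local development and tests.
class InMemoryVectorStore implements VectorStoreAdapter {
  private records: VectorRecord[] = [];

  async upsert(records: VectorRecord[]): Promise<void> {
    for (const r of records) {
      // Replace any existing record with the same id, then append.
      this.records = this.records.filter((e) => e.id !== r.id);
      this.records.push(r);
    }
  }

  async query(vector: number[], topK: number): Promise<VectorRecord[]> {
    const dot = (a: number[], b: number[]) =>
      a.reduce((s, v, i) => s + v * b[i], 0);
    const norm = (a: number[]) => Math.sqrt(dot(a, a));
    // Rank by cosine similarity, highest first, and return the top K.
    return [...this.records]
      .sort(
        (x, y) =>
          dot(vector, y.vector) / (norm(vector) * norm(y.vector)) -
          dot(vector, x.vector) / (norm(vector) * norm(x.vector)),
      )
      .slice(0, topK);
  }
}
```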

Contributors

Sascha Rahn