Kit ships with a production-ready AI system that supports four LLM providers (Anthropic, OpenAI, Google, xAI) routed through the Vercel AI Gateway. The system includes two chat modes, streaming SSE responses, pgvector-powered RAG, and a three-layer cost management system.
This page covers the architecture and core concepts. For provider setup, see AI Providers. For the chat system, see Chat System. For knowledge base search, see RAG System. For rate limiting and credits, see Cost Management.
How It Works
Every AI request in Kit flows through the same pipeline — from the React hook to the provider and back:
User types message
|
v
React Hook (useAIChat / useAICompletion)
|--- Manages message history
|--- Handles streaming state
|--- Triggers credit animation
|
v
API Route (/api/ai/stream or /api/ai/chat)
|--- 1. Feature guard (is chat mode enabled?)
|--- 2. Authentication (Clerk → DB user)
|--- 3. Rate limit check (global burst + credit balance)
|--- 4. Credit deduction (BEFORE processing)
|--- 5. Zod request validation
|
v
AI Service (ai-service.ts)
|--- Resolves the model (AI_MODEL or DEFAULT_MODELS)
|--- Builds the provider/model routing string (toGatewayModelString)
|--- Calls the SDK with the Gateway-routed model
|
v
Vercel AI Gateway (provider/model string)
|--- Routes to the target provider (OpenAI / Anthropic / Google / xAI)
|--- Applies server-side failover (AI_GATEWAY_FALLBACK_MODELS)
|--- Streams response chunks via SSE
|
v
Response flows back
|--- Usage tracked to database (non-blocking)
|--- Credit balance invalidated in TanStack Query cache
|--- Message displayed in chat UI
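The ordering of the API-route checks above can be sketched as a short-circuiting list of steps. This is a minimal illustration, not Kit's route handler: `RequestContext`, `runPipeline`, and the 401/429/400 status codes are hypothetical, and the Zod schema is replaced by a trivial object check. Only the step order (with credit deduction ahead of validation) and the 404 for a disabled feature are taken from this page.

```typescript
// Hypothetical sketch: only the order of the steps mirrors the diagram.
type StepResult = { ok: true } | { ok: false; status: number; reason: string }

interface RequestContext {
  featureEnabled: boolean
  userId: string | null
  withinBurstLimit: boolean
  creditBalance: number
  creditCost: number
  body: unknown
}

type Step = { name: string; run: (ctx: RequestContext) => StepResult }

const pass: StepResult = { ok: true }
const fail = (status: number, reason: string): StepResult => ({ ok: false, status, reason })

const steps: Step[] = [
  // 1. Feature guard: disabled features answer with 404
  { name: 'feature guard', run: (c) => (c.featureEnabled ? pass : fail(404, 'feature disabled')) },
  // 2. Authentication (status code is an assumption)
  { name: 'authentication', run: (c) => (c.userId !== null ? pass : fail(401, 'not signed in')) },
  // 3. Rate limit check: global burst + credit balance (status code is an assumption)
  {
    name: 'rate limit',
    run: (c) =>
      c.withinBurstLimit && c.creditBalance >= c.creditCost
        ? pass
        : fail(429, 'rate limited or out of credits'),
  },
  // 4. Credit deduction happens BEFORE any processing
  {
    name: 'credit deduction',
    run: (c) => {
      c.creditBalance -= c.creditCost
      return pass
    },
  },
  // 5. Request validation (stand-in for the Zod schema)
  {
    name: 'validation',
    run: (c) => (typeof c.body === 'object' && c.body !== null ? pass : fail(400, 'invalid body')),
  },
]

// Runs the steps in order and stops at the first failure.
function runPipeline(ctx: RequestContext): { result: StepResult; executed: string[] } {
  const executed: string[] = []
  for (const step of steps) {
    executed.push(step.name)
    const result = step.run(ctx)
    if (!result.ok) return { result, executed }
  }
  return { result: pass, executed }
}
```

In this sketch a request that fails validation has already been charged, because deduction is step 4 and validation is step 5. How Kit handles that case is covered in Cost Management, not here.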
Provider Architecture
Kit routes AI providers through the Vercel AI Gateway using provider/model strings, so switching providers requires no code changes, only an environment variable update.
You select a model with a provider/model string (e.g. anthropic/claude-haiku-4-5-20251001), and the Gateway routes it to the matching provider. Each provider has a preconfigured default model chosen for cost efficiency:
src/lib/ai/config.ts: Default Models
export const DEFAULT_MODELS: Record<AIProvider, string> = {
  openai: 'gpt-5-nano',
  anthropic: 'claude-haiku-4-5-20251001',
  google: 'gemini-2.5-flash',
  xai: 'grok-4-1-fast-reasoning',
}
| Provider | Default Model | Context Window | Best For |
|---|---|---|---|
| Anthropic | claude-haiku-4-5 | 200K tokens (1M beta) | Primary — nuanced reasoning, long context |
| OpenAI | gpt-5-nano | 400K tokens | General purpose, RAG embeddings |
| Google | gemini-2.5-flash | 1M tokens | Large documents, cost efficiency |
| xAI | grok-4-1-fast-reasoning | 2M tokens | Real-time data, conversational |
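To make the model selection concrete, here is a minimal sketch of how a provider/model routing string could be derived from the environment. It is not the code in ai-service.ts: `resolveGatewayModel` and `parseFallbackModels` are hypothetical names, the environment is passed in as a plain object, and two behaviours are assumptions rather than documented facts (an unknown AI_PROVIDER falls back to anthropic, and an AI_MODEL that already contains a slash is used unchanged).

```typescript
type AIProvider = 'openai' | 'anthropic' | 'google' | 'xai'
type Env = Record<string, string | undefined>

// Copied from the defaults shown above.
const DEFAULT_MODELS: Record<AIProvider, string> = {
  openai: 'gpt-5-nano',
  anthropic: 'claude-haiku-4-5-20251001',
  google: 'gemini-2.5-flash',
  xai: 'grok-4-1-fast-reasoning',
}

function isProvider(value: string): value is AIProvider {
  return value === 'openai' || value === 'anthropic' || value === 'google' || value === 'xai'
}

// Hypothetical helper: AI_MODEL wins, otherwise the provider's default model.
function resolveGatewayModel(env: Env): string {
  const raw = env['AI_PROVIDER'] ?? 'anthropic'
  // Assumption: unknown slugs fall back to the documented default provider.
  const provider: AIProvider = isProvider(raw) ? raw : 'anthropic'
  const model = env['AI_MODEL']?.trim() || DEFAULT_MODELS[provider]
  // Assumption: a value that already looks like provider/model is passed through.
  return model.includes('/') ? model : `${provider}/${model}`
}

// Hypothetical helper: splits the comma-separated failover list.
function parseFallbackModels(env: Env): string[] {
  return (env['AI_GATEWAY_FALLBACK_MODELS'] ?? '')
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
}
```

With no variables set, this sketch yields the Anthropic default; setting only `AI_PROVIDER=google` yields `google/gemini-2.5-flash`.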
Kit uses the Vercel AI SDK (ai v6.x) for embeddings and the useCompletion hook. The streaming chat (useAIChat) uses a custom SSE parser that supports five response formats across all providers, which gives more robust multi-provider support than the SDK alone.
Two Chat Modes
Kit provides two distinct chat experiences, each with its own route, API, and UI:
| Aspect | LLM Chat | RAG Chat |
|---|---|---|
| Route | /dashboard/chat-llm | /dashboard/chat-rag |
| API | /api/ai/stream, /api/ai/chat | /api/ai/rag/ask |
| Hook | useAIChat() | Custom RAG hook |
| Context | Direct LLM conversation | Knowledge base + LLM |
| Feature Flag | NEXT_PUBLIC_AI_LLM_CHAT_ENABLED | NEXT_PUBLIC_AI_RAG_CHAT_ENABLED |
| Token Usage | Full conversation history | ~3-5K tokens (RAG context) |
| Best For | Open-ended conversation, coding help | Product support, FAQ |
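The /api/ai/stream endpoint in the table above delivers its response as SSE. As a rough illustration of what a client-side reader has to cope with, here is a minimal sketch of an incremental SSE buffer. It is not Kit's sse-parser.ts: `SSEBuffer` is a hypothetical name, the `[DONE]` sentinel is an assumption borrowed from a common provider convention, and the sketch returns raw `data:` payloads instead of handling the multiple response formats Kit's parser supports.

```typescript
// Hypothetical sketch of an incremental SSE reader.
class SSEBuffer {
  private buffer = ''
  // Set once the (assumed) end-of-stream sentinel has been seen.
  done = false

  // Feed one network chunk; returns the data payloads of all events completed so far.
  push(chunk: string): string[] {
    // Normalise CRLF after appending so a CR/LF pair split across chunks is still handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, '\n')
    const payloads: string[] = []
    let boundary = this.buffer.indexOf('\n\n')
    while (boundary !== -1) {
      const event = this.buffer.slice(0, boundary)
      this.buffer = this.buffer.slice(boundary + 2)
      // An event may carry several data: lines; they are joined with newlines.
      const data = event
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, ''))
        .join('\n')
      if (data === '[DONE]') {
        this.done = true
      } else if (data.length > 0) {
        payloads.push(data)
      }
      boundary = this.buffer.indexOf('\n\n')
    }
    return payloads
  }
}
```

The point of the buffer is that chunk boundaries rarely line up with event boundaries, so a partial event is held back until its terminating blank line arrives.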
Both chat modes are enabled by default. Set the corresponding environment variable to false to disable either mode. The navigation automatically hides disabled chat modes.
Feature Flags
Seven environment variables control which AI features are available. All default to true (enabled):
src/lib/ai/feature-flags.ts: Feature Configuration
export const AI_CHAT_FEATURES = {
/**
* RAG Chat (Modern UI)
* Routes: /dashboard/chat-rag, /api/ai/rag/*
* Features: Modern chat UI, Knowledge Base integration, Source Attribution
*/
ragChat: process.env.NEXT_PUBLIC_AI_RAG_CHAT_ENABLED !== 'false',
/**
* LLM Chat (Direct Chat)
* Routes: /dashboard/chat-llm, /api/ai/chat, /api/ai/stream
* Features: Modern chat UI, Direct LLM conversation, Streaming
*/
llmChat: process.env.NEXT_PUBLIC_AI_LLM_CHAT_ENABLED !== 'false',
/**
* Vision Chat (Image Analysis in LLM Chat)
* Extends LLM Chat with image upload and analysis capabilities.
* Requires LLM Chat to be enabled. Only active when BOTH flags are true.
* Features: Drag & Drop, Paste, File picker, Base64 image transport
*/
visionChat:
process.env.NEXT_PUBLIC_AI_LLM_CHAT_ENABLED !== 'false' &&
process.env.NEXT_PUBLIC_AI_VISION_ENABLED !== 'false',
/**
* PDF Chat (Document Analysis in LLM Chat)
* Extends LLM Chat with PDF upload and text extraction capabilities.
* Requires LLM Chat to be enabled. Only active when BOTH flags are true.
* Features: Drag & Drop, File picker, server-side text extraction, all providers
*/
pdfChat:
process.env.NEXT_PUBLIC_AI_LLM_CHAT_ENABLED !== 'false' &&
process.env.NEXT_PUBLIC_AI_PDF_CHAT_ENABLED !== 'false',
/**
* Audio Input (Speech-to-Text for all AI Chats)
* Adds microphone recording and Whisper transcription to any AI input field.
* Standalone feature — works with LLM Chat, RAG Chat, and Image Gen.
* Features: MediaRecorder, Whisper STT, editable transcript in input field
*/
audioInput: process.env.NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED !== 'false',
/**
* Image Generation (Text-to-Image)
* Routes: /dashboard/image-gen, /api/ai/image-gen
* Features: GPT Image models, multiple sizes/qualities/formats, transparent backgrounds
* Standalone feature — does NOT require LLM Chat to be enabled.
*/
imageGen: process.env.NEXT_PUBLIC_AI_IMAGE_GEN_ENABLED !== 'false',
/**
* Content Generator (Template-based Text Generation)
* Routes: /dashboard/content, /api/ai/generate-content
* Features: 5 templates (Email, Product, Blog, Social, Marketing), tone/language/length controls, streaming output
* Standalone feature — does NOT require LLM Chat to be enabled.
*/
contentGen: process.env.NEXT_PUBLIC_AI_CONTENT_GEN_ENABLED !== 'false',
} as const
| Variable | Default | Controls |
|---|---|---|
| NEXT_PUBLIC_AI_RAG_CHAT_ENABLED | true | RAG Chat on /dashboard/chat-rag |
| NEXT_PUBLIC_AI_LLM_CHAT_ENABLED | true | LLM Chat on /dashboard/chat-llm |
| NEXT_PUBLIC_AI_VISION_ENABLED | true | Image analysis in LLM Chat (requires LLM Chat enabled) |
| NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED | true | Voice input via speech-to-text (standalone; the flag does not depend on LLM Chat) |
| NEXT_PUBLIC_AI_PDF_CHAT_ENABLED | true | PDF analysis in LLM Chat (requires LLM Chat enabled) |
| NEXT_PUBLIC_AI_IMAGE_GEN_ENABLED | true | Image Generation on /dashboard/image-gen |
| NEXT_PUBLIC_AI_CONTENT_GEN_ENABLED | true | Content Generator on /dashboard/content |
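The `!== 'false'` comparison used in feature-flags.ts has a consequence worth spelling out: only the exact string false disables a feature. The sketch below reproduces that logic against a plain object so it can be exercised without touching process.env. `readFlags` and `isEnabled` are hypothetical names introduced for illustration; the comparisons themselves mirror the configuration shown above.

```typescript
type Env = Record<string, string | undefined>

// A flag is on unless its value is exactly the string 'false'.
const isEnabled = (env: Env, key: string): boolean => env[key] !== 'false'

// Hypothetical helper mirroring AI_CHAT_FEATURES, with the environment injected.
function readFlags(env: Env) {
  const llmChat = isEnabled(env, 'NEXT_PUBLIC_AI_LLM_CHAT_ENABLED')
  return {
    ragChat: isEnabled(env, 'NEXT_PUBLIC_AI_RAG_CHAT_ENABLED'),
    llmChat,
    // Vision and PDF chat extend LLM Chat, so both flags must be on.
    visionChat: llmChat && isEnabled(env, 'NEXT_PUBLIC_AI_VISION_ENABLED'),
    pdfChat: llmChat && isEnabled(env, 'NEXT_PUBLIC_AI_PDF_CHAT_ENABLED'),
    // The remaining features depend only on their own flag.
    audioInput: isEnabled(env, 'NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED'),
    imageGen: isEnabled(env, 'NEXT_PUBLIC_AI_IMAGE_GEN_ENABLED'),
    contentGen: isEnabled(env, 'NEXT_PUBLIC_AI_CONTENT_GEN_ENABLED'),
  }
}
```

Under this logic, values such as `0`, `FALSE`, or an empty string leave the feature enabled, so use the lowercase string false when switching something off.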
When Vision Chat is enabled, users can attach images to LLM Chat messages via drag & drop, clipboard paste, or file picker. Images are sent as ContentPart[] (Base64 data URIs) to /api/ai/stream, which auto-selects the image_analysis credit operation (30 credits). See Chat System for details.
When Audio Input is enabled, a microphone button appears in the LLM Chat input area. Users can record voice messages (up to 120 seconds), which are transcribed via the Whisper API at /api/ai/speech-to-text (20 credits per transcription). The transcribed text is inserted into the chat input field. See Chat System for details.
When Image Generation is enabled, the /dashboard/image-gen route provides a text-to-image interface using OpenAI's GPT Image models (gpt-image-1, gpt-image-1.5, gpt-image-1-mini). Users can configure size, quality, format, and background transparency. Generated images are stored in session history (up to 10 entries). Unlike the chat features, Image Generation is standalone: it does NOT require LLM Chat to be enabled.
When Content Generator is enabled, the /dashboard/content route provides a template-based text generation interface with five templates (email, product description, blog outline, social media, marketing copy). Users can configure tone, language, and length. The generator uses SSE streaming to deliver results progressively. Like Image Generation, the Content Generator is standalone: it does NOT require LLM Chat to be enabled.
Feature flags are checked at two levels:
- Page level: shouldShowRAGChat() / shouldShowLLMChat() / shouldShowImageGen() / shouldShowContentGen() guard functions call notFound() if disabled
- API level: guardRAGChat() / guardLLMChat() / guardAudioInput() / guardImageGen() / guardContentGen() return 404 responses for disabled features
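The two guard levels can be pictured with a small sketch. The real signatures of the guard functions are not shown on this page, so everything here is an assumption: `makeApiGuard`, `makePageGuard`, and `GuardResponse` are hypothetical, the 404 payload is invented, and `notFound` is passed in as a callback instead of being imported from the framework.

```typescript
// Hypothetical shape of what an API guard hands back when a feature is off.
interface GuardResponse {
  status: number
  body: { error: string }
}

// API level: returns null when the request may proceed, otherwise a 404 description.
function makeApiGuard(isEnabled: () => boolean, label: string): () => GuardResponse | null {
  return () => (isEnabled() ? null : { status: 404, body: { error: `${label} is disabled` } })
}

// Page level: delegates to a notFound callback, which is expected to throw.
function makePageGuard(isEnabled: () => boolean, notFound: () => never): () => void {
  return () => {
    if (!isEnabled()) notFound()
  }
}
```

A route handler built this way would call its guard first and return early when the result is not null, which matches the position of the feature guard as step 1 of the request pipeline.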
Directory Structure
All AI-related code lives in apps/boilerplate/src/lib/ai/ with API routes in apps/boilerplate/src/app/api/ai/:
apps/boilerplate/src/
├── lib/
│   └── ai/
│       ├── config.ts                 # Default models + OpenAI key resolver
│       ├── types.ts                  # Shared TypeScript types (Message, Provider, etc.)
│       ├── feature-flags.ts          # AI_CHAT_FEATURES, guard functions
│       ├── route-guards.ts           # API + page guards for feature flags
│       ├── ai-service.ts             # High-level service (routes via the AI Gateway)
│       ├── gateway.ts                # Gateway adapter: settings, reasoning-safety, fallback
│       ├── model-registry.ts         # Model catalog, pricing, reasoning flags, routing strings
│       ├── rag-service.ts            # RAG pipeline (search → context → answer)
│       ├── rag-search.ts             # pgvector similarity search
│       ├── rate-limiter.ts           # Global burst + tier-based limiting
│       ├── usage-tracker.ts          # Token/cost tracking to database
│       ├── image-gen/
│       │   ├── config.ts             # Model configs, sizes, quality options
│       │   ├── service.ts            # OpenAI image generation service
│       │   └── types.ts              # Image generation TypeScript types
│       ├── content-gen/
│       │   ├── config.ts             # Template definitions, prompt builder, UI labels
│       │   ├── service.ts            # Content generation AI service wrapper
│       │   └── types.ts              # Content generator TypeScript types
│       ├── sse-parser.ts             # Shared SSE stream parser with error handling
│       ├── quick-prompts.ts          # Configurable suggestion buttons
│       └── errors.ts                 # Error class hierarchy
├── hooks/
│   ├── use-ai.ts                     # React hooks (useAIChat, useAICompletion, etc.)
│   ├── use-image-gen.ts              # Image generation hook with history
│   ├── use-content-generator.ts      # Content generator hook with SSE streaming
│   └── use-audio-recorder.ts         # Audio recording hook (MediaRecorder API)
├── app/
│   └── api/
│       └── ai/
│           ├── stream/route.ts           # POST: SSE streaming endpoint
│           ├── chat/route.ts             # POST: Synchronous chat endpoint
│           ├── speech-to-text/route.ts   # POST: Audio transcription (Whisper)
│           ├── image-gen/route.ts        # POST: Image generation endpoint
│           ├── generate-content/route.ts # POST: Content generation endpoint
│           ├── usage/route.ts            # GET: Usage statistics endpoint
│           └── rag/
│               ├── ask/route.ts          # POST: RAG question answering
│               └── conversations/        # CRUD for conversation history
└── components/
    └── ai/
        ├── chat/                     # Chat UI components (12 components)
        ├── image-gen/                # Image generation UI (4 components)
        └── content-gen/              # Content generator UI (5 components)
Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| AI_GATEWAY_API_KEY | Yes* | Vercel AI Gateway key: one key for all providers on the chat path (provider/model routing). Keyless OIDC works on Vercel. |
| AI_PROVIDER | No | Default provider slug for Gateway routing (openai, anthropic, google, xai; default anthropic) |
| AI_MODEL | No | Override the default model (otherwise DEFAULT_MODELS[AI_PROVIDER]) |
| AI_GATEWAY_FALLBACK_MODELS | No | Comma-separated provider/model list for Gateway-side failover |
| OPENAI_API_KEY | Yes† | OpenAI key for the OpenAI-direct paths (RAG embeddings, Whisper STT, image generation) |
| AI_API_KEY | No | Fallback for the OpenAI-direct paths when AI_PROVIDER=openai (not read on the chat path) |
| AI_EMBEDDING_MODEL | No | Embedding model for RAG (default: text-embedding-3-small) |
| NEXT_PUBLIC_AI_RAG_CHAT_ENABLED | No | Enable RAG Chat (default: true) |
| NEXT_PUBLIC_AI_LLM_CHAT_ENABLED | No | Enable LLM Chat (default: true) |
| NEXT_PUBLIC_AI_VISION_ENABLED | No | Enable image analysis in LLM Chat (default: true) |
| NEXT_PUBLIC_AI_AUDIO_INPUT_ENABLED | No | Enable voice input in LLM Chat (default: true) |
| NEXT_PUBLIC_AI_PDF_CHAT_ENABLED | No | Enable PDF analysis in LLM Chat (default: true) |
| NEXT_PUBLIC_AI_IMAGE_GEN_ENABLED | No | Enable Image Generation (default: true) |
| NEXT_PUBLIC_AI_CONTENT_GEN_ENABLED | No | Enable Content Generator (default: true) |
| UPSTASH_REDIS_REST_URL | No | Redis URL for rate limiting |
| UPSTASH_REDIS_REST_TOKEN | No | Redis token for rate limiting |
*Chat requires AI_GATEWAY_API_KEY (or keyless OIDC on Vercel). †OPENAI_API_KEY is required only for the OpenAI-direct paths (RAG embeddings, Whisper STT, image generation). To run chat on your own provider credentials, use Vercel BYOK in the Gateway dashboard instead of per-provider env vars.
To get chat working you only need one key: AI_GATEWAY_API_KEY (or keyless OIDC on Vercel). It routes every provider through the Vercel AI Gateway; pick the provider with AI_PROVIDER and a model with AI_MODEL. For RAG Chat, audio input, or Image Generation, also add OPENAI_API_KEY, because these paths call OpenAI directly (embeddings use text-embedding-3-small by default). Set AI_GATEWAY_FALLBACK_MODELS for automatic Gateway failover, and use Vercel BYOK to bring your own provider credentials.
Key Files
| File | Purpose |
|---|---|
| apps/boilerplate/src/lib/ai/config.ts | Default model catalog + OpenAI key resolver (chat runs gateway-only) |
| apps/boilerplate/src/lib/ai/feature-flags.ts | Feature flag definitions and guard functions |
| apps/boilerplate/src/lib/ai/ai-service.ts | High-level AI service (routes via the AI Gateway, calculates costs) |
| apps/boilerplate/src/lib/ai/gateway.ts | Gateway adapter: request settings, reasoning-safety, fallback models |
| apps/boilerplate/src/lib/ai/model-registry.ts | Model catalog: pricing, reasoning flags, Gateway routing strings |
| apps/boilerplate/src/lib/ai/rag-service.ts | RAG pipeline: query rewriting, search, context assembly, answer generation |
| apps/boilerplate/src/lib/ai/rag-search.ts | pgvector similarity search with OpenAI embeddings |
| apps/boilerplate/src/lib/ai/rate-limiter.ts | Two-layer rate limiting (global burst, tier-based) |
| apps/boilerplate/src/lib/credits/credit-costs.ts | Per-operation credit costs (21 operation types) |
| apps/boilerplate/src/hooks/use-ai.ts | React hooks: useAIChat, useAICompletion, useAIQuery, useAIStream |
| apps/boilerplate/src/app/api/ai/stream/route.ts | SSE streaming endpoint with full cost management pipeline |
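The "OpenAI key resolver" listed for config.ts is easier to follow with an example. The sketch below is one plausible reading of the Environment Variables table, not the file's actual contents: `resolveOpenAIKey` is a hypothetical name, the environment is injected as a plain object, and the precedence (OPENAI_API_KEY first, then AI_API_KEY only when AI_PROVIDER is openai) is inferred from the word "fallback" in that table.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical resolver for the OpenAI-direct paths (RAG embeddings, Whisper STT, image generation).
function resolveOpenAIKey(env: Env): string | undefined {
  // Preferred: the dedicated OpenAI key.
  const direct = env['OPENAI_API_KEY']?.trim()
  if (direct) return direct

  // Fallback: the generic key, but only when the configured provider is OpenAI.
  if (env['AI_PROVIDER'] === 'openai') {
    const generic = env['AI_API_KEY']?.trim()
    if (generic) return generic
  }

  // No usable key: the OpenAI-direct features cannot run.
  return undefined
}
```

Note that AI_GATEWAY_API_KEY plays no part here, since the Gateway key is only used on the chat path.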