AI Providers

Kit supports four AI providers through a unified interface. Providers are routed through the Vercel AI Gateway using provider/model strings — switching providers requires only changing an environment variable. This page covers provider setup, model configuration, the default-model catalog, Gateway failover, and capabilities.

Quick Setup

With a single API key in apps/boilerplate/.env.local, both LLM Chat and RAG Chat work immediately. No AI_PROVIDER or AI_MODEL configuration needed — Kit uses sensible defaults.

Available Providers

OpenAI

API Key: OPENAI_API_KEY
Default Model: gpt-5-nano
Base URL Override: OPENAI_BASE_URL

Model	Context	Input (per 1M)	Output (per 1M)	Best For
`gpt-5.2`	400K	$1.75	$14.00	Latest flagship reasoning
`gpt-5`	400K	$1.25	$10.00	Coding and agentic tasks
`gpt-5-mini`	400K	$0.25	$2.00	Well-defined tasks
`gpt-5-nano`	400K	$0.05	$0.40	Default — fast, cheapest
`gpt-4.1`	1M	$2.00	$8.00	Complex tasks, coding
`gpt-4.1-mini`	1M	$0.40	$1.60	Efficient coding
`o3`	200K	$2.00	$8.00	Deep reasoning
`o4-mini`	200K	$1.10	$4.40	Reasoning on a budget

OpenAI is required for RAG embeddings (text-embedding-3-small), even when using another provider for chat.

Model Selection

You select a model with a provider/model string (e.g. anthropic/claude-haiku-4-5-20251001). The Gateway routes it to the matching provider; switching providers needs no code change. When no model is given, the per-provider default from DEFAULT_MODELS is used. To set the default provider, use AI_PROVIDER (defaults to anthropic):

bash

# Make Anthropic the default provider for unqualified requests
AI_PROVIDER=anthropic
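
The selection rules above can be sketched as a small pure function. This is an illustration rather than Kit's actual code: `selectModel`, the `env` parameter, and every `DEFAULT_MODELS` entry except `openai: 'gpt-5-nano'` are assumptions made for the example.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'google' | 'xai'

// Placeholder defaults. Only the OpenAI entry is stated on this page;
// the other three are illustrative stand-ins, not Kit's real values.
const DEFAULT_MODELS: Record<AIProvider, string> = {
  openai: 'gpt-5-nano',
  anthropic: 'claude-haiku-4-5-20251001',
  google: 'gemini-2.5-flash',
  xai: 'example-xai-model',
}

const PROVIDERS = Object.keys(DEFAULT_MODELS) as AIProvider[]

function isProvider(value: string): value is AIProvider {
  return (PROVIDERS as string[]).includes(value)
}

// Hypothetical helper: pick the provider/model string for a request.
function selectModel(
  env: Record<string, string | undefined>,
  requested?: string
): string {
  // An explicit provider/model string is used as-is.
  if (requested && requested.includes('/')) return requested

  // AI_PROVIDER defaults to anthropic when unset.
  const configured = env['AI_PROVIDER'] ?? 'anthropic'
  if (!isProvider(configured)) {
    throw new Error(`Unknown provider: ${configured}`)
  }

  // Fall back to the per-provider default when no model is given.
  const modelId = requested ?? env['AI_MODEL'] ?? DEFAULT_MODELS[configured]
  return `${configured}/${modelId}`
}
```

Passing the environment in as an argument, instead of reading `process.env` inside the function, keeps the sketch easy to test.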

Gateway Routing

Kit routes every chat request through the Vercel AI Gateway. Instead of constructing per-provider SDK clients, the service passes a plain provider/model routing string to streamText / generateText. The model registry (model-registry.ts) is the single source of truth for pricing, reasoning flags, and the routing-string builder:

typescript

// src/lib/ai/model-registry.ts — Gateway routing string
export function toGatewayModelString(
  provider: AIProvider,
  modelId: string
): string {
  return `${provider}/${modelId}`
}

// src/lib/ai/ai-service.ts — routing a request through the Gateway
const provider = resolveProviderForModel(modelId, this._provider)
const gatewayModel = toGatewayModelString(provider, modelId)
const result = streamText({
  model: gatewayModel, // e.g. 'anthropic/claude-haiku-4-5-20251001'
  messages,
  ...buildGatewaySettings({ modelId, isReasoning /* ... */ }),
})

Key behaviors:

One API key for all providers — the SDK reads AI_GATEWAY_API_KEY automatically; no per-provider client construction
Provider resolved from the model — resolveProviderForModel looks up the model in the registry and falls back to the env-configured provider for unknown models
String routing — the Gateway accepts provider/model directly, so no provider import is required
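
A minimal sketch of the provider-resolution behavior, assuming the registry can be modeled as a map from model ID to provider. The `MODEL_PROVIDERS` map and its entries are hypothetical; the real registry in model-registry.ts also carries pricing and reasoning flags.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'google' | 'xai'

// Hypothetical, trimmed-down registry: model ID -> owning provider.
const MODEL_PROVIDERS = new Map<string, AIProvider>([
  ['gpt-5-nano', 'openai'],
  ['gpt-4.1', 'openai'],
  ['claude-haiku-4-5-20251001', 'anthropic'],
])

// Known models resolve to their registered provider; unknown models
// fall back to the env-configured provider passed by the caller.
function resolveProviderForModel(
  modelId: string,
  fallback: AIProvider
): AIProvider {
  return MODEL_PROVIDERS.get(modelId) ?? fallback
}

function toGatewayModelString(provider: AIProvider, modelId: string): string {
  return `${provider}/${modelId}`
}

// A registered OpenAI model routes to OpenAI even when the
// configured default provider is Anthropic.
const routed = toGatewayModelString(
  resolveProviderForModel('gpt-5-nano', 'anthropic'),
  'gpt-5-nano'
)
```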

Gateway Fallback

When the primary model is unavailable, the Gateway can fail over to alternative models. This is configured via the optional AI_GATEWAY_FALLBACK_MODELS environment variable (a comma-separated provider/model list). When set, Kit attaches the list to providerOptions.gateway.models:

bash

# Optional: comma-separated provider/model fallbacks
AI_GATEWAY_FALLBACK_MODELS=openai/gpt-5,google/gemini-2.5-flash

typescript

// src/lib/ai/gateway.ts — attaching fallback models (only when configured)
const fallbackModels = getFallbackModels()
return {
  maxOutputTokens: maxTokens ?? (isReasoning ? 16384 : defaultMaxTokens),
  ...(fallbackModels.length > 0 && {
    providerOptions: { gateway: { models: fallbackModels } },
  }),
}

Provider failover is handled server-side by the Gateway, not by a client-side scoring algorithm. If no fallback models are configured, there is no automatic provider switch — the primary model is used directly.
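
The body of `getFallbackModels` is not shown on this page. One plausible shape, sketched under the assumption that malformed entries are dropped silently, is below; `parseFallbackModels` and `withFallbackModels` are hypothetical names.

```typescript
// Parse a comma-separated provider/model list, e.g. the value of
// AI_GATEWAY_FALLBACK_MODELS. Entries without a provider/model shape
// are skipped (an assumption of this sketch).
function parseFallbackModels(raw: string | undefined): string[] {
  if (!raw) return []
  return raw
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => /^[^/\s]+\/\S+$/.test(entry))
}

// Attach the list only when it is non-empty, mirroring the
// conditional spread in the snippet above.
function withFallbackModels(
  models: string[]
): { providerOptions?: { gateway: { models: string[] } } } {
  return models.length > 0
    ? { providerOptions: { gateway: { models } } }
    : {}
}

const fallbackSettings = withFallbackModels(
  parseFallbackModels('openai/gpt-5, google/gemini-2.5-flash')
)
```

With an unset or empty variable the result is an empty object, so spreading it into the call settings adds nothing.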

Provider Capabilities

Not all providers support all features. The capability matrix is a reference for choosing a model and configuring Gateway fallbacks:

Capability	OpenAI	Anthropic	Google	xAI
Streaming	Yes	Yes	Yes	Yes
Functions/Tools	Yes	Yes	Yes	Yes
Vision	Yes	Yes	Yes	No
Embeddings	Yes	No	Yes	No
System Messages	Yes	Yes	Yes	Yes
Max Context	1M	200K (1M beta)	1M	2M

RAG search uses OpenAI's embedding model (configurable via AI_EMBEDDING_MODEL, default: text-embedding-3-small), regardless of the active chat provider. Ensure OPENAI_API_KEY is set (or AI_API_KEY as fallback) if you use the RAG system, even with Anthropic or Google as the primary provider.
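
To make the matrix usable when picking fallbacks, it can be encoded as data and queried. The sketch below copies the Yes/No values from the table; the `CAPABILITIES` constant, the capability key names, and `providersSupporting` are illustrative and not part of Kit.

```typescript
type AIProvider = 'openai' | 'anthropic' | 'google' | 'xai'
type Capability =
  | 'streaming'
  | 'tools'
  | 'vision'
  | 'embeddings'
  | 'systemMessages'

// Values transcribed from the capability table above.
const CAPABILITIES: Record<AIProvider, Record<Capability, boolean>> = {
  openai: { streaming: true, tools: true, vision: true, embeddings: true, systemMessages: true },
  anthropic: { streaming: true, tools: true, vision: true, embeddings: false, systemMessages: true },
  google: { streaming: true, tools: true, vision: true, embeddings: true, systemMessages: true },
  xai: { streaming: true, tools: true, vision: false, embeddings: false, systemMessages: true },
}

// Return the providers that support every requested capability.
function providersSupporting(...required: Capability[]): AIProvider[] {
  return (Object.keys(CAPABILITIES) as AIProvider[]).filter((provider) =>
    required.every((capability) => CAPABILITIES[provider][capability])
  )
}
```

For example, a chat feature that sends images should only list fallbacks from `providersSupporting('vision')`, which excludes xAI according to the table.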

Reasoning Model Handling

GPT-5 family and o-series models are reasoning models with special parameter constraints. The isReasoningModel flag on ModelInfo controls runtime behavior:

Model	Reasoning	Temperature	Default maxTokens
GPT-5, GPT-5.2, GPT-5 Mini, GPT-5 Nano	Yes	Unsupported	16,384
o3, o4-mini	Yes	Unsupported	16,384
GPT-4.1	No	0.7 (default)	1,000

Key constraints:

Temperature is unsupported — passing it to reasoning models triggers an SDK warning and may degrade behavior. Kit omits temperature entirely for reasoning models.
maxOutputTokens covers both internal reasoning and visible output. Reasoning models spend most of the token budget on internal chain-of-thought, so the non-reasoning default of 1,000 is far too low; Kit uses 16,384 for reasoning models.
Symptom of too-low maxOutputTokens: finishReason: 'length' with 0 output tokens → empty response to the user.

This logic is centralized in buildGatewaySettings (gateway.ts) — the single place where v6 settings names (maxTokens → maxOutputTokens) and reasoning-safety live:

typescript

// Reasoning-aware parameter handling (gateway.ts → buildGatewaySettings)
const isReasoning = isReasoningModel(modelId)
return {
  ...(isReasoning ? {} : { temperature: temperature ?? defaultTemperature }),
  maxOutputTokens: maxTokens ?? (isReasoning ? 16384 : defaultMaxTokens),
}
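
For a concrete feel of the result, here is a self-contained variant of the same logic. The hard-coded `REASONING_MODELS` set and the literal defaults (0.7 and 1,000, taken from the table above) stand in for the registry lookup and Kit's configuration; `buildSettingsSketch` is a hypothetical name.

```typescript
// Stand-in for the registry's isReasoningModel flag.
const REASONING_MODELS = new Set([
  'gpt-5.2', 'gpt-5', 'gpt-5-mini', 'gpt-5-nano', 'o3', 'o4-mini',
])

interface SettingsInput {
  modelId: string
  temperature?: number
  maxTokens?: number
}

interface CallSettings {
  temperature?: number
  maxOutputTokens: number
}

function buildSettingsSketch(input: SettingsInput): CallSettings {
  const isReasoning = REASONING_MODELS.has(input.modelId)
  return {
    // Temperature is omitted entirely for reasoning models.
    ...(isReasoning ? {} : { temperature: input.temperature ?? 0.7 }),
    // Reasoning models get a larger default budget.
    maxOutputTokens: input.maxTokens ?? (isReasoning ? 16384 : 1000),
  }
}

// Reasoning model: the caller's temperature is dropped.
const o3Settings = buildSettingsSketch({ modelId: 'o3', temperature: 0.2 })
// Non-reasoning model: defaults apply.
const gpt41Settings = buildSettingsSketch({ modelId: 'gpt-4.1' })
```

Here `o3Settings` contains only `maxOutputTokens: 16384`, while `gpt41Settings` contains `temperature: 0.7` and `maxOutputTokens: 1000`.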

Streaming Response Diagnostics

streamText() from the Vercel AI SDK returns lazy Promises that resolve after the stream ends. Kit reads these for diagnostics:

Property	Type	Purpose
`result.finishReason`	`Promise<string>`	Why the stream ended (`'stop'`, `'length'`, `'content-filter'`)
`result.usage`	`Promise<object>`	Token counts (inputTokens, outputTokens)
`result.warnings`	`Promise<Warning[] | undefined>`	Unsupported parameters, model deprecations

finishReason values:

Value	Meaning	Action
`'stop'`	Normal completion	None
`'length'`	Token budget exhausted	Increase `maxTokens`
`'content-filter'`	Provider blocked the response	Review content policy

Kit logs a warning when the stream completes with 0 content chunks or a non-'stop' finish reason. This diagnostic data is essential for debugging empty AI responses.
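
The check described above can be sketched as a pure function over values collected once the stream has ended. `StreamSummary` and `diagnoseStream` are hypothetical names, and the message wording is illustrative; Kit's actual log output may differ.

```typescript
// Values gathered after the stream completes, e.g. from awaiting
// result.finishReason and counting emitted text chunks.
interface StreamSummary {
  finishReason: string
  contentChunks: number
}

// Returns a warning message, or null when the stream looks healthy.
function diagnoseStream(summary: StreamSummary): string | null {
  if (summary.finishReason === 'length') {
    return `Stream ended with finishReason 'length' after ${summary.contentChunks} content chunks; the token budget was exhausted, consider raising maxTokens`
  }
  if (summary.finishReason !== 'stop') {
    return `Stream ended with non-'stop' finishReason '${summary.finishReason}'`
  }
  if (summary.contentChunks === 0) {
    return 'Stream completed normally but produced 0 content chunks'
  }
  return null
}
```

A reasoning model that spends its whole budget on internal reasoning would show up here as `finishReason: 'length'` with `contentChunks: 0`.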

Custom Base URLs

Each provider supports a custom base URL for proxies, self-hosted models, or alternative endpoints:

Variable	Default	Purpose
`OPENAI_BASE_URL`	`https://api.openai.com/v1`	OpenAI API proxy or compatible endpoint
`ANTHROPIC_BASE_URL`	`https://api.anthropic.com`	Anthropic API proxy
`GOOGLE_AI_BASE_URL`	`https://generativelanguage.googleapis.com/v1`	Google AI proxy
`XAI_BASE_URL`	`https://api.x.ai/v1`	xAI API proxy
`OPENAI_ORG_ID`	—	OpenAI organization ID for billing
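
Resolving an override against the defaults in the table might look like the following. `resolveBaseUrl` is a hypothetical helper, and trimming whitespace and trailing slashes is an assumption of this sketch, not documented Kit behavior.

```typescript
// Defaults transcribed from the table above.
const BASE_URL_DEFAULTS = {
  OPENAI_BASE_URL: 'https://api.openai.com/v1',
  ANTHROPIC_BASE_URL: 'https://api.anthropic.com',
  GOOGLE_AI_BASE_URL: 'https://generativelanguage.googleapis.com/v1',
  XAI_BASE_URL: 'https://api.x.ai/v1',
} as const

type BaseUrlVar = keyof typeof BASE_URL_DEFAULTS

// Use the environment override when present and non-empty,
// otherwise fall back to the documented default.
function resolveBaseUrl(
  name: BaseUrlVar,
  env: Record<string, string | undefined>
): string {
  const override = env[name]?.trim()
  return override ? override.replace(/\/+$/, '') : BASE_URL_DEFAULTS[name]
}

// Example: point OpenAI traffic at a local OpenAI-compatible proxy.
const openaiUrl = resolveBaseUrl('OPENAI_BASE_URL', {
  OPENAI_BASE_URL: 'http://localhost:8080/v1/',
})
```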

Error Handling and Retries

Retries are handled by the Vercel AI SDK. Both streamText and generateText accept a maxRetries setting (default: 2) and apply exponential backoff with jitter automatically. Requests that fail with transient errors (network timeouts, 5xx responses) are retried; client errors (400, 401, 403) fail immediately.

typescript

// Retries are configured per call via the AI SDK
streamText({
  model: gatewayModel,
  messages,
  maxRetries: 2, // default; set to 0 to disable
})

Kit's structured error classes (src/lib/ai/errors.ts) wrap provider failures with retryability metadata:

Error Class	When	Retryable
`AIProviderError`	Base class for all provider errors	Varies
`NetworkError`	Connection failures, DNS errors	Yes
`TimeoutError`	Request exceeds timeout	Yes
`ValidationError`	Invalid config or request	No
`InvalidProviderError`	Unknown provider string	No
