Cost Management

Three-layer cost control: rate limiting, credit system, and usage tracking for AI operations

Kit protects your AI budget with a three-layer cost management system: global rate limiting to guard against bursts, a credit-based system for per-operation billing, and usage tracking for analytics and cost monitoring. All three layers work together; every AI request must pass each layer before it reaches the provider.
This page covers the rate-limiting architecture, credit costs, usage tracking, token cost calculation, and feature-flag integration. For the credit system's billing integration, see Credit System.

Three-Layer Architecture

Every AI request passes through three protection layers in sequence:
Incoming AI request
    |
    v
Layer 1: Global rate limit (burst protection)
    |--- Upstash Redis sliding window
    |--- 10 requests per 10 seconds (configurable)
    |--- Applies to ALL users regardless of tier
    |--- Purpose: Prevent DDoS and burst abuse
    |--- Fail: 429 "Too many requests. Please slow down."
    |
    v
Layer 2: Credit system (per-operation billing)
    |--- Check credit balance in database
    |--- Verify sufficient credits for operation
    |--- Auto-reset if 30+ days elapsed (webhook backup)
    |--- Atomic deduction BEFORE processing
    |--- Fail: 402 "Insufficient credits"
    |
    v
Layer 3: Usage tracking (analytics)
    |--- Non-blocking database write (after response)
    |--- Records: provider, model, tokens, cost, purpose
    |--- Monthly aggregation for quota checks
    |--- Used for billing analytics and dashboards
    |
    v
Request reaches the AI provider
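The sequential flow above can be sketched as a single guard function. The names here (guardAiRequest and the injected callbacks) are illustrative assumptions, not Kit's actual API; they only show how the three layers compose:

```typescript
type LayerResult = { ok: boolean; status?: number; reason?: string }

// Sketch of the three-layer pipeline: each layer can short-circuit the request
// before the AI provider is ever reached.
async function guardAiRequest(
  checkBurstLimit: () => Promise<boolean>,
  deductCredits: () => Promise<boolean>,
  trackUsage: () => Promise<void>,
  callProvider: () => Promise<string>
): Promise<LayerResult & { body?: string }> {
  // Layer 1: global burst protection, applies to every user
  if (!(await checkBurstLimit())) {
    return { ok: false, status: 429, reason: 'Too many requests. Please slow down.' }
  }
  // Layer 2: atomic credit deduction BEFORE processing
  if (!(await deductCredits())) {
    return { ok: false, status: 402, reason: 'Insufficient credits' }
  }
  const body = await callProvider()
  // Layer 3: non-blocking usage tracking; errors never block the response
  trackUsage().catch((err) => console.error('[usage] tracking failed', err))
  return { ok: true, body }
}
```

The key design point: layers 1 and 2 run before the provider call and can reject, while layer 3 runs after and is fire-and-forget.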

Rate Limiting

Global Burst Protection

The global rate limiter uses Upstash Redis with a sliding-window algorithm. It applies to all users and prevents burst abuse:
bash
# Configure via environment variables
AI_RATE_LIMIT_WINDOW=10      # Window in seconds (default: 10)
AI_RATE_LIMIT_MAX_REQUESTS=10 # Max requests per window (default: 10)
Key characteristics:
  • Sliding window: smoother than fixed windows, with no bursts at window boundaries
  • Ephemeral cache: an in-memory cache reduces Redis calls by 50–80%
  • Analytics: Upstash analytics enabled for monitoring
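To illustrate the algorithm, here is a minimal in-memory sliding-log variant of the limiter. The real enforcement happens server-side in Upstash Redis; this class is a teaching sketch, not the Upstash client API:

```typescript
// Minimal sliding-log rate limiter: keeps request timestamps and rejects a
// request once the window already contains maxRequests hits.
class SlidingWindow {
  private hits: number[] = []

  constructor(private maxRequests: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    // Drop timestamps that have fallen out of the window
    this.hits = this.hits.filter((t) => now - t < this.windowMs)
    if (this.hits.length >= this.maxRequests) return false
    this.hits.push(now)
    return true
  }
}

// 10 requests per 10 seconds, matching the defaults above
const limiter = new SlidingWindow(10, 10_000)
```

Because old timestamps expire continuously rather than at fixed boundaries, a client cannot double its budget by straddling a window edge, which is the property the bullet list above describes.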

Tier-Based Monthly Quotas

Each subscription tier has a monthly request limit, enforced through separate Redis rate limiters:

Tier       | Monthly Limit | Window  | Env Override
Free       | 500           | 30 days | AI_FREE_TIER_REQUESTS
Basic      | 1,500         | 30 days | AI_BASIC_TIER_REQUESTS
Pro        | 5,000         | 30 days | AI_PRO_TIER_REQUESTS
Enterprise | 15,000        | 30 days | AI_ENTERPRISE_TIER_REQUESTS

A user's tier is resolved from their Lemon Squeezy subscription variant ID. Users without a subscription are assigned the free tier.
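A plausible sketch of that variant-ID-to-tier resolution follows. The variant IDs shown are hypothetical placeholders (real IDs come from your Lemon Squeezy dashboard), and the function shape is an assumption about how such a lookup might be written:

```typescript
type SubscriptionTier = 'free' | 'basic' | 'pro' | 'enterprise'

// Hypothetical mapping; replace with the variant IDs from your own
// Lemon Squeezy products.
const VARIANT_TO_TIER: Record<string, SubscriptionTier> = {
  '111111': 'basic',
  '222222': 'pro',
  '333333': 'enterprise',
}

function resolveTier(variantId?: string): SubscriptionTier {
  // No subscription (no variant ID) falls back to the free tier
  if (!variantId) return 'free'
  return VARIANT_TO_TIER[variantId] ?? 'free'
}
```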

Rate Limit Check Flow

The comprehensive checkRateLimit function orchestrates all three checks. It is called from every AI API route:
src/lib/ai/rate-limiter.ts — Comprehensive Rate Limit Check
export async function checkRateLimit(params: {
  userId?: string
  sessionId?: string
  ip?: string
  cost?: number // Credit cost for operation (default: 1)
}): Promise<{
  success: boolean
  limit: number
  remaining: number
  reset: number
  tier: SubscriptionTier
  reason?: string
  creditSystemEnabled: boolean
}> {
  const { userId, sessionId, ip, cost = 1 } = params

  // Get identifier for burst protection
  const identifier = getIdentifier(userId, sessionId, ip)

  // STEP 1: ALWAYS check global rate limit (DDoS/Burst protection)
  const globalResult = await checkGlobalRateLimit(identifier)
  if (!globalResult.success) {
    return {
      success: false,
      limit: globalResult.limit,
      remaining: globalResult.remaining,
      reset: globalResult.reset,
      tier: 'free',
      reason: 'Too many requests. Please slow down.',
      creditSystemEnabled: isCreditSystemEnabled(),
    }
  }

  // STEP 2: Check if credit system is enabled
  if (!isCreditSystemEnabled()) {
    console.log('[Rate Limit] Credit system disabled - allowing request')
    return {
      success: true,
      limit: 999999, // Unlimited
      remaining: 999999,
      reset: Date.now() + 30 * 24 * 60 * 60 * 1000,
      tier: userId ? await getUserTier(userId) : 'free',
      creditSystemEnabled: false,
    }
  }

  // STEP 3: check the credit balance, apply the 30-day auto-reset, and
  // atomically deduct credits before the request proceeds (omitted here)
}
The function returns a standardized result:
typescript
{
  success: boolean          // true if all checks passed
  limit: number             // Total credit/request limit
  remaining: number         // Remaining credits/requests
  reset: number             // Unix timestamp for reset
  tier: SubscriptionTier    // User's subscription tier
  reason?: string           // Human-readable error message
  creditSystemEnabled: boolean  // Whether credit system is active
}
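A route handler consuming this result might map it to an HTTP status as sketched below. The mapStatus helper is illustrative, not part of Kit; it only encodes the 429/402 split described in the architecture diagram:

```typescript
interface RateLimitResult {
  success: boolean
  limit: number
  remaining: number
  reset: number
  reason?: string
  creditSystemEnabled: boolean
}

// Burst-limit failures surface as 429, credit failures as 402 (see the
// three-layer diagram above); successful checks let the request proceed.
function mapStatus(result: RateLimitResult): number {
  if (result.success) return 200
  return result.reason === 'Insufficient credits' ? 402 : 429
}
```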

Credit Costs

Every AI operation has a defined credit cost. Costs are based on estimated token consumption and computational intensity:
src/lib/credits/credit-costs.ts — Operation Costs
export const CREDIT_COSTS = {
  // ============================================================================
  // FAQ Operations
  // ============================================================================

  /**
   * Simple FAQ lookup using RAG (Retrieval-Augmented Generation)
   *
   * Uses vector search with minimal context window. Suitable for
   * straightforward questions with clear answers in the knowledge base.
   *
   * **Estimated tokens**: 500-1000
   *
   * **Example use cases**:
   * - "What are your business hours?"
   * - "How do I reset my password?"
   * - "What payment methods do you accept?"
   */
  faq_simple: 5,

  /**
   * Complex FAQ query with multi-step reasoning
   *
   * Uses larger context window and may require multiple RAG retrievals
   * or chain-of-thought reasoning. Suitable for nuanced questions.
   *
   * **Estimated tokens**: 2000-4000
   *
   * **Example use cases**:
   * - "Compare your pricing plans and recommend one for my use case"
   * - "Explain the difference between your two authentication methods"
   * - "How does your refund policy work for annual subscriptions?"
   */
  faq_complex: 15,

  // ============================================================================
  // Chat Operations
  // ============================================================================

  /**
   * Standard chat message (non-streaming)
   *
   * Single message exchange with context window up to 4000 tokens.
   * Suitable for most conversational interactions.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - General conversation
   * - Question answering
   * - Content suggestions
   */
  chat_message: 15,

  /**
   * Streaming chat message
   *
   * Real-time token streaming with same context as standard messages.
   * Higher cost due to streaming infrastructure and perceived value.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - Interactive chat experiences
   * - Real-time content generation
   * - Live coding assistance
   */
  chat_streaming: 20,

  /**
   * Chat with tool/function calling
   *
   * Chat message that can invoke external tools, APIs, or functions.
   * Includes extra tokens for tool definitions and result processing.
   *
   * **Estimated tokens**: 2000-6000
   *
   * **Example use cases**:
   * - Database queries via chat
   * - API integrations
   * - Calculator or data lookups
   */
  chat_with_tools: 30,

  /**
   * Image analysis in chat (Vision)
   *
   * Multimodal chat message with one or more images for visual analysis.
   * Higher cost due to image processing tokens (images consume ~85 tokens
   * per 512x512 tile in most providers).
   *
   * **Estimated tokens**: 2000-8000 (depends on image resolution)
   *
   * **Example use cases**:
   * - "What's in this image?"
   * - Screenshot analysis and debugging
   * - Document/receipt scanning via chat
   * - Design feedback and comparison
   */
  image_analysis: 30,

  /**
   * PDF document analysis in chat
   *
   * Upload and analyze PDF documents in the LLM Chat.
   * Server-side text extraction with pdf-parse, then AI analysis.
   * Higher cost than streaming due to extraction overhead.
   *
   * **Estimated tokens**: 3000-10000 (depends on document length)
   *
   * **Example use cases**:
   * - "Summarize this contract"
   * - "What are the key terms in this PDF?"
   * - "Extract the action items from this meeting notes PDF"
   */
  pdf_analysis: 40,

  // ============================================================================
  // Advanced AI Operations
  // ============================================================================

  /**
   * Image generation from text prompt
   *
   * Text-to-image generation using models like DALL-E or Stable Diffusion.
   * Highest single-operation cost due to computational requirements.
   *
   * **Estimated tokens**: N/A (GPU-based operation)
   *
   * **Example use cases**:
   * - Marketing visual generation
   * - Product mockups
   * - Concept art creation
   */
  image_gen: 80,

  /**
   * Image editing/manipulation
   *
   * Modify existing images using text prompts or masks.
   * Includes inpainting, outpainting, and style transfer.
   *
   * **Estimated tokens**: N/A (GPU-based operation)
   *
   * **Example use cases**:
   * - Background removal
   * - Object replacement
   * - Image enhancement
   */
  image_edit: 50,

  /**
   * Code analysis and review
   *
   * Static analysis, bug detection, and code quality assessment.
   * Analyzes code structure, patterns, and potential issues.
   *
   * **Estimated tokens**: 3000-8000
   *
   * **Example use cases**:
   * - Security vulnerability scanning
   * - Performance optimization suggestions
   * - Code smell detection
   */
  code_analysis: 40,

  /**
   * Code generation from specifications
   *
   * Generate complete code files or functions from natural language
   * descriptions. Includes language-specific syntax and best practices.
   *
   * **Estimated tokens**: 4000-10000
   *
   * **Example use cases**:
   * - Component scaffolding
   * - API endpoint generation
   * - Test case creation
   */
  code_gen: 50,

  // ============================================================================
  // Embeddings and Vector Operations
  // ============================================================================

  /**
   * Single text embedding generation
   *
   * Convert text to vector representation for semantic search.
   * Typically 1536-dimensional vector (OpenAI ada-002).
   *
   * **Estimated tokens**: 100-500
   *
   * **Example use cases**:
   * - Document indexing
   * - Semantic search preparation
   * - Content similarity calculation
   */
  embedding_single: 5,

  /**
   * Batch embedding generation
   *
   * Process multiple texts in a single batch operation.
   * More efficient than individual embeddings for bulk operations.
   *
   * **Estimated tokens**: 1000-5000
   *
   * **Example use cases**:
   * - Bulk document processing
   * - Knowledge base initialization
   * - Large-scale content indexing
   */
  embedding_batch: 10,

  /**
   * Vector similarity search
   *
   * Query vector database to find semantically similar content.
   * Cost covers embedding query text and database lookup.
   *
   * **Estimated tokens**: 200-800
   *
   * **Example use cases**:
   * - Semantic document search
   * - Recommendation systems
   * - Duplicate content detection
   */
  vector_search: 5,

  // ============================================================================
  // Audio Operations
  // ============================================================================

  /**
   * Audio transcription (speech-to-text)
   *
   * Convert audio files to text using Whisper or similar models.
   * Cost per minute of audio content.
   *
   * **Estimated tokens**: N/A (audio processing)
   *
   * **Example use cases**:
   * - Meeting transcription
   * - Podcast notes generation
   * - Voice command processing
   */
  transcription: 30,

  /**
   * Speech-to-text for chat voice input
   *
   * Short audio recordings from microphone input in LLM Chat,
   * transcribed via OpenAI Whisper. Lower cost than general transcription
   * because chat recordings are typically shorter (max 120s).
   *
   * **Estimated tokens**: N/A (audio processing)
   *
   * **Example use cases**:
   * - Voice input in chat (microphone button)
   * - Quick voice messages for AI conversation
   */
  speech_to_text: 20,

  /**
   * Text-to-speech synthesis
   *
   * Generate natural-sounding audio from text input.
   * Includes voice selection and audio quality options.
   *
   * **Estimated tokens**: N/A (audio synthesis)
   *
   * **Example use cases**:
   * - Voiceover generation
   * - Accessibility features
   * - Audio content creation
   */
  tts: 20,

  // ============================================================================
  // Document Processing
  // ============================================================================

  /**
   * PDF parsing and text extraction
   *
   * Extract text, tables, and metadata from PDF documents.
   * Handles multi-page documents with layout preservation.
   *
   * **Estimated tokens**: 1000-3000
   *
   * **Example use cases**:
   * - Document digitization
   * - Invoice processing
   * - Contract analysis
   */
  pdf_parse: 15,

  /**
   * Optical Character Recognition (OCR)
   *
   * Extract text from images and scanned documents.
   * Includes text detection, recognition, and layout analysis.
   *
   * **Estimated tokens**: N/A (image processing)
   *
   * **Example use cases**:
   * - Receipt scanning
   * - Handwriting recognition
   * - Screenshot text extraction
   */
  ocr: 30,

  /**
   * Document summarization
   *
   * Generate concise summaries of long documents.
   * Uses extractive or abstractive summarization techniques.
   *
   * **Estimated tokens**: 5000-12000
   *
   * **Example use cases**:
   * - Research paper summaries
   * - Meeting notes condensation
   * - Article key points extraction
   */
  document_summary: 65,

  // ============================================================================
  // Content Generation
  // ============================================================================

  /**
   * Template-based content generation
   *
   * Generate text from templates (email, product description, blog outline,
   * social media post, marketing copy) with streaming output.
   * Cost covers template processing + text generation.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - Professional email drafting
   * - Product description writing
   * - Blog post outline generation
   * - Social media post creation
   * - Marketing copy generation
   */
  content_generation: 25,
}

Complete Cost Table

Operation          | Credits | Estimated Tokens | Category
faq_simple         | 5       | 500–1,000        | FAQ
faq_complex        | 15      | 2,000–4,000      | FAQ
chat_message       | 15      | 1,000–4,000      | Chat
chat_streaming     | 20      | 1,000–4,000      | Chat
content_generation | 25      | 1,000–4,000      | Content
chat_with_tools    | 30      | 2,000–6,000      | Chat
image_analysis     | 30      | 2,000–8,000      | Chat
pdf_analysis       | 40      | 3,000–10,000     | Chat
image_gen          | 80      | N/A (GPU)        | Advanced AI
image_edit         | 50      | N/A (GPU)        | Advanced AI
code_analysis      | 40      | 3,000–8,000      | Advanced AI
code_gen           | 50      | 4,000–10,000     | Advanced AI
embedding_single   | 5       | 100–500          | Embeddings
embedding_batch    | 10      | 1,000–5,000      | Embeddings
vector_search      | 5       | 200–800          | Embeddings
transcription      | 30      | N/A (audio)      | Audio
tts                | 20      | N/A (audio)      | Audio
speech_to_text     | 20      | N/A (audio)      | Audio
pdf_parse          | 15      | 1,000–3,000      | Document
ocr                | 30      | N/A (image)      | Document
document_summary   | 65      | 5,000–12,000     | Document

Credit Cost Helper Functions

The credit cost module provides helper functions:

Function                                   | Purpose
getCreditCost(operation)                   | Get the cost of a single operation
calculateBatchCost(operation, quantity)    | Calculate the total cost of a batch operation
getAllCreditCosts()                        | Get all costs as a plain object (for the admin UI)
isValidOperation(string)                   | Type guard: check whether a string is a valid operation
getOperationsByCategory()                  | Group operations by category (faq, chat, etc.)
estimateOperationCount(operation, credits) | How many operations are possible with X credits?
formatCreditAmount(credits, includeUnit)   | Format for display (20 → "20 credits")
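Plausible implementations of a few of these helpers are sketched below against a trimmed cost map. The bodies are assumptions inferred from the table; the real module in src/lib/credits/credit-costs.ts may differ in detail:

```typescript
// Trimmed cost map for illustration; the real CREDIT_COSTS covers 21 operations.
const COSTS = { faq_simple: 5, chat_streaming: 20, image_gen: 80 } as const
type Operation = keyof typeof COSTS

function getCreditCost(operation: Operation): number {
  return COSTS[operation]
}

function calculateBatchCost(operation: Operation, quantity: number): number {
  return COSTS[operation] * quantity
}

function estimateOperationCount(operation: Operation, credits: number): number {
  // How many full operations fit into a given credit balance
  return Math.floor(credits / COSTS[operation])
}

function formatCreditAmount(credits: number, includeUnit = true): string {
  return includeUnit ? `${credits} credits` : String(credits)
}
```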

Usage Tracking

Every AI request is recorded in the AIUsage database table for analytics and cost monitoring. Tracking is non-blocking: failures are logged but never prevent the AI response from being delivered.

TrackUsageParams

typescript
interface TrackUsageParams {
  userId?: string
  sessionId?: string
  provider: string        // "openai", "anthropic", etc.
  model: string           // "gpt-5-nano", "claude-haiku", etc.
  tokens: number          // Total tokens used
  cost?: TokenCost | number  // USD cost (from provider pricing)
  purpose: 'faq' | 'chat' | 'completion' | 'stream' | 'embedding' | 'general'
  metadata?: Record<string, unknown>  // Additional context
}
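The non-blocking behavior comes down to a fire-and-forget pattern, sketched here with the persistence call injected as a parameter. The injected insert function is an assumption standing in for Kit's actual database write:

```typescript
interface TrackUsageParams {
  userId?: string
  provider: string
  model: string
  tokens: number
  cost?: number
  purpose: string
}

// Fire-and-forget: the returned promise is intentionally not awaited, so a
// slow or failing write can never delay or break the AI response.
function trackUsage(
  params: TrackUsageParams,
  insert: (p: TrackUsageParams) => Promise<void>
): void {
  insert(params).catch((err) => {
    console.error('[AI Usage] tracking failed:', err)
  })
}
```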

Monthly Aggregation

Usage is aggregated per month for quota checks and analytics:
typescript
interface MonthlyUsage {
  totalTokens: number
  totalCost: number
  requestCount: number
  byProvider: Record<string, { tokens: number; cost: number; requests: number }>
  byPurpose: Record<string, { tokens: number; cost: number; requests: number }>
}
}
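One way to build that aggregate from raw usage rows is sketched below. The UsageRow shape is assumed from the AIUsage fields listed earlier (provider, purpose, tokens, cost); the real query may aggregate in SQL instead:

```typescript
interface UsageRow { provider: string; purpose: string; tokens: number; cost: number }
interface Bucket { tokens: number; cost: number; requests: number }
interface MonthlyUsage {
  totalTokens: number
  totalCost: number
  requestCount: number
  byProvider: Record<string, Bucket>
  byPurpose: Record<string, Bucket>
}

// Accumulate a row into a keyed bucket, creating the bucket on first use
function add(map: Record<string, Bucket>, key: string, row: UsageRow): void {
  const b = map[key] ?? (map[key] = { tokens: 0, cost: 0, requests: 0 })
  b.tokens += row.tokens
  b.cost += row.cost
  b.requests += 1
}

function aggregateMonthly(rows: UsageRow[]): MonthlyUsage {
  const result: MonthlyUsage = {
    totalTokens: 0,
    totalCost: 0,
    requestCount: 0,
    byProvider: {},
    byPurpose: {},
  }
  for (const row of rows) {
    result.totalTokens += row.tokens
    result.totalCost += row.cost
    result.requestCount += 1
    add(result.byProvider, row.provider, row)
    add(result.byPurpose, row.purpose, row)
  }
  return result
}
```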

Usage API

Endpoint      | Method | Purpose
/api/ai/usage | GET    | Usage statistics for the current month

The usage endpoint returns aggregated data broken down by provider and purpose, suitable for dashboard charts and usage meters.

Token Cost Calculation

Kit computes the USD cost of every request from model-specific pricing tables. The calculateCost method on BaseProvider converts token counts into dollar amounts:
src/lib/ai/providers/base-provider.ts — Cost Calculation
calculateCost(usage: TokenUsage, model?: string): TokenCost {
    const modelInfo = this.getModelInfo(model ?? this.defaultModel)
    if (!modelInfo) {
      return {
        promptCost: 0,
        completionCost: 0,
        totalCost: 0,
        currency: 'USD',
      }
    }

    const promptCost =
      (usage.promptTokens / 1_000_000) * modelInfo.costPerMillionPromptTokens
    const completionCost =
      (usage.completionTokens / 1_000_000) *
      modelInfo.costPerMillionCompletionTokens

    return {
      promptCost,
      completionCost,
      totalCost: promptCost + completionCost,
      currency: 'USD',
    }
  }
The calculation uses costPerMillionPromptTokens and costPerMillionCompletionTokens from the model info registry, which enables accurate cost tracking across all four providers.
Example: a request to claude-haiku-4-5 with 500 prompt tokens and 200 completion tokens:
Prompt cost:     (500 / 1,000,000) × $0.80 = $0.000400
Completion cost: (200 / 1,000,000) × $4.00 = $0.000800
Total cost:      $0.001200
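The worked example can be reproduced with the same formula calculateCost uses; the per-million prices are taken from the example above, and costUsd is a standalone sketch rather than the BaseProvider method itself:

```typescript
interface TokenUsage { promptTokens: number; completionTokens: number }

// Same arithmetic as calculateCost: tokens divided by one million,
// multiplied by the per-million price for that token type.
function costUsd(usage: TokenUsage, promptPerM: number, completionPerM: number) {
  const promptCost = (usage.promptTokens / 1_000_000) * promptPerM
  const completionCost = (usage.completionTokens / 1_000_000) * completionPerM
  return { promptCost, completionCost, totalCost: promptCost + completionCost }
}
```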

Feature Flag Integration

The cost management system behaves differently depending on whether the credit system is enabled:

Behavior         | Credit System ON                    | Credit System OFF
Rate limiting    | Global burst + credit balance check | Global burst only
Credit deduction | Atomic deduction before processing  | Skipped
Usage tracking   | Full tracking with costs            | Full tracking (analytics only)
Monthly limit    | Based on credit balance             | Unlimited (999999)
402 errors       | "Insufficient credits"              | Never sent
Auto-reset       | Checks whether 30+ days have passed | Skipped
The feature-flag check happens inside checkRateLimit:
checkRateLimit()
    |
    |--- ALWAYS: check the global rate limit
    |
    |--- isCreditSystemEnabled()?
    |    |
    |    |--- YES: check credit balance, auto-reset, deduct
    |    |--- NO:  allow the request (log "Credit system disabled")
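A minimal sketch of the flag check itself, assuming (per the environment variable table below) that it is driven by NEXT_PUBLIC_PRICING_MODEL. The real implementation lives in src/lib/credits/config.ts and presumably reads the environment directly; this version takes the value as a parameter so it is testable:

```typescript
// credit_based (the default) enables per-operation credits;
// classic_saas switches billing to subscription-only and disables them.
function isCreditSystemEnabled(pricingModel: string | undefined): boolean {
  return (pricingModel ?? 'credit_based') === 'credit_based'
}
```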

Environment Variables

Variable                    | Default      | Purpose
AI_RATE_LIMIT_WINDOW        | 10           | Global rate-limit window in seconds
AI_RATE_LIMIT_MAX_REQUESTS  | 10           | Maximum requests per global window
AI_FREE_TIER_REQUESTS       | 500          | Monthly limit for the free tier
AI_BASIC_TIER_REQUESTS      | 1500         | Monthly limit for the basic tier
AI_PRO_TIER_REQUESTS        | 5000         | Monthly limit for the pro tier
AI_ENTERPRISE_TIER_REQUESTS | 15000        | Monthly limit for the enterprise tier
UPSTASH_REDIS_REST_URL      | (none)       | Redis URL for rate limiting
UPSTASH_REDIS_REST_TOKEN    | (none)       | Redis token for rate limiting
NEXT_PUBLIC_PRICING_MODEL   | credit_based | credit_based or classic_saas
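Since environment variables arrive as strings (or not at all), the numeric limits above need defensive parsing with the listed defaults. parseEnvInt is an illustrative helper, not Kit code; in practice it would be called with values like process.env.AI_RATE_LIMIT_WINDOW:

```typescript
// Parse a numeric env value, falling back to the documented default when the
// variable is unset or not a valid integer.
function parseEnvInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10)
  return Number.isNaN(parsed) ? fallback : parsed
}
```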

Wichtige Dateien

DateiZweck
apps/boilerplate/src/lib/ai/rate-limiter.tsGlobales Burst- und Tier-basiertes Rate-Limiting
apps/boilerplate/src/lib/credits/credit-costs.tsCredit-Kosten pro Vorgang (21 Vorgänge)
apps/boilerplate/src/lib/credits/credit-manager.tsAtomare Credit-Abzüge mit SELECT FOR UPDATE
apps/boilerplate/src/lib/credits/config.tsCredit-System-Feature-Flag (isCreditSystemEnabled())
apps/boilerplate/src/lib/ai/usage-tracker.tsNicht-blockierendes Usage-Tracking in die Datenbank
apps/boilerplate/src/lib/ai/providers/base-provider.tsToken-Kostenberechnung pro Provider
apps/boilerplate/src/app/api/ai/usage/route.tsNutzungsstatistiken-Endpunkt