Cost Management

Three-layer cost control: rate limiting, credit system, and usage tracking for AI operations

Kit protects your AI budget with a three-layer cost management system: global rate limiting to guard against bursts, a credit-based system for per-operation billing, and usage tracking for analytics and cost monitoring. All three layers work together; every AI request must pass each layer before it reaches the provider.
This page covers the rate-limiting architecture, credit costs, usage tracking, token cost calculation, and feature-flag integration. For the credit system's billing integration, see Credit System.

Three-Layer Architecture

Every AI request passes through three protection layers in sequence:
Incoming AI request
    |
    v
Layer 1: Global rate limit (burst protection)
    |--- Upstash Redis sliding window
    |--- 10 requests per 10 seconds (configurable)
    |--- Applies to ALL users regardless of tier
    |--- Purpose: Prevent DDoS and burst abuse
    |--- Fail: 429 "Too many requests. Please slow down."
    |
    v
Layer 2: Credit system (per-operation billing)
    |--- Check credit balance in database
    |--- Verify sufficient credits for operation
    |--- Auto-reset if 30+ days elapsed (webhook backup)
    |--- Atomic deduction BEFORE processing
    |--- Fail: 402 "Insufficient credits"
    |
    v
Layer 3: Usage tracking (analytics)
    |--- Non-blocking database write (after response)
    |--- Records: provider, model, tokens, cost, purpose
    |--- Monthly aggregation for quota checks
    |--- Used for billing analytics and dashboards
    |
    v
Request reaches the AI provider
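The sequential flow above can be sketched as a single guard function. The names here (guardAiRequest and the injected callbacks) are illustrative assumptions, not Kit's actual API; they only show how the three layers compose:

```typescript
type LayerResult = { ok: boolean; status?: number; reason?: string }

// Sketch of the three-layer pipeline: each layer can short-circuit the request
// before the AI provider is ever reached.
async function guardAiRequest(
  checkBurstLimit: () => Promise<boolean>,
  deductCredits: () => Promise<boolean>,
  trackUsage: () => Promise<void>,
  callProvider: () => Promise<string>
): Promise<LayerResult & { body?: string }> {
  // Layer 1: global burst protection, applies to every user
  if (!(await checkBurstLimit())) {
    return { ok: false, status: 429, reason: 'Too many requests. Please slow down.' }
  }
  // Layer 2: atomic credit deduction BEFORE processing
  if (!(await deductCredits())) {
    return { ok: false, status: 402, reason: 'Insufficient credits' }
  }
  const body = await callProvider()
  // Layer 3: non-blocking usage tracking; errors never block the response
  trackUsage().catch((err) => console.error('[usage] tracking failed', err))
  return { ok: true, body }
}
```

The key design point: layers 1 and 2 run before the provider call and can reject, while layer 3 runs after and is fire-and-forget.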

Rate Limiting

Global Burst Protection

The global rate limiter uses Upstash Redis with a sliding-window algorithm. It applies to all users and prevents burst abuse:
bash
# Configure via environment variables
AI_RATE_LIMIT_WINDOW=10      # Window in seconds (default: 10)
AI_RATE_LIMIT_MAX_REQUESTS=10 # Max requests per window (default: 10)
Key characteristics:
  • Sliding window: smoother than fixed windows, with no bursts at window boundaries
  • Ephemeral cache: an in-memory cache reduces Redis calls by 50–80%
  • Analytics: Upstash analytics enabled for monitoring
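To illustrate the algorithm, here is a minimal in-memory sliding-log variant of the limiter. The real enforcement happens server-side in Upstash Redis; this class is a teaching sketch, not the Upstash client API:

```typescript
// Minimal sliding-log rate limiter: keeps request timestamps and rejects a
// request once the window already contains maxRequests hits.
class SlidingWindow {
  private hits: number[] = []

  constructor(private maxRequests: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    // Drop timestamps that have fallen out of the window
    this.hits = this.hits.filter((t) => now - t < this.windowMs)
    if (this.hits.length >= this.maxRequests) return false
    this.hits.push(now)
    return true
  }
}

// 10 requests per 10 seconds, matching the defaults above
const limiter = new SlidingWindow(10, 10_000)
```

Because old timestamps expire continuously rather than at fixed boundaries, a client cannot double its budget by straddling a window edge, which is the property the bullet list above describes.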

Tier-Based Monthly Quotas

Each subscription tier has a monthly request limit, enforced through separate Redis rate limiters:

Tier       | Monthly Limit | Window  | Env Override
Free       | 500           | 30 days | AI_FREE_TIER_REQUESTS
Basic      | 1,500         | 30 days | AI_BASIC_TIER_REQUESTS
Pro        | 5,000         | 30 days | AI_PRO_TIER_REQUESTS
Enterprise | 15,000        | 30 days | AI_ENTERPRISE_TIER_REQUESTS

A user's tier is resolved from their Lemon Squeezy subscription variant ID. Users without a subscription are assigned the free tier.
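A plausible sketch of that variant-ID-to-tier resolution follows. The variant IDs shown are hypothetical placeholders (real IDs come from your Lemon Squeezy dashboard), and the function shape is an assumption about how such a lookup might be written:

```typescript
type SubscriptionTier = 'free' | 'basic' | 'pro' | 'enterprise'

// Hypothetical mapping; replace with the variant IDs from your own
// Lemon Squeezy products.
const VARIANT_TO_TIER: Record<string, SubscriptionTier> = {
  '111111': 'basic',
  '222222': 'pro',
  '333333': 'enterprise',
}

function resolveTier(variantId?: string): SubscriptionTier {
  // No subscription (no variant ID) falls back to the free tier
  if (!variantId) return 'free'
  return VARIANT_TO_TIER[variantId] ?? 'free'
}
```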

Rate Limit Check Flow

The comprehensive checkRateLimit function orchestrates all three checks. It is called from every AI API route:
src/lib/ai/rate-limiter.ts — Comprehensive Rate Limit Check
export async function checkRateLimit(params: {
  userId?: string
  sessionId?: string
  ip?: string
  cost?: number // Credit cost for operation (default: 1)
}): Promise<{
  success: boolean
  limit: number
  remaining: number
  reset: number
  tier: SubscriptionTier
  reason?: string
  creditSystemEnabled: boolean
}> {
  const { userId, sessionId, ip, cost = 1 } = params

  // Get identifier for burst protection
  const identifier = getIdentifier(userId, sessionId, ip)

  // STEP 1: ALWAYS check global rate limit (DDoS/Burst protection)
  const globalResult = await checkGlobalRateLimit(identifier)
  if (!globalResult.success) {
    return {
      success: false,
      limit: globalResult.limit,
      remaining: globalResult.remaining,
      reset: globalResult.reset,
      tier: 'free',
      reason: 'Too many requests. Please slow down.',
      creditSystemEnabled: isCreditSystemEnabled(),
    }
  }

  // STEP 2: Check if credit system is enabled
  if (!isCreditSystemEnabled()) {
    console.log('[Rate Limit] Credit system disabled - allowing request')
    return {
      success: true,
      limit: 999999, // Unlimited
      remaining: 999999,
      reset: Date.now() + 30 * 24 * 60 * 60 * 1000,
      tier: userId ? await getUserTier(userId) : 'free',
      creditSystemEnabled: false,
    }
  }

  // STEP 3: check the credit balance, apply the 30-day auto-reset, and
  // atomically deduct credits before the request proceeds (omitted here)
}
The function returns a standardized result:
typescript
{
  success: boolean          // true if all checks passed
  limit: number             // Total credit/request limit
  remaining: number         // Remaining credits/requests
  reset: number             // Unix timestamp for reset
  tier: SubscriptionTier    // User's subscription tier
  reason?: string           // Human-readable error message
  creditSystemEnabled: boolean  // Whether credit system is active
}
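A route handler consuming this result might map it to an HTTP status as sketched below. The mapStatus helper is illustrative, not part of Kit; it only encodes the 429/402 split described in the architecture diagram:

```typescript
interface RateLimitResult {
  success: boolean
  limit: number
  remaining: number
  reset: number
  reason?: string
  creditSystemEnabled: boolean
}

// Burst-limit failures surface as 429, credit failures as 402 (see the
// three-layer diagram above); successful checks let the request proceed.
function mapStatus(result: RateLimitResult): number {
  if (result.success) return 200
  return result.reason === 'Insufficient credits' ? 402 : 429
}
```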

Credit Costs

Every AI operation has a defined credit cost. Costs are based on estimated token consumption and computational intensity:
src/lib/credits/credit-costs.ts — Operation Costs
export const CREDIT_COSTS = {
  // ============================================================================
  // FAQ Operations
  // ============================================================================

  /**
   * Simple FAQ lookup using RAG (Retrieval-Augmented Generation)
   *
   * Uses vector search with minimal context window. Suitable for
   * straightforward questions with clear answers in the knowledge base.
   *
   * **Estimated tokens**: 500-1000
   *
   * **Example use cases**:
   * - "What are your business hours?"
   * - "How do I reset my password?"
   * - "What payment methods do you accept?"
   */
  faq_simple: 5,

  /**
   * Complex FAQ query with multi-step reasoning
   *
   * Uses larger context window and may require multiple RAG retrievals
   * or chain-of-thought reasoning. Suitable for nuanced questions.
   *
   * **Estimated tokens**: 2000-4000
   *
   * **Example use cases**:
   * - "Compare your pricing plans and recommend one for my use case"
   * - "Explain the difference between your two authentication methods"
   * - "How does your refund policy work for annual subscriptions?"
   */
  faq_complex: 15,

  // ============================================================================
  // Chat Operations
  // ============================================================================

  /**
   * Standard chat message (non-streaming)
   *
   * Single message exchange with context window up to 4000 tokens.
   * Suitable for most conversational interactions.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - General conversation
   * - Question answering
   * - Content suggestions
   */
  chat_message: 15,

  /**
   * Streaming chat message
   *
   * Real-time token streaming with same context as standard messages.
   * Higher cost due to streaming infrastructure and perceived value.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - Interactive chat experiences
   * - Real-time content generation
   * - Live coding assistance
   */
  chat_streaming: 20,

  /**
   * Chat with tool/function calling
   *
   * Chat message that can invoke external tools, APIs, or functions.
   * Includes extra tokens for tool definitions and result processing.
   *
   * **Estimated tokens**: 2000-6000
   *
   * **Example use cases**:
   * - Database queries via chat
   * - API integrations
   * - Calculator or data lookups
   */
  chat_with_tools: 30,

  /**
   * Image analysis in chat (Vision)
   *
   * Multimodal chat message with one or more images for visual analysis.
   * Higher cost due to image processing tokens (images consume ~85 tokens
   * per 512x512 tile in most providers).
   *
   * **Estimated tokens**: 2000-8000 (depends on image resolution)
   *
   * **Example use cases**:
   * - "What's in this image?"
   * - Screenshot analysis and debugging
   * - Document/receipt scanning via chat
   * - Design feedback and comparison
   */
  image_analysis: 30,

  /**
   * PDF document analysis in chat
   *
   * Upload and analyze PDF documents in the LLM Chat.
   * Server-side text extraction with pdf-parse, then AI analysis.
   * Higher cost than streaming due to extraction overhead.
   *
   * **Estimated tokens**: 3000-10000 (depends on document length)
   *
   * **Example use cases**:
   * - "Summarize this contract"
   * - "What are the key terms in this PDF?"
   * - "Extract the action items from this meeting notes PDF"
   */
  pdf_analysis: 40,

  // ============================================================================
  // Advanced AI Operations
  // ============================================================================

  /**
   * Image generation from text prompt
   *
   * Text-to-image generation using models like DALL-E or Stable Diffusion.
   * Highest single-operation cost due to computational requirements.
   *
   * **Estimated tokens**: N/A (GPU-based operation)
   *
   * **Example use cases**:
   * - Marketing visual generation
   * - Product mockups
   * - Concept art creation
   */
  image_gen: 80,

  /**
   * Image editing/manipulation
   *
   * Modify existing images using text prompts or masks.
   * Includes inpainting, outpainting, and style transfer.
   *
   * **Estimated tokens**: N/A (GPU-based operation)
   *
   * **Example use cases**:
   * - Background removal
   * - Object replacement
   * - Image enhancement
   */
  image_edit: 50,

  /**
   * Code analysis and review
   *
   * Static analysis, bug detection, and code quality assessment.
   * Analyzes code structure, patterns, and potential issues.
   *
   * **Estimated tokens**: 3000-8000
   *
   * **Example use cases**:
   * - Security vulnerability scanning
   * - Performance optimization suggestions
   * - Code smell detection
   */
  code_analysis: 40,

  /**
   * Code generation from specifications
   *
   * Generate complete code files or functions from natural language
   * descriptions. Includes language-specific syntax and best practices.
   *
   * **Estimated tokens**: 4000-10000
   *
   * **Example use cases**:
   * - Component scaffolding
   * - API endpoint generation
   * - Test case creation
   */
  code_gen: 50,

  // ============================================================================
  // Embeddings and Vector Operations
  // ============================================================================

  /**
   * Single text embedding generation
   *
   * Convert text to vector representation for semantic search.
   * Typically 1536-dimensional vector (OpenAI ada-002).
   *
   * **Estimated tokens**: 100-500
   *
   * **Example use cases**:
   * - Document indexing
   * - Semantic search preparation
   * - Content similarity calculation
   */
  embedding_single: 5,

  /**
   * Batch embedding generation
   *
   * Process multiple texts in a single batch operation.
   * More efficient than individual embeddings for bulk operations.
   *
   * **Estimated tokens**: 1000-5000
   *
   * **Example use cases**:
   * - Bulk document processing
   * - Knowledge base initialization
   * - Large-scale content indexing
   */
  embedding_batch: 10,

  /**
   * Vector similarity search
   *
   * Query vector database to find semantically similar content.
   * Cost covers embedding query text and database lookup.
   *
   * **Estimated tokens**: 200-800
   *
   * **Example use cases**:
   * - Semantic document search
   * - Recommendation systems
   * - Duplicate content detection
   */
  vector_search: 5,

  // ============================================================================
  // Audio Operations
  // ============================================================================

  /**
   * Audio transcription (speech-to-text)
   *
   * Convert audio files to text using Whisper or similar models.
   * Cost per minute of audio content.
   *
   * **Estimated tokens**: N/A (audio processing)
   *
   * **Example use cases**:
   * - Meeting transcription
   * - Podcast notes generation
   * - Voice command processing
   */
  transcription: 30,

  /**
   * Speech-to-text for chat voice input
   *
   * Short audio recordings from microphone input in LLM Chat,
   * transcribed via OpenAI Whisper. Lower cost than general transcription
   * because chat recordings are typically shorter (max 120s).
   *
   * **Estimated tokens**: N/A (audio processing)
   *
   * **Example use cases**:
   * - Voice input in chat (microphone button)
   * - Quick voice messages for AI conversation
   */
  speech_to_text: 20,

  /**
   * Text-to-speech synthesis
   *
   * Generate natural-sounding audio from text input.
   * Includes voice selection and audio quality options.
   *
   * **Estimated tokens**: N/A (audio synthesis)
   *
   * **Example use cases**:
   * - Voiceover generation
   * - Accessibility features
   * - Audio content creation
   */
  tts: 20,

  // ============================================================================
  // Document Processing
  // ============================================================================

  /**
   * PDF parsing and text extraction
   *
   * Extract text, tables, and metadata from PDF documents.
   * Handles multi-page documents with layout preservation.
   *
   * **Estimated tokens**: 1000-3000
   *
   * **Example use cases**:
   * - Document digitization
   * - Invoice processing
   * - Contract analysis
   */
  pdf_parse: 15,

  /**
   * Optical Character Recognition (OCR)
   *
   * Extract text from images and scanned documents.
   * Includes text detection, recognition, and layout analysis.
   *
   * **Estimated tokens**: N/A (image processing)
   *
   * **Example use cases**:
   * - Receipt scanning
   * - Handwriting recognition
   * - Screenshot text extraction
   */
  ocr: 30,

  /**
   * Document summarization
   *
   * Generate concise summaries of long documents.
   * Uses extractive or abstractive summarization techniques.
   *
   * **Estimated tokens**: 5000-12000
   *
   * **Example use cases**:
   * - Research paper summaries
   * - Meeting notes condensation
   * - Article key points extraction
   */
  document_summary: 65,

  // ============================================================================
  // Content Generation
  // ============================================================================

  /**
   * Template-based content generation
   *
   * Generate text from templates (email, product description, blog outline,
   * social media post, marketing copy) with streaming output.
   * Cost covers template processing + text generation.
   *
   * **Estimated tokens**: 1000-4000
   *
   * **Example use cases**:
   * - Professional email drafting
   * - Product description writing
   * - Blog post outline generation
   * - Social media post creation
   * - Marketing copy generation
   */
  content_generation: 25,
}

Complete Cost Table

Operation          | Credits | Estimated Tokens | Category
faq_simple         | 5       | 500–1,000        | FAQ
faq_complex        | 15      | 2,000–4,000      | FAQ
chat_message       | 15      | 1,000–4,000      | Chat
chat_streaming     | 20      | 1,000–4,000      | Chat
content_generation | 25      | 1,000–4,000      | Content
chat_with_tools    | 30      | 2,000–6,000      | Chat
image_analysis     | 30      | 2,000–8,000      | Chat
pdf_analysis       | 40      | 3,000–10,000     | Chat
image_gen          | 80      | N/A (GPU)        | Advanced AI
image_edit         | 50      | N/A (GPU)        | Advanced AI
code_analysis      | 40      | 3,000–8,000      | Advanced AI
code_gen           | 50      | 4,000–10,000     | Advanced AI
embedding_single   | 5       | 100–500          | Embeddings
embedding_batch    | 10      | 1,000–5,000      | Embeddings
vector_search      | 5       | 200–800          | Embeddings
transcription      | 30      | N/A (audio)      | Audio
tts                | 20      | N/A (audio)      | Audio
speech_to_text     | 20      | N/A (audio)      | Audio
pdf_parse          | 15      | 1,000–3,000      | Document
ocr                | 30      | N/A (image)      | Document
document_summary   | 65      | 5,000–12,000     | Document

Credit Cost Helper Functions

The credit cost module provides helper functions:

Function                                   | Purpose
getCreditCost(operation)                   | Get the cost of a single operation
calculateBatchCost(operation, quantity)    | Calculate the total cost of a batch operation
getAllCreditCosts()                        | Get all costs as a plain object (for the admin UI)
isValidOperation(string)                   | Type guard: check whether a string is a valid operation
getOperationsByCategory()                  | Group operations by category (faq, chat, etc.)
estimateOperationCount(operation, credits) | How many operations are possible with X credits?
formatCreditAmount(credits, includeUnit)   | Format for display (20 → "20 credits")
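Plausible implementations of a few of these helpers are sketched below against a trimmed cost map. The bodies are assumptions inferred from the table; the real module in src/lib/credits/credit-costs.ts may differ in detail:

```typescript
// Trimmed cost map for illustration; the real CREDIT_COSTS covers 21 operations.
const COSTS = { faq_simple: 5, chat_streaming: 20, image_gen: 80 } as const
type Operation = keyof typeof COSTS

function getCreditCost(operation: Operation): number {
  return COSTS[operation]
}

function calculateBatchCost(operation: Operation, quantity: number): number {
  return COSTS[operation] * quantity
}

function estimateOperationCount(operation: Operation, credits: number): number {
  // How many full operations fit into a given credit balance
  return Math.floor(credits / COSTS[operation])
}

function formatCreditAmount(credits: number, includeUnit = true): string {
  return includeUnit ? `${credits} credits` : String(credits)
}
```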

Usage Tracking

Every AI request is recorded in the AIUsage database table for analytics and cost monitoring. Tracking is non-blocking: failures are logged but never prevent the AI response from being delivered.

TrackUsageParams

typescript
interface TrackUsageParams {
  userId?: string
  sessionId?: string
  provider: string        // "openai", "anthropic", etc.
  model: string           // "gpt-5-nano", "claude-haiku", etc.
  tokens: number          // Total tokens used
  cost?: TokenCost | number  // USD cost (from provider pricing)
  purpose: 'faq' | 'chat' | 'completion' | 'stream' | 'embedding' | 'general'
  metadata?: Record<string, unknown>  // Additional context
}
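The non-blocking behavior comes down to a fire-and-forget pattern, sketched here with the persistence call injected as a parameter. The injected insert function is an assumption standing in for Kit's actual database write:

```typescript
interface TrackUsageParams {
  userId?: string
  provider: string
  model: string
  tokens: number
  cost?: number
  purpose: string
}

// Fire-and-forget: the returned promise is intentionally not awaited, so a
// slow or failing write can never delay or break the AI response.
function trackUsage(
  params: TrackUsageParams,
  insert: (p: TrackUsageParams) => Promise<void>
): void {
  insert(params).catch((err) => {
    console.error('[AI Usage] tracking failed:', err)
  })
}
```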

Monthly Aggregation

Usage is aggregated per month for quota checks and analytics:
typescript
interface MonthlyUsage {
  totalTokens: number
  totalCost: number
  requestCount: number
  byProvider: Record<string, { tokens: number; cost: number; requests: number }>
  byPurpose: Record<string, { tokens: number; cost: number; requests: number }>
}
}
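One way to build that aggregate from raw usage rows is sketched below. The UsageRow shape is assumed from the AIUsage fields listed earlier (provider, purpose, tokens, cost); the real query may aggregate in SQL instead:

```typescript
interface UsageRow { provider: string; purpose: string; tokens: number; cost: number }
interface Bucket { tokens: number; cost: number; requests: number }
interface MonthlyUsage {
  totalTokens: number
  totalCost: number
  requestCount: number
  byProvider: Record<string, Bucket>
  byPurpose: Record<string, Bucket>
}

// Accumulate a row into a keyed bucket, creating the bucket on first use
function add(map: Record<string, Bucket>, key: string, row: UsageRow): void {
  const b = map[key] ?? (map[key] = { tokens: 0, cost: 0, requests: 0 })
  b.tokens += row.tokens
  b.cost += row.cost
  b.requests += 1
}

function aggregateMonthly(rows: UsageRow[]): MonthlyUsage {
  const result: MonthlyUsage = {
    totalTokens: 0,
    totalCost: 0,
    requestCount: 0,
    byProvider: {},
    byPurpose: {},
  }
  for (const row of rows) {
    result.totalTokens += row.tokens
    result.totalCost += row.cost
    result.requestCount += 1
    add(result.byProvider, row.provider, row)
    add(result.byPurpose, row.purpose, row)
  }
  return result
}
```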

Usage API

Endpoint      | Method | Purpose
/api/ai/usage | GET    | Usage statistics for the current month

The usage endpoint returns aggregated data broken down by provider and purpose, suitable for dashboard charts and usage meters.

Token Cost Calculation

Kit computes the USD cost of every request from model-specific pricing tables. The calculateCost method on BaseProvider converts token counts into dollar amounts:
src/lib/ai/providers/base-provider.ts — Cost Calculation
calculateCost(usage: TokenUsage, model?: string): TokenCost {
    const modelInfo = this.getModelInfo(model ?? this.defaultModel)
    if (!modelInfo) {
      return {
        promptCost: 0,
        completionCost: 0,
        totalCost: 0,
        currency: 'USD',
      }
    }

    const promptCost =
      (usage.promptTokens / 1_000_000) * modelInfo.costPerMillionPromptTokens
    const completionCost =
      (usage.completionTokens / 1_000_000) *
      modelInfo.costPerMillionCompletionTokens

    return {
      promptCost,
      completionCost,
      totalCost: promptCost + completionCost,
      currency: 'USD',
    }
  }
The calculation uses costPerMillionPromptTokens and costPerMillionCompletionTokens from the model info registry, which enables accurate cost tracking across all four providers.
Example: a request to claude-haiku-4-5 with 500 prompt tokens and 200 completion tokens:
Prompt cost:     (500 / 1,000,000) × $0.80 = $0.000400
Completion cost: (200 / 1,000,000) × $4.00 = $0.000800
Total cost:      $0.001200
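The worked example can be reproduced with the same formula calculateCost uses; the per-million prices are taken from the example above, and costUsd is a standalone sketch rather than the BaseProvider method itself:

```typescript
interface TokenUsage { promptTokens: number; completionTokens: number }

// Same arithmetic as calculateCost: tokens divided by one million,
// multiplied by the per-million price for that token type.
function costUsd(usage: TokenUsage, promptPerM: number, completionPerM: number) {
  const promptCost = (usage.promptTokens / 1_000_000) * promptPerM
  const completionCost = (usage.completionTokens / 1_000_000) * completionPerM
  return { promptCost, completionCost, totalCost: promptCost + completionCost }
}
```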

Feature Flag Integration

The cost management system behaves differently depending on whether the credit system is enabled:

Behavior         | Credit System ON                    | Credit System OFF
Rate limiting    | Global burst + credit balance check | Global burst only
Credit deduction | Atomic deduction before processing  | Skipped
Usage tracking   | Full tracking with costs            | Full tracking (analytics only)
Monthly limit    | Based on credit balance             | Unlimited (999999)
402 errors       | "Insufficient credits"              | Never sent
Auto-reset       | Checks whether 30+ days have passed | Skipped
The feature-flag check happens inside checkRateLimit:
checkRateLimit()
    |
    |--- ALWAYS: check the global rate limit
    |
    |--- isCreditSystemEnabled()?
    |    |
    |    |--- YES: check credit balance, auto-reset, deduct
    |    |--- NO:  allow the request (log "Credit system disabled")
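A minimal sketch of the flag check itself, assuming (per the environment variable table below) that it is driven by NEXT_PUBLIC_PRICING_MODEL. The real implementation lives in src/lib/credits/config.ts and presumably reads the environment directly; this version takes the value as a parameter so it is testable:

```typescript
// credit_based (the default) enables per-operation credits;
// classic_saas switches billing to subscription-only and disables them.
function isCreditSystemEnabled(pricingModel: string | undefined): boolean {
  return (pricingModel ?? 'credit_based') === 'credit_based'
}
```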

Environment Variables

Variable                    | Default      | Purpose
AI_RATE_LIMIT_WINDOW        | 10           | Global rate-limit window in seconds
AI_RATE_LIMIT_MAX_REQUESTS  | 10           | Maximum requests per global window
AI_FREE_TIER_REQUESTS       | 500          | Monthly limit for the free tier
AI_BASIC_TIER_REQUESTS      | 1500         | Monthly limit for the basic tier
AI_PRO_TIER_REQUESTS        | 5000         | Monthly limit for the pro tier
AI_ENTERPRISE_TIER_REQUESTS | 15000        | Monthly limit for the enterprise tier
UPSTASH_REDIS_REST_URL      | (none)       | Redis URL for rate limiting
UPSTASH_REDIS_REST_TOKEN    | (none)       | Redis token for rate limiting
NEXT_PUBLIC_PRICING_MODEL   | credit_based | credit_based or classic_saas
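Since environment variables arrive as strings (or not at all), the numeric limits above need defensive parsing with the listed defaults. parseEnvInt is an illustrative helper, not Kit code; in practice it would be called with values like process.env.AI_RATE_LIMIT_WINDOW:

```typescript
// Parse a numeric env value, falling back to the documented default when the
// variable is unset or not a valid integer.
function parseEnvInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10)
  return Number.isNaN(parsed) ? fallback : parsed
}
```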

Wichtige Dateien

DateiZweck
apps/boilerplate/src/lib/ai/rate-limiter.tsGlobales Burst- und Tier-basiertes Rate-Limiting
apps/boilerplate/src/lib/credits/credit-costs.tsCredit-Kosten pro Vorgang (21 Vorgänge)
apps/boilerplate/src/lib/credits/credit-manager.tsAtomare Credit-Abzüge mit SELECT FOR UPDATE
apps/boilerplate/src/lib/credits/config.tsCredit-System-Feature-Flag (isCreditSystemEnabled())
apps/boilerplate/src/lib/ai/usage-tracker.tsNicht-blockierendes Usage-Tracking in die Datenbank
apps/boilerplate/src/lib/ai/providers/base-provider.tsToken-Kostenberechnung pro Provider
apps/boilerplate/src/app/api/ai/usage/route.tsNutzungsstatistiken-Endpunkt