Vision capabilities
Users can upload images directly into the chat and have a multimodal model analyze them. Each image is sent as a base64-encoded payload alongside the text prompt, so the model can reason about visual content such as diagrams, screenshots, charts, and handwritten notes. No separate vision API or pre-processing pipeline is required.
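As a rough illustration of the payload described above, the sketch below base64-encodes raw image bytes and places them next to the text prompt in a single user message. The ContentPart and UserMessage shapes and the buildVisionMessage helper are hypothetical; each model provider defines its own message format, so treat this as a sketch of the idea rather than the app's actual request shape.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical message shape, for illustration only.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string }; // data is base64

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Encode the image bytes as base64 and pair them with the text prompt.
function buildVisionMessage(
  prompt: string,
  imageBytes: Uint8Array,
  mimeType: string
): UserMessage {
  const data = Buffer.from(imageBytes).toString("base64");
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image", mimeType, data },
    ],
  };
}

// Example input: the first four bytes of the PNG file signature.
const message = buildVisionMessage(
  "What does this chart show?",
  new Uint8Array([0x89, 0x50, 0x4e, 0x47]),
  "image/png"
);
```

Because the image travels inside the same message as the prompt, no second request or upload step is needed before the model sees it.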
PDF document chat
PDF chat follows a three-step pipeline: upload, parse, and converse. Documents are parsed server-side using pdf-parse, then split into chunks sized for the model's context window. The chunked content is injected as context for subsequent chat messages, allowing users to ask questions about specific sections without re-uploading.
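The chunking step could look roughly like the sketch below. It assumes the input is the plain text already extracted from the PDF, and it uses a character budget (maxChars) as a crude stand-in for the model's context limit; the source does not say how chunk sizes are actually computed, so both the chunkText name and the sizing rule are assumptions.

```typescript
// Split extracted text into chunks of at most maxChars characters,
// preferring paragraph boundaries (blank lines) as split points.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of text.split(/\n\s*\n/)) {
    const trimmed = paragraph.trim();
    if (trimmed.length === 0) continue;

    // A paragraph longer than the budget is cut into fixed-size pieces.
    for (let start = 0; start < trimmed.length; start += maxChars) {
      const piece = trimmed.slice(start, start + maxChars);
      const candidate = current ? `${current}\n\n${piece}` : piece;

      if (candidate.length <= maxChars) {
        current = candidate; // still fits in the chunk being built
      } else {
        chunks.push(current); // close the full chunk and start a new one
        current = piece;
      }
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Splitting on paragraph boundaries first keeps related sentences together, which tends to help when a user asks about a specific section. A token-based budget would be more precise than a character count.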
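Credits are deducted after pre-processing (parsing and chunking) but before streaming begins. Deducting after pre-processing means users are charged only for documents that were processed successfully; deducting before streaming means every request that reaches the model has already been paid for, which discourages abuse through repeated upload attempts.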
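The ordering can be sketched as follows. Everything here is hypothetical: CreditLedger is an in-memory stand-in for a real credit store, and chatWithDocument is kept synchronous for brevity, whereas a real handler would be asynchronous and stream tokens to the client.

```typescript
// In-memory stand-in for a real credit store (hypothetical).
class CreditLedger {
  private balance: number;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  get remaining(): number {
    return this.balance;
  }

  deduct(cost: number): void {
    if (cost > this.balance) throw new Error("insufficient credits");
    this.balance -= cost;
  }
}

// Pre-process first, charge second, stream last.
function chatWithDocument(
  ledger: CreditLedger,
  cost: number,
  preprocess: () => string[],
  streamReply: (chunks: string[]) => void
): void {
  // 1. Parse and chunk. If this throws, no credits have been touched.
  const chunks = preprocess();

  // 2. Charge once pre-processing has succeeded, before any output is produced.
  ledger.deduct(cost);

  // 3. Only now start streaming the model's reply.
  streamReply(chunks);
}
```

With this ordering, a corrupt PDF fails in step 1 and leaves the balance unchanged, while a request that reaches step 3 has always been charged.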
Turbopack compatibility
The pdf-parse library depends on pdfjs-dist, whose dynamic worker imports break Turbopack's static analysis. The workaround is a small loader module built on createRequire: the native require() call lives in a separate file that Turbopack ignores, which avoids runtime failures without falling back to Webpack.
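A minimal sketch of such a loader module is shown below. The loadNative helper name and the choice to anchor resolution at the project root via process.cwd() are assumptions made for this example; in an ES module, passing import.meta.url to createRequire is a common alternative. Only createRequire itself is a real Node.js API here.

```typescript
import { createRequire } from "node:module";
import { join } from "node:path";

// Build a plain Node.js require() anchored at the project root.
// The file name only sets the resolution base; it does not need to be read.
const nativeRequire = createRequire(join(process.cwd(), "package.json"));

// Resolve a package at runtime through Node's own loader, so the
// specifier is not a static import that the bundler has to follow.
export function loadNative<T>(specifier: string): T {
  return nativeRequire(specifier) as T;
}
```

Server code would then call loadNative("pdf-parse") from its route handler instead of importing pdf-parse directly, keeping the problematic dependency out of the statically analyzed import graph.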