Mercury is Inception's high-speed inference model, engineered for scenarios where latency is the primary constraint. It delivers sub-100ms time to first token while maintaining solid generation quality, making it ideal for real-time applications where users expect instant responses.
Mercury's architecture is optimized from the ground up for throughput and time-to-first-token, making it particularly effective for interactive search, autocomplete, and live conversation applications.
Key Features
Industry-leading inference speed
Sub-100ms time to first token
Solid generation quality at extreme speed
Optimized for real-time interactive use
High throughput for cost-efficient batch processing
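Time to first token (TTFT), the latency figure highlighted above, is simply the delay between sending a request and receiving the first streamed chunk. A minimal way to measure it against any streaming client is sketched below; a stand-in generator is used in place of a live API response, since a real call would need an API key:

```python
import time

def time_to_first_token(stream):
    """Return (first_token, seconds_until_first_token) for any token iterator."""
    start = time.perf_counter()
    first = next(stream)          # blocks until the first chunk arrives
    return first, time.perf_counter() - start

# Stand-in for a streaming API response (hypothetical tokens).
def fake_stream():
    for tok in ["Hello", ",", " world"]:
        yield tok

tok, ttft = time_to_first_token(fake_stream())
```

The same helper works unchanged with a real streaming response iterator, because it only relies on the iterator protocol.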
Ideal Use Cases
Real-time conversational AI requiring instant responses
Autocomplete and inline suggestion systems
Interactive search with instant results
Latency-critical production pipelines
Technical Specifications
| Specification | Value |
| --- | --- |
| Context Window | 128K tokens |
| Modality | Text → Text |
| Provider | Inception |
| Category | Text Generation |
| Latency | Sub-100ms TTFT |
| Optimized For | Maximum inference speed |
API Usage
```shell
curl -X POST https://api.vincony.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "inception/mercury",
    "messages": [
      { "role": "user", "content": "Hello, Mercury!" }
    ]
  }'
```
Replace YOUR_API_KEY with your Vincony API key. The endpoint is OpenAI-compatible, so it works with any OpenAI SDK.
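Because the endpoint is OpenAI-compatible, the same request can be issued from any language. A minimal Python sketch of the curl call above is shown here; it builds the identical payload and headers, and the actual send (which needs a live key and network access) is left commented out:

```python
import json

# Placeholder; replace with your Vincony API key.
API_KEY = "YOUR_API_KEY"

# Same body as the curl example.
payload = {
    "model": "inception/mercury",
    "messages": [{"role": "user", "content": "Hello, Mercury!"}],
}
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps(payload).encode("utf-8")

# To actually send the request:
# import urllib.request
# req = urllib.request.Request(
#     "https://api.vincony.com/v1/chat/completions",
#     data=body, headers=headers, method="POST")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping in the official OpenAI SDK works the same way: point its base URL at the Vincony endpoint and pass `model="inception/mercury"`.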
Try Mercury now
Start using Mercury instantly — 100 free credits, no credit card required. Access 343+ AI models through one platform.