
Mercury

inception/mercury

1 credit / request
Added 2026

Mercury is Inception's high-speed inference model, engineered for scenarios where latency is the primary constraint. It delivers impressively fast response times while maintaining solid generation quality, making it ideal for real-time applications where users expect instant responses.

Mercury's architecture is optimized from the ground up for throughput and time-to-first-token, making it particularly effective for interactive search, autocomplete, and live conversation applications.

Key Features

Industry-leading inference speed

Sub-100ms time to first token

Solid generation quality at extreme speed

Optimized for real-time interactive use

High throughput for cost-efficient batch processing

Ideal Use Cases

1. Real-time conversational AI requiring instant responses
2. Autocomplete and inline suggestion systems
3. Interactive search with instant results
4. Latency-critical production pipelines

Technical Specifications

Context Window: 128K tokens
Modality: Text → Text
Provider: Inception
Category: Text Generation
Latency: Sub-100ms TTFT
Optimized For: Maximum inference speed
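The sub-100ms TTFT figure can be sanity-checked from the client side. The helper below is an illustrative sketch, not part of the Vincony API: it times how long any streaming iterator (such as a streamed chat response) takes to yield its first chunk.

```python
import time
from typing import Iterable, Tuple

def measure_ttft(token_stream: Iterable[str]) -> Tuple[float, str]:
    # Time-to-first-token: seconds from this call until the stream
    # yields its first chunk, returned together with that chunk.
    start = time.monotonic()
    first = next(iter(token_stream))
    return time.monotonic() - start, first
```

Wrap the model's streaming response in this helper to compare observed TTFT against the quoted latency under your own network conditions.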

API Usage

curl -X POST https://api.vincony.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "inception/mercury",
    "messages": [
      { "role": "user", "content": "Hello, Mercury!" }
    ]
  }'

Replace YOUR_API_KEY with your Vincony API key. The endpoint is OpenAI-compatible and works with any OpenAI SDK.
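Because the endpoint is OpenAI-compatible, any HTTP client works. Below is a stdlib-only Python sketch that builds the same request as the curl example; `build_mercury_request` is an illustrative helper of our own, not part of any SDK. Send the returned request with `urllib.request.urlopen(req)`.

```python
import json
import urllib.request

API_URL = "https://api.vincony.com/v1/chat/completions"

def build_mercury_request(api_key: str, prompt: str) -> urllib.request.Request:
    # Mirrors the curl example: POST a JSON chat-completions payload
    # to the OpenAI-compatible endpoint with a bearer token.
    payload = {
        "model": "inception/mercury",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The same payload shape works unchanged with the official OpenAI SDKs by pointing their `base_url` at `https://api.vincony.com/v1`.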



Try Mercury now

Start using Mercury instantly — 100 free credits, no credit card required. Access 343+ AI models through one platform.



Vincony — Access the World's Best AI Models