GLM-4.5 Air is ZAI's most lightweight and cost-efficient model, optimized for scenarios where speed and volume matter more than maximum quality. It delivers quick bilingual (Chinese and English) responses for simple tasks such as classification, short-form Q&A, entity extraction, and basic content formatting, at the lowest per-query cost in ZAI's lineup.
Air is the natural choice for high-throughput processing pipelines, real-time features that need sub-second responses, and development/testing environments where fast iteration matters more than production quality.
Key Features
Ultra-fast inference for sub-second response times
Lowest per-query cost in ZAI's model family
Optimized for high-throughput batch processing
Bilingual support for Chinese and English tasks
128K token context for flexible input sizes
Ideal for development, testing, and rapid prototyping
Ideal Use Cases
High-volume classification and entity extraction
Real-time bilingual chatbots prioritizing speed
Development and testing with fast iteration cycles
Cost-efficient batch processing of simple text tasks
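As a rough illustration of the high-volume classification use case, the sketch below builds one chat-completions request body per input text and maps the model's reply back to a known label. The helper names (`classification_payload`, `batch_payloads`, `parse_label`), the instruction wording, and the example labels are all hypothetical; only the model ID and the `model`/`messages` request shape come from the API example on this page.

```python
MODEL = "zai/glm-4.5-air"  # model ID as shown in the API usage example


def classification_payload(text, labels):
    """Build one request body asking the model to pick a single label.

    The instruction wording is an assumption, not a documented prompt.
    """
    instruction = (
        "Classify the text into exactly one of these labels: "
        + ", ".join(labels)
        + ". Reply with the label only."
    )
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": text},
        ],
    }


def batch_payloads(texts, labels):
    """Build one request body per input text, in input order."""
    return [classification_payload(text, labels) for text in texts]


def parse_label(reply, labels):
    """Map a raw model reply to a known label, or None if it matches none."""
    cleaned = reply.strip().lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None
```

Each payload can then be sent to the chat-completions endpoint independently. Returning `None` for an unrecognized reply lets a pipeline retry or route that item elsewhere instead of silently accepting a bad label.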
Technical Specifications
| Specification | Value |
| --- | --- |
| Context Window | 128K tokens |
| Modality | Text → Text |
| Provider | ZAI |
| Category | Text Generation |
| Latency | Low |
| Best For | Quick, cost-efficient tasks |
API Usage
curl -X POST https://api.vincony.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai/glm-4.5-air",
    "messages": [
      { "role": "user", "content": "Hello, GLM-4.5 Air!" }
    ]
  }'
Replace YOUR_API_KEY with your Vincony API key. The endpoint is OpenAI-compatible, so it works with any OpenAI SDK.
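For readers calling the endpoint without an SDK, here is a minimal Python sketch of the same request using only the standard library. The function names `build_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI-style shape (`choices[0].message.content`), which this page implies through its compatibility claim but does not show.

```python
import json
import urllib.request

API_URL = "https://api.vincony.com/v1/chat/completions"  # from the curl example
MODEL = "zai/glm-4.5-air"


def build_request(prompt, api_key):
    """Build the POST request with the same headers and body as the curl call."""
    body = json.dumps(
        {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt, api_key, timeout=30.0):
    """Send the prompt and return the reply text.

    Assumes an OpenAI-style response body; adjust if the actual shape differs.
    """
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]
```

Calling `chat("Hello, GLM-4.5 Air!", "YOUR_API_KEY")` would send the same request as the curl command. Keeping `build_request` separate makes it possible to inspect the request without touching the network.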