Priority levels
Every request is billed per token. Choose the priority level that fits your workload.

| Priority | Price / 1M tokens | Latency | Best for |
|---|---|---|---|
| Fast | $0.60 | <200ms P95 | Real-time UX, chatbots |
| Standard | $0.20 | <1s P95 | Background processing |
| Async Batch | $0.08 | <15min P95 | Bulk analysis, exports |
Select a level with the `priority` parameter on /v1/classify and /v1/tag. Async Batch pricing applies automatically when you use the async batch endpoints.
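Because every level is a flat per-token rate, estimating a bill is one multiplication. The sketch below applies the table's prices. The dictionary keys (`"fast"`, `"standard"`, `"async_batch"`) are illustrative labels only. This page does not list the exact values the `priority` parameter accepts.

```python
# USD per 1M tokens, copied from the priority table above.
# Keys are illustrative labels, not necessarily the API's `priority` values.
PRICE_PER_1M = {
    "fast": 0.60,
    "standard": 0.20,
    "async_batch": 0.08,
}


def estimate_cost(tokens: int, priority: str) -> float:
    """Estimate the USD cost of `tokens` billed at one priority level."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * PRICE_PER_1M[priority]


if __name__ == "__main__":
    # Compare the same 5M-token workload across all three levels.
    for level in PRICE_PER_1M:
        print(f"{level}: ${estimate_cost(5_000_000, level):.2f}")
```

The same 5M tokens cost about 7.5x more at Fast than at Async Batch. Latency-tolerant jobs are therefore usually better suited to the batch endpoints.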
Generative LLM models
/v1/chat/completions and /v1/responses are billed per model and per direction (input vs. output tokens):

| Model | Input ($/1M) | Cache read ($/1M) | Output ($/1M) |
|---|---|---|---|
| deepseek-ai/DeepSeek-V4-Flash | $0.10 | – | $0.28 |
| Qwen/Qwen3.6-35B-A3B | $0.15 | $0.05 | $1.00 |
| moonshotai/Kimi-K2.6 | $0.70 | – | $4.00 |
Billed token counts are taken from the `usage` field of each response, so the bill always matches what the model actually generated. Both streaming and non-streaming calls are supported. See Generative LLM for endpoint details.
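The following sketch shows how the three per-direction rates combine for a single call. It rests on two assumptions that are not stated on this page. First, cached tokens are treated as a subset of the input tokens. Second, for models with no listed cache-read price ("–"), cached tokens are billed at the normal input rate. The parameter names are hypothetical and do not mirror the actual `usage` schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelPrice:
    input_per_1m: float
    output_per_1m: float
    cache_read_per_1m: Optional[float] = None  # None where the table shows "–"


# USD per 1M tokens, copied from the model table above.
PRICES = {
    "deepseek-ai/DeepSeek-V4-Flash": ModelPrice(0.10, 0.28),
    "Qwen/Qwen3.6-35B-A3B": ModelPrice(0.15, 1.00, cache_read_per_1m=0.05),
    "moonshotai/Kimi-K2.6": ModelPrice(0.70, 4.00),
}


def llm_cost(model: str, input_tokens: int, output_tokens: int,
             cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one call from its token counts."""
    price = PRICES[model]
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    # Assumption: with no listed cache-read price, cached tokens bill at the input rate.
    cache_rate = (price.cache_read_per_1m
                  if price.cache_read_per_1m is not None
                  else price.input_per_1m)
    uncached = input_tokens - cached_input_tokens
    return (uncached * price.input_per_1m
            + cached_input_tokens * cache_rate
            + output_tokens * price.output_per_1m) / 1_000_000


if __name__ == "__main__":
    # 1M input tokens (400k of them cache reads) plus 200k output tokens.
    cost = llm_cost("Qwen/Qwen3.6-35B-A3B", 1_000_000, 200_000,
                    cached_input_tokens=400_000)
    print(f"${cost:.2f}")
```

Output tokens are priced several times higher than input for every listed model. Because of that, the length of the generated answer usually dominates the cost of a call.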
Free tier
Every account gets 10M free tokens per month; no credit card is required. Free tokens work on all priority levels.

Rate limits
Rate limits depend on the tier configured for your API key.

| Tier | Requests/min | Tokens/min | Concurrent requests |
|---|---|---|---|
| Public | 30 | 150,000 | 2 |
| Developer | 60 | 500,000 | 5 |
| Production | 1,000 | 10,000,000 | 50 |
| Enterprise | Custom | Custom | Custom |
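To avoid being throttled, a client can track its own usage against the per-minute limits before sending. The sketch below is a client-side sliding 60-second window that checks both requests/min and tokens/min. How the server actually measures its windows is not documented here, so treat this as a conservative approximation. The tier keys are illustrative. The sketch does not enforce the concurrent-requests column; a `threading.Semaphore` sized to that value is one way to add it.

```python
import threading
import time
from collections import deque

# (requests/min, tokens/min) from the rate-limit table above; keys are illustrative.
TIER_LIMITS = {
    "public": (30, 150_000),
    "developer": (60, 500_000),
    "production": (1_000, 10_000_000),
}


class MinuteRateLimiter:
    """Client-side sliding-window limiter for requests/min and tokens/min."""

    def __init__(self, requests_per_min, tokens_per_min, clock=time.monotonic):
        self.rpm = requests_per_min
        self.tpm = tokens_per_min
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for calls in the last 60 s
        self.tokens_in_window = 0
        self.lock = threading.Lock()

    def _evict(self, now):
        # Drop calls that are 60 s or older and release their tokens.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the call and return True if it fits both limits, else False."""
        if tokens > self.tpm:
            raise ValueError("a single call cannot exceed the tokens/min limit")
        with self.lock:
            now = self.clock()
            self._evict(now)
            if (len(self.events) >= self.rpm
                    or self.tokens_in_window + tokens > self.tpm):
                return False
            self.events.append((now, tokens))
            self.tokens_in_window += tokens
            return True

    def acquire(self, tokens, poll_interval=0.05):
        """Block until the call fits within both limits, then record it."""
        while not self.try_acquire(tokens):
            time.sleep(poll_interval)


if __name__ == "__main__":
    limiter = MinuteRateLimiter(*TIER_LIMITS["developer"])
    limiter.acquire(2_000)  # call before each request with its estimated tokens
```

Passing in a `clock` makes the limiter testable without real waiting.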
Validation limits
| Limit | Value |
|---|---|
| Max labels per request | 200 |
| Max label length | 50 characters |
| Max text length | 100,000 characters |
| Max request body | 1MB |
| Max sync batch size | 1,000 texts |
| Max async batch file | 5GB |
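Checking these limits on the client lets you catch an invalid request before spending a round trip. The sketch below validates a synchronous request against the table. It assumes a hypothetical JSON body with `texts` and `labels` fields. It reads "1MB" as 10^6 bytes, the stricter of the two common interpretations. It also counts characters as Python code points, which may differ from how the API counts them.

```python
import json

MAX_LABELS = 200
MAX_LABEL_CHARS = 50
MAX_TEXT_CHARS = 100_000
MAX_BODY_BYTES = 1_000_000  # assumption: "1MB" read as 10^6 bytes
MAX_SYNC_BATCH = 1_000


def validate_request(texts, labels):
    """Return a list of limit violations; an empty list means the request looks valid."""
    errors = []
    if len(texts) > MAX_SYNC_BATCH:
        errors.append(f"{len(texts)} texts exceeds sync batch limit of {MAX_SYNC_BATCH}")
    if len(labels) > MAX_LABELS:
        errors.append(f"{len(labels)} labels exceeds limit of {MAX_LABELS}")
    for i, label in enumerate(labels):
        if len(label) > MAX_LABEL_CHARS:
            errors.append(f"label {i} is {len(label)} chars (max {MAX_LABEL_CHARS})")
    for i, text in enumerate(texts):
        if len(text) > MAX_TEXT_CHARS:
            errors.append(f"text {i} is {len(text)} chars (max {MAX_TEXT_CHARS})")
    # Hypothetical body shape; measure its encoded size against the body limit.
    body = json.dumps({"texts": texts, "labels": labels}).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        errors.append(f"body is {len(body)} bytes (max {MAX_BODY_BYTES})")
    return errors


if __name__ == "__main__":
    print(validate_request(["Great product, fast shipping."], ["positive", "negative"]))
```

The function collects every violation rather than stopping at the first one, so a caller can report all the problems together.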
Cache
Responses are cached automatically, with a default TTL that depends on your tier. Cache hits are free (zero tokens billed).

| Tier | Default TTL |
|---|---|
| Public | 5 minutes |
| Developer | 1 hour |
| Production | 24 hours |
Set `cache: false` in your request to bypass caching.
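The sketch below covers two things. The first is building a request body that opts out of the cache; only `cache` and `priority` come from this page, and `texts` and `labels` are hypothetical field names. The second is estimating whether an identical earlier request could still be served from the cache. It assumes the TTL counts from when the response was first cached, which this page does not specify. The server makes the final decision.

```python
import json

# Default cache TTLs from the table above, in seconds; keys are illustrative.
DEFAULT_TTL_SECONDS = {
    "public": 5 * 60,
    "developer": 60 * 60,
    "production": 24 * 60 * 60,
}


def build_request(texts, labels, use_cache=True, priority=None):
    """Serialize a request body; `texts`/`labels` are hypothetical field names."""
    body = {"texts": texts, "labels": labels}
    if priority is not None:
        body["priority"] = priority  # accepted values are not listed on this page
    if not use_cache:
        body["cache"] = False  # documented: bypasses caching
    return json.dumps(body)


def is_fresh(cached_at, now, tier):
    """Estimate whether a response cached at `cached_at` (seconds) is within the TTL."""
    return now - cached_at < DEFAULT_TTL_SECONDS[tier]


if __name__ == "__main__":
    print(build_request(["breaking news text"], ["politics", "sports"], use_cache=False))
```

Bypassing the cache makes sense when you need a fresh result for identical input. Leaving it on means repeated requests within the TTL are not billed.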