Pricing & Limits

Priority levels

Every request is billed per token. Choose the priority level that fits your workload.

Priority	Price / 1M tokens	Latency	Best for
Fast	$0.60	<200ms P95	Real-time UX, chatbots
Standard	$0.20	<1s P95	Background processing
Async Batch	$0.08	<15min P95	Bulk analysis, exports
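To see how the rates compare, the sketch below estimates the cost of a given token count at each priority level, using the per-1M-token prices from the table. The dictionary keys (`fast`, `standard`, `async_batch`) are hypothetical labels for this example. They are not necessarily the values the API accepts.

```python
# USD per 1M tokens, copied from the priority table above.
# The keys are illustrative labels, not confirmed API values.
PRICE_PER_MILLION = {
    "fast": 0.60,
    "standard": 0.20,
    "async_batch": 0.08,
}


def estimate_cost(tokens: int, priority: str) -> float:
    """Return the estimated USD cost of `tokens` billed at `priority`."""
    if priority not in PRICE_PER_MILLION:
        raise ValueError(f"unknown priority: {priority!r}")
    return tokens / 1_000_000 * PRICE_PER_MILLION[priority]


# Compare the same 50M-token workload across all three levels.
for level in PRICE_PER_MILLION:
    print(f"{level}: ${estimate_cost(50_000_000, level):.2f}")
```

At these rates, the same workload costs 7.5 times less on Async Batch than on Fast ($0.60 / $0.08). If the work can tolerate the latency, batching is the largest lever on cost.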

Select Fast or Standard with the `priority` parameter on `/v1/classify` and `/v1/tag`. Async Batch pricing applies automatically when you use the async batch endpoints.
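The sketch below shows one way to attach a priority to a `/v1/classify` call using only the Python standard library. Several details are assumptions and are not stated on this page: the host `https://api.example.com` is a placeholder, the body field names `text` and `labels` are hypothetical, the lowercase priority strings are guesses, and so is bearer-token auth. The function builds the request but does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder host; substitute the API base URL for your account.
BASE_URL = "https://api.example.com"

# Assumption: priority values are these lowercase strings.
ALLOWED_PRIORITIES = ("fast", "standard")


def build_classify_request(
    text: str, labels: list[str], priority: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/classify with a priority level."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"priority must be one of {ALLOWED_PRIORITIES}")
    # Assumption: "text" and "labels" are the body field names.
    body = {"text": text, "labels": labels, "priority": priority}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/classify",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_classify_request("Where is my refund?", ["billing", "shipping"], "fast", "YOUR_API_KEY")
# urllib.request.urlopen(req) would send it.
```

The same body shape would apply to `/v1/tag` with only the path changed, provided that endpoint accepts the same fields.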

Generative LLM models

`/v1/chat/completions` and `/v1/responses` are billed per model and per direction (input and output tokens are priced separately):

Model	Input ($/1M)	Cache read ($/1M)	Output ($/1M)
`deepseek-ai/DeepSeek-V4-Flash`	$0.10	–	$0.28
`Qwen/Qwen3.6-35B-A3B`	$0.15	$0.05	$1.00
`moonshotai/Kimi-K2.6`	$0.70	–	$4.00

Token counts are taken from the upstream `usage` field, so your bill matches the input and output tokens the model actually processed. Both streaming and non-streaming calls are supported. See Generative LLM for endpoint details.
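Because billing is split by direction, the cost of a call depends on three counts: input, cached input, and output. The sketch below prices one call from the table above. It relies on two assumptions that are not confirmed here. First, cached tokens are treated as a subset of input tokens, billed at the cache-read rate. Where no cache-read price is listed ("–"), they are billed at the normal input rate. Second, the `usage` object is assumed to follow the OpenAI chat-completions shape (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`).

```python
from typing import Optional

# USD per 1M tokens: (input, cache read, output), copied from the table above.
# None stands for "–" (no separate cache-read price listed).
MODEL_PRICES: dict[str, tuple[float, Optional[float], float]] = {
    "deepseek-ai/DeepSeek-V4-Flash": (0.10, None, 0.28),
    "Qwen/Qwen3.6-35B-A3B": (0.15, 0.05, 1.00),
    "moonshotai/Kimi-K2.6": (0.70, None, 4.00),
}


def llm_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one call.

    Assumption: cached_tokens is the part of input_tokens read from cache and is
    billed at the cache-read rate; if no cache-read rate is listed, it is billed
    at the normal input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    input_price, cache_price, output_price = MODEL_PRICES[model]
    if cache_price is None:
        cache_price = input_price
    uncached = input_tokens - cached_tokens
    total = uncached * input_price + cached_tokens * cache_price + output_tokens * output_price
    return total / 1_000_000


def cost_from_usage(model: str, usage: dict) -> float:
    """Price a response from its usage object.

    Assumption: usage follows the OpenAI chat-completions shape
    (prompt_tokens, completion_tokens, prompt_tokens_details.cached_tokens).
    """
    details = usage.get("prompt_tokens_details") or {}
    return llm_cost(
        model,
        input_tokens=usage["prompt_tokens"],
        output_tokens=usage["completion_tokens"],
        cached_tokens=details.get("cached_tokens", 0),
    )
```

In the OpenAI Responses API, the equivalent fields are named `input_tokens`, `output_tokens` and `input_tokens_details.cached_tokens`. If `/v1/responses` follows that convention, adjust the field names accordingly.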

Free tier

Every account gets 10M free tokens per month, and no credit card is required. Free tokens apply to all priority levels.
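The monthly arithmetic for a single priority level is sketched below. The page does not say how the allowance is split when usage spans several levels. The sketch therefore covers the single-level case only, and it assumes the free tokens are deducted before anything is billed.

```python
FREE_TOKENS_PER_MONTH = 10_000_000


def monthly_bill(tokens_used: int, price_per_million: float) -> float:
    """Estimate one month's charge at a single priority level.

    Assumption: the free allowance is deducted first and only the remainder
    is billed at price_per_million (USD per 1M tokens).
    """
    billable = max(0, tokens_used - FREE_TOKENS_PER_MONTH)
    return billable / 1_000_000 * price_per_million


# 25M tokens at the Standard rate leaves 15M billable tokens.
standard_bill = monthly_bill(25_000_000, 0.20)
```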

Rate limits

Rate limits depend on the tier configured for your API key.

Tier	Requests/min	Tokens/min	Concurrent
Public	30	150,000	2
Developer	60	500,000	5
Production	1,000	10,000,000	50
Enterprise	Custom	Custom	Custom

Cache hits do not count against your rate limits.
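To avoid rejected requests, a client can throttle itself below its tier's limits. The sketch below is a client-side sliding 60-second window over both request count and tokens. It assumes the server's enforcement is close to a one-minute sliding window. The page does not state the actual algorithm, so treat this as a guard rail rather than a mirror of the server. The clock is injectable so the limiter can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable

# (requests/min, tokens/min, concurrent) from the table above; Enterprise is custom.
TIER_LIMITS = {
    "public": (30, 150_000, 2),
    "developer": (60, 500_000, 5),
    "production": (1_000, 10_000_000, 50),
}


class MinuteLimiter:
    """Client-side sliding 60-second window over request count and tokens.

    Assumption: the server enforces something close to a sliding one-minute
    window; its real algorithm may differ.
    """

    def __init__(self, requests_per_min: int, tokens_per_min: int,
                 clock: Callable[[], float] = time.monotonic):
        self.requests_per_min = requests_per_min
        self.tokens_per_min = tokens_per_min
        self.clock = clock
        self.events: deque = deque()  # (timestamp, tokens) per admitted request
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests admitted 60 seconds ago or earlier.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record a request of `tokens` tokens if both limits allow it."""
        now = self.clock()
        self._evict(now)
        if len(self.events) >= self.requests_per_min:
            return False
        if self.tokens_in_window + tokens > self.tokens_per_min:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


rpm, tpm, concurrent = TIER_LIMITS["developer"]
limiter = MinuteLimiter(rpm, tpm)
```

The concurrency column can be enforced separately, for example by wrapping each in-flight call in a `threading.BoundedSemaphore(concurrent)`. A client cannot know in advance which requests will be cache hits, so this limiter counts every request and errs on the conservative side.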

Validation limits

Limit	Value
Max labels per request	200
Max label length	50 characters
Max text length	100,000 characters
Max request body	1 MB
Max sync batch size	1,000 texts
Max async batch file	5 GB
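Checking these limits on the client catches oversized requests before they are sent. The validator below covers the synchronous limits. It relies on several assumptions. Characters are counted as Python string length (code points), which may differ from how the service counts. "1MB" is read conservatively as 10^6 bytes. The JSON field names `texts` and `labels` are hypothetical.

```python
import json

MAX_LABELS = 200
MAX_LABEL_CHARS = 50
MAX_TEXT_CHARS = 100_000
MAX_SYNC_BATCH = 1_000
# Assumption: read "1 MB" conservatively as 10**6 bytes (smaller than 2**20).
MAX_BODY_BYTES = 1_000_000


def validate_request(texts: list[str], labels: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the limits are met.

    Assumptions: characters are counted as Python str length (code points),
    and the body uses hypothetical "texts"/"labels" field names.
    """
    problems = []
    if len(texts) > MAX_SYNC_BATCH:
        problems.append(f"{len(texts)} texts exceeds sync batch limit {MAX_SYNC_BATCH}")
    for i, text in enumerate(texts):
        if len(text) > MAX_TEXT_CHARS:
            problems.append(f"texts[{i}] has {len(text)} chars (max {MAX_TEXT_CHARS})")
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceeds limit {MAX_LABELS}")
    for label in labels:
        if len(label) > MAX_LABEL_CHARS:
            problems.append(f"label {label[:20]!r} has {len(label)} chars (max {MAX_LABEL_CHARS})")
    # Measure the serialized body, since that is what the size limit applies to.
    body = json.dumps({"texts": texts, "labels": labels}).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        problems.append(f"request body is {len(body)} bytes (max {MAX_BODY_BYTES})")
    return problems
```

The 5 GB async batch file limit applies to batch uploads rather than individual requests, so it is not checked here.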

Cache

Responses are cached automatically, with a default TTL that depends on your tier. Cache hits are free (zero tokens billed).

Tier	Default TTL
Public	5 minutes
Developer	1 hour
Production	24 hours

Set `cache: false` in your request to bypass the cache.
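The sketch below shows the bypass flag in a JSON body built from Python, along with the tier TTLs converted to seconds. Python's `False` serializes to the JSON literal `false` that the API expects. The `text` and `labels` fields are hypothetical placeholders, and the sketch assumes `cache` is a top-level body field.

```python
import json

# Default cache TTLs by tier, in seconds, from the table above.
CACHE_TTL_SECONDS = {
    "public": 5 * 60,
    "developer": 60 * 60,
    "production": 24 * 60 * 60,
}


def with_cache_bypass(body: dict) -> str:
    """Return a JSON request body with caching disabled.

    Assumption: `cache` is a top-level field in the request body.
    """
    return json.dumps({**body, "cache": False})


# Hypothetical body fields; only "cache" comes from this page.
payload = with_cache_bypass({"text": "Where is my parcel?", "labels": ["shipping", "billing"]})
```

Bypassing the cache means the request is billed and counts against rate limits even if an identical request was answered recently. Use it only when you need a fresh result.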

Enterprise

Need volume pricing, dedicated infrastructure, or custom rate limits? Contact us.
