Build without limits

Designed for teams at every scale
Discovery
Free
Up to 100K monthly API routing requests
Train one custom router
Intelligent cost and latency tradeoffs
Joint prompt optimization support
Fallback rerouting
Prompt adaptation (coming soon)
Get started
Possibility
$100/mo
Plus $0.001 per API routing request after the first 100K free
Everything in Discovery
Uncapped API routing requests
Unlimited custom routers
Enhanced data privacy with fuzzy hashing
Get started
Necessity
Custom pricing
Contact us for individual pricing
Everything in Possibility
VPC deployments
Custom integration and router training support
Access and permissions management
Schedule a call
We also regularly open-source new releases of our base router.

Frequently asked questions

How does Not Diamond recommend models?

Not Diamond leverages your evaluation data to learn a mapping from inputs to model rankings. It predictively determines which LLM is best-suited for each input in your application, flexibly adapting to your domain, your definition of quality, and your business logic.
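
To make "a mapping from inputs to model rankings" concrete, here is a toy sketch of the general idea of a learned router. It is purely illustrative: the scikit-learn pipeline, model names, and evaluation data are assumptions for demonstration, not Not Diamond's actual method or API.

```python
# Purely illustrative: a toy "learned router" that maps inputs to a ranking over
# candidate models using evaluation outcomes. This is NOT Not Diamond's
# implementation -- the model names and eval data below are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical evaluation data: which candidate model scored best on each prompt.
prompts = [
    "Summarize this quarterly earnings report",
    "Write a Python function that parses ISO 8601 dates",
    "Translate this paragraph into French",
    "Debug this stack trace from a Django app",
]
best_model = ["model-a", "model-b", "model-a", "model-b"]

# Learn a mapping from inputs to model preferences.
router = make_pipeline(TfidfVectorizer(), LogisticRegression())
router.fit(prompts, best_model)

# At inference time, rank the candidates for a new input and route to the top one.
query = "Refactor this SQL migration script"
scores = router.predict_proba([query])[0]
ranking = sorted(zip(router.classes_, scores), key=lambda pair: -pair[1])
print(ranking)  # e.g. [("model-b", 0.7), ("model-a", 0.3)]
```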

How does Not Diamond adapt prompts across different models?

Not Diamond has built a state-of-the-art prompt adaptation technique that takes a prompt written for one model and automatically adapts it to any other target model.

When is the right time to try out Not Diamond?

Not Diamond is most useful for teams scaling beyond one or two AI applications and building five, ten, or even dozens of AI pipelines across many models.

How is this different from using a single model?

You can think of Not Diamond as a “meta-model”, a data-driven ensemble of all the most powerful LLMs, which beats each individual model on quality while drastically reducing costs and latency.

Is Not Diamond a proxy?

Not Diamond is not a proxy. It simply recommends which model to use and then all requests to LLMs are made client-side. You can call models through APIs, gateways, or locally—Not Diamond is agnostic to your request orchestration pipelines.
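
As a rough sketch of that client-side pattern: the routing step only returns a recommendation, and your own code makes the LLM request. The names here are assumptions for illustration; `recommend_model` is a hypothetical stand-in for the routing call (not the actual SDK API), and the OpenAI client is just one example of a provider you might call yourself.

```python
# Illustrative sketch of client-side execution (assumed names, not the real SDK).
from openai import OpenAI

def recommend_model(messages, candidates):
    """Hypothetical stand-in for the routing recommendation call.
    In a real integration, this is where the router would pick a candidate."""
    return candidates[0]  # placeholder: pretend the router chose the first candidate

messages = [{"role": "user", "content": "Summarize our Q3 incident postmortem."}]
candidates = ["gpt-4o", "gpt-4o-mini"]

# 1. Get a model recommendation (only this routing decision involves the router).
model = recommend_model(messages, candidates)

# 2. Make the LLM request yourself -- client-side, with your own provider key.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(model=model, messages=messages)
print(response.choices[0].message.content)
```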

Will Not Diamond add extra latency to my requests?

Not Diamond’s inference speed is under 50ms. By routing to faster LLMs when possible, you can drive net speedups in your LLM calls. To avoid network latency and maximize speed, you can deploy Not Diamond directly to your infrastructure.

What languages does Not Diamond support?

Not Diamond is available through our Python SDK, TypeScript client, and REST API, so you can leverage model routing within any stack.

What is Not Diamond's security posture?

Not Diamond is SOC 2 compliant, and we support client-side request execution, zero data retention, and VPC deployments.