Not Diamond leverages your evaluation data to learn a mapping from inputs to model rankings. It predictively determines which LLM is best-suited for each input in your application, flexibly adapting to your domain, your definition of quality, and your business logic.
Not Diamond has built a state-of-the-art prompt adaptation technique that takes a prompt written for one model and automatically adapts it to any other target model.
Not Diamond is most useful for teams that have scaled beyond one or two AI applications and are building five, ten, or dozens of AI pipelines across many models.
You can think of Not Diamond as a “meta-model”, a data-driven ensemble of all the most powerful LLMs, which beats each individual model on quality while drastically reducing costs and latency.
Not Diamond is not a proxy. It simply recommends which model to use; all LLM requests are then made client-side. You can call models through APIs, gateways, or locally—Not Diamond is agnostic to your request orchestration pipelines.
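The recommend-then-execute pattern described above can be sketched as follows. This is an illustrative outline only: the function names, return shapes, and model identifiers are hypothetical stand-ins, not the actual Not Diamond SDK API.

```python
# Sketch of the client-side routing flow: a routing step returns only a
# model recommendation, and your own code makes the actual LLM request.

def recommend_model(messages, candidates):
    """Stub for the routing step. In practice this would call the
    Not Diamond routing service, which returns a recommendation only --
    no prompt data is proxied to any LLM provider."""
    # Toy placeholder logic purely for illustration: pick the first candidate.
    return candidates[0]

def call_llm(model, messages):
    """Stub for the client-side LLM request. Because this step is yours,
    it can hit any provider API, gateway, or locally hosted model."""
    return f"[{model}] response to: {messages[-1]['content']}"

messages = [{"role": "user", "content": "Summarize this contract."}]
candidates = ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]

model = recommend_model(messages, candidates)   # routing decision
response = call_llm(model, messages)            # request made client-side
```

The key design point is the separation of concerns: the routing call carries no provider credentials and returns no completions, so your existing request orchestration stays unchanged.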
Not Diamond’s inference speed is under 50ms. By routing to faster LLMs when possible, you can drive net speedups in your LLM calls. To avoid network latency and maximize speed, you can deploy Not Diamond directly to your infrastructure.
Not Diamond is available through our Python SDK, TypeScript client, and our REST API, so you can leverage model routing within any stack.
Not Diamond is SOC 2 compliant, and we support client-side request execution, zero data retention, and VPC deployments.