Build without limits

100K free routing requests per month through our API
Discovery
Free
Up to 100K monthly API routing requests
Train one custom router
Intelligent cost and latency tradeoffs
Joint prompt optimization support
Fallback rerouting
Possibility
$100/mo
Plus $0.001 per API routing request beyond the first 100K free requests each month
Everything in Discovery
Uncapped API routing requests
Unlimited custom routers
Enhanced data privacy with fuzzy hashing
Necessity
Custom pricing
Contact us for individual pricing
Everything in Possibility
VPC deployments
Custom integration and router training support
Access and permissions management
Our chat app is also free to use, or you can upgrade to Pro for $20/month. We also regularly open-source new releases of our base router.
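To estimate a Possibility bill, here is a minimal sketch that reads the pricing above as a flat $100 per month plus $0.001 for each routing request beyond the first 100K. That reading, and the constant and function names, are our own assumptions for illustration; check your actual invoice terms.

```python
# Assumed reading of the Possibility pricing: flat monthly fee plus per-request overage.
FREE_REQUESTS = 100_000          # free API routing requests per month
POSSIBILITY_BASE_USD = 100.00    # Possibility plan fee per month
OVERAGE_USD_PER_REQUEST = 0.001  # charge per routing request beyond the free allowance


def possibility_monthly_cost(requests: int) -> float:
    """Estimated monthly Possibility cost in USD for a given number of routing requests."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    billable = max(0, requests - FREE_REQUESTS)  # only requests past 100K are billed
    return round(POSSIBILITY_BASE_USD + billable * OVERAGE_USD_PER_REQUEST, 2)


for n in (50_000, 100_000, 250_000, 1_000_000):
    print(f"{n:>9,} requests -> ${possibility_monthly_cost(n):,.2f}")
```

Under this reading, 250K requests in a month would cost $100 + 150K × $0.001 = $250.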

Frequently asked questions

Is Not Diamond a proxy?

Not Diamond is not a proxy. It only recommends which model to use; all requests to LLMs are then made client-side. You can call models through APIs, gateways, or locally, since Not Diamond is agnostic to your request orchestration pipeline.
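The split of responsibilities looks roughly like this: the router hands back a model name, and your own code makes the LLM call. This is a minimal sketch. `recommend_model` is a hypothetical stand-in for the recommendation you would get from our SDK or REST API (it is not the SDK's real function), and the model clients are stubs you would replace with your own API, gateway, or local calls.

```python
from typing import Callable, Dict, List


# Hypothetical stand-in for a routing recommendation; name and logic are invented here.
def recommend_model(prompt: str, candidates: List[str]) -> str:
    # Toy rule for illustration: long prompts go to the first candidate, short ones to the last.
    return candidates[0] if len(prompt) > 200 else candidates[-1]


# Client-side model clients (stubs). Swap in your own API, gateway, or local-model calls.
def call_large_model(prompt: str) -> str:
    return f"[large-model answer to {len(prompt)} chars]"


def call_small_model(prompt: str) -> str:
    return f"[small-model answer to {len(prompt)} chars]"


CLIENTS: Dict[str, Callable[[str], str]] = {
    "large-model": call_large_model,
    "small-model": call_small_model,
}


def answer(prompt: str) -> str:
    # 1. Ask the router which model to use; only a model name comes back.
    model = recommend_model(prompt, list(CLIENTS))
    # 2. Make the LLM request ourselves, through whatever client we already use.
    return CLIENTS[model](prompt)


print(answer("What is 2 + 2?"))  # [small-model answer to 14 chars]
```

Because the prompt and completion never pass through the router's response path, your existing orchestration stays unchanged.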

How does Not Diamond determine which model to call?

Not Diamond is a highly specialized predictive model optimized for model routing. Trained on a large, cross-domain evaluation dataset, it accurately predicts which LLM will perform best for any input.
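Once a router has predicted how well each candidate will do on an input, picking a model is an argmax, and the cost and latency tradeoffs listed in the Discovery tier can be expressed as penalty terms. The sketch below is a generic utility-based illustration, not Not Diamond's actual scoring; the candidate names, predicted scores, prices, latencies, and weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    predicted_quality: float  # hypothetical predicted chance this model answers well (0-1)
    cost_usd: float           # hypothetical cost of the call in USD
    latency_s: float          # hypothetical expected latency in seconds


def choose_model(cands: List[Candidate], cost_weight: float = 0.0,
                 latency_weight: float = 0.0) -> str:
    """Return the candidate with the highest quality after cost and latency penalties."""
    def utility(c: Candidate) -> float:
        return c.predicted_quality - cost_weight * c.cost_usd - latency_weight * c.latency_s
    return max(cands, key=utility).name


candidates = [
    Candidate("model-a", predicted_quality=0.92, cost_usd=0.030, latency_s=4.0),
    Candidate("model-b", predicted_quality=0.88, cost_usd=0.002, latency_s=1.0),
]
print(choose_model(candidates))                       # model-a (quality only)
print(choose_model(candidates, cost_weight=5.0))      # model-b (0.87 beats 0.77)
print(choose_model(candidates, latency_weight=0.02))  # model-b (0.86 beats 0.84)
```

With both weights at zero the choice is purely quality-driven; raising a weight shifts traffic toward cheaper or faster models when their predicted quality is close.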

Does Not Diamond integrate with my data?

Not Diamond is designed to work seamlessly with your existing data and evaluation pipelines. You can upload any LLM evaluation dataset and within minutes you’ll get back a router optimized to your use case.
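To give a feel for how an evaluation dataset turns into a router, here is a deliberately tiny sketch: label each evaluated prompt with its best-scoring model, then route a new prompt to the label of its most similar training prompt (1-nearest-neighbour on word overlap). The `(prompt, model, score)` row shape, the data, and the technique itself are assumptions for illustration; they are not Not Diamond's upload format or training method.

```python
from collections import defaultdict

# Hypothetical evaluation rows: (prompt, model, score).
EVAL_ROWS = [
    ("write a python function to reverse a list", "model-code", 1.0),
    ("write a python function to reverse a list", "model-chat", 0.4),
    ("summarize this news article about elections", "model-code", 0.5),
    ("summarize this news article about elections", "model-chat", 0.9),
]


def best_model_per_prompt(rows):
    """Label each training prompt with the model that scored highest on it."""
    scores = defaultdict(dict)
    for prompt, model, score in rows:
        scores[prompt][model] = score
    return {p: max(m, key=m.get) for p, m in scores.items()}


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


class NearestNeighborRouter:
    """Toy router: reuse the best model of the most similar training prompt."""

    def __init__(self, rows):
        self.labels = [(tokens(p), m) for p, m in best_model_per_prompt(rows).items()]

    def route(self, prompt):
        query = tokens(prompt)
        _, model = max(self.labels, key=lambda pair: jaccard(query, pair[0]))
        return model


router = NearestNeighborRouter(EVAL_ROWS)
print(router.route("write a python function to sort a dict"))  # model-code
print(router.route("summarize this article"))                  # model-chat
```

A production router would use learned representations rather than word overlap, but the inputs are the same: prompts, candidate models, and per-model scores.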

When is the right time to try out Not Diamond?

Not Diamond is designed for every stage of the development process. Our users range from developers building on our API from day one to sophisticated enterprise teams routing every request in production.

Can I optimize prompts across different models?

Not Diamond makes it easy to leverage automatic prompt optimization frameworks like DSPy and SAMMO, or to use your own manually developed prompts for each LLM. Not Diamond will learn the best model and prompt combination for each query.
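Learning the best model and prompt combination means the candidate set becomes every (model, prompt) pair rather than every model. This small sketch, with hypothetical model names, prompt variants, and scores, shows why that matters: the best pair can differ from the best model under a single fixed prompt.

```python
from itertools import product

MODELS = ["model-a", "model-b"]     # hypothetical candidate models
PROMPTS = ["default", "optimized"]  # e.g. a hand-written prompt vs. an optimizer's output

# Hypothetical eval scores for one kind of query, keyed by (model, prompt variant).
SCORES = {
    ("model-a", "default"): 0.70,
    ("model-a", "optimized"): 0.74,
    ("model-b", "default"): 0.65,
    ("model-b", "optimized"): 0.80,
}

# Routing over models only, with everyone using the default prompt.
best_fixed_prompt = max(MODELS, key=lambda m: SCORES[(m, "default")])
# Routing over the full grid of model and prompt combinations.
best_pair = max(product(MODELS, PROMPTS), key=lambda mp: SCORES[mp])

print(best_fixed_prompt)  # model-a
print(best_pair)          # ('model-b', 'optimized')
```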

How is this different from simply using a single model?

You can think of Not Diamond as a “meta-model”: an ensemble of the most powerful LLMs that beats each individual model on quality while drastically reducing costs and latency.
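The intuition behind the meta-model framing is that different models fail on different queries, so picking per query can outscore any single model. The toy numbers below are made up purely to illustrate that effect; the "oracle" is a perfect router and is only an upper bound, which a real router approaches to the extent its predictions are accurate.

```python
# Hypothetical per-query scores (1 = correct, 0 = wrong) for three models on five queries.
SCORES = {
    "model-a": [1, 1, 0, 0, 1],
    "model-b": [0, 1, 1, 0, 1],
    "model-c": [1, 0, 0, 1, 0],
}


def mean(xs):
    return sum(xs) / len(xs)


# Average score if every query went to the same model.
single_model = {m: mean(s) for m, s in SCORES.items()}
# A perfect router picks, for each query, whichever model scored best on it.
oracle = mean([max(col) for col in zip(*SCORES.values())])

print(single_model)  # {'model-a': 0.6, 'model-b': 0.6, 'model-c': 0.4}
print(oracle)        # 1.0
```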

Will Not Diamond add extra latency to my model calls?

Not Diamond’s inference latency is under 100 ms, and by routing to faster LLMs when possible it can deliver a net speedup across your LLM calls. To avoid network latency and maximize speed, you can deploy Not Diamond directly in your own infrastructure.
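Whether routing produces a net speedup is simple arithmetic: the router's overhead has to be smaller than the time saved on requests sent to a faster model. This sketch uses 0.1 s as the router overhead (the upper bound quoted above); the model latencies and the share of traffic sent to the faster model are hypothetical.

```python
def expected_latency_s(router_overhead_s: float, fast_share: float,
                       fast_latency_s: float, slow_latency_s: float) -> float:
    """Expected end-to-end latency when `fast_share` of requests go to the faster model."""
    return router_overhead_s + fast_share * fast_latency_s + (1 - fast_share) * slow_latency_s


baseline = 3.0  # hypothetical: always calling the slower model takes 3.0 s
routed = expected_latency_s(router_overhead_s=0.1, fast_share=0.5,
                            fast_latency_s=1.0, slow_latency_s=3.0)
print(f"baseline {baseline:.2f} s, routed {routed:.2f} s")  # baseline 3.00 s, routed 2.10 s
```

Routing wins whenever `fast_share * (slow_latency_s - fast_latency_s)` exceeds the overhead; with these assumed numbers (0.1 s overhead, 2 s gap), that happens once more than 5% of requests go to the faster model.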

Does Not Diamond support RAG and agent use cases?

Yes, Not Diamond is especially powerful for RAG and agent workflows. As highly diverse and unseen prompts propagate through the workflow, Not Diamond's routing improves quality, reliability, speed, and efficiency.

What languages does Not Diamond support?

Not Diamond is available through our Python SDK, TypeScript client, and our REST API, so you can leverage model routing within any stack.

Is Not Diamond SOC 2 compliant?

Not Diamond is currently in the process of securing SOC 2 compliance and will be fully compliant in 2024.