Not Diamond leverages your evaluation data to learn a mapping from inputs to model rankings. It predictively determines which LLM is best-suited for each input in your application, flexibly adapting to your domain, your definition of quality, and your business logic.
Not Diamond has built a state-of-the-art prompt adaptation technique that takes a prompt written for one model and automatically adapts it to any other target model.
Not Diamond is most useful for teams that have scaled beyond one or two AI applications and are building five, ten, or dozens of AI pipelines across many models.
You can think of Not Diamond as a “meta-model”, a data-driven ensemble of all the most powerful LLMs, which beats each individual model on quality while drastically reducing costs and latency.
Not Diamond is not a proxy. It simply recommends which model to use; all LLM requests are then made client-side. You can call models through APIs, gateways, or locally—Not Diamond is agnostic to your request orchestration pipelines.
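The recommend-then-execute pattern described above can be sketched as follows. This is an illustrative outline only: the function names, return shapes, and model identifiers are hypothetical stand-ins, not the actual Not Diamond SDK API.

```python
# Sketch of the client-side routing flow: a routing step returns only a
# model recommendation, and your own code makes the actual LLM request.

def recommend_model(messages, candidates):
    """Stub for the routing step. In practice this would call the
    Not Diamond routing service, which returns a recommendation only --
    no prompt data is proxied to any LLM provider."""
    # Toy placeholder logic purely for illustration: pick the first candidate.
    return candidates[0]

def call_llm(model, messages):
    """Stub for the client-side LLM request. Because this step is yours,
    it can hit any provider API, gateway, or locally hosted model."""
    return f"[{model}] response to: {messages[-1]['content']}"

messages = [{"role": "user", "content": "Summarize this contract."}]
candidates = ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]

model = recommend_model(messages, candidates)   # routing decision
response = call_llm(model, messages)            # request made client-side
```

The key design point is the separation of concerns: the routing call carries no provider credentials and returns no completions, so your existing request orchestration stays unchanged.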
Not Diamond’s inference speed is under 50ms. By routing to faster LLMs when possible, you can drive net speedups in your LLM calls. To avoid network latency and maximize speed, you can deploy Not Diamond directly to your infrastructure.
Not Diamond is available through our Python SDK, TypeScript client, and our REST API, so you can leverage model routing within any stack.
Not Diamond is SOC 2 compliant, and we support client-side request execution, zero data retention, and VPC deployments.