The future is multi-model.
notdiamond-0001 is our first model router available on Hugging Face. notdiamond-0001 takes any input and determines whether to send it to GPT-3.5 or GPT-4, optimizing for the highest accuracy while drastically reducing your costs and latency.
We’ve spent the last month working to make sure notdiamond-0001 meets five criteria: it routes accurately across domains, it stays balanced between models, it preserves your privacy, it's fast, and it will extend to more models over time.
To get started with notdiamond-0001, read our documentation and download it on Hugging Face.
Our router has been trained on 250,000 data points from robust, cross-domain evaluation benchmarks to optimize for measurable quality metrics such as accuracy, ROUGE, and BLEU. These benchmarks cover everything from code generation to text summarization, medicine, and law.
Unlike deterministic routers, notdiamond-0001 doesn't route based on simple categories or domains; its routing decisions are far more fine-grained. Here are some examples of prompts that get routed to either GPT-3.5 or GPT-4:
| Sends to GPT-3.5 | Sends to GPT-4 |
| --- | --- |
| What is prostate cancer? | Can you help me understand whether this prostate pathology report could indicate the presence of cancer? CLINICAL DATA: A-H: ELEVATED PROSTATE, PROSTATE [A] LEFT BASE… [continues] |
| What are common causes of a 401 error? | This function is throwing a 401 error when it's being called. Do you see anything that could be contributing to that? def get_user_data(request, response, db: Session = Depends(get_db))… [continues] |
| What does this paragraph mean?: “As your perspective of the world increases not only is the pain it inflicts on you less but also its meaning… [continues] | Please help me complete this paragraph with a pared down description in the style of Karl Ove Knausgård: The broken cup lay on the table. |
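The routing pattern behind these examples can be sketched as a simple binary dispatch. Everything below is illustrative: the `classify` scorer, the threshold, and the model names are placeholders, not notdiamond-0001's actual interface (the real model is distributed on Hugging Face with its own usage instructions).

```python
from typing import Callable

def route(prompt: str,
          classify: Callable[[str], float],
          threshold: float = 0.5) -> str:
    """Send the prompt to the stronger model only when the scorer
    estimates it needs one. All names here are hypothetical."""
    score = classify(prompt)  # estimated probability the prompt is "hard"
    return "gpt-4" if score >= threshold else "gpt-3.5-turbo"

# Stub scorer for demonstration only: treats long, detail-heavy
# prompts as "harder". A real router learns this from data.
def toy_classifier(prompt: str) -> float:
    return min(1.0, len(prompt) / 200)

print(route("What is prostate cancer?", toy_classifier))  # gpt-3.5-turbo
```

The point of a learned router over a stub like `toy_classifier` is exactly the fine-grained behavior shown in the table above: two prompts about the same topic can land on different models depending on their difficulty.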
We found no skew towards either model across our classification dataset: GPT-3.5 is capable of handling roughly 50% of queries, and we route the other 50% to GPT-4. This ratio may vary for your application depending on your distribution of inputs.
We don't keep track of your model routing destinations. Since all calls go out client-side, we encourage you to log calls locally.
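Because routing happens client-side, local logging is entirely up to you. A minimal sketch might look like the following; the file path and record schema are our own choices, not part of any library:

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("routing_log.jsonl")  # hypothetical local log file

def log_route(prompt: str, model: str, path: Path = LOG_PATH) -> None:
    """Append one routing decision as a JSON line. Nothing leaves the
    machine; we record only the prompt's length, not its contents."""
    record = {"ts": time.time(), "model": model, "prompt_chars": len(prompt)}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
```

Logging prompt length rather than the prompt itself keeps the log useful for auditing model mix and costs without storing sensitive inputs; adapt the schema to whatever your own compliance needs require.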
Our router is extremely fast, determining which model to call in under 10ms.¹ By routing appropriate queries to GPT-3.5 rather than GPT-4, you'll also see significant net speedups compared to sending everything to GPT-4.
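Given the roughly 50/50 split described above and the sub-10ms routing overhead, the net effect on average latency is simple arithmetic. The per-model latencies below are made-up illustrative numbers, not measurements:

```python
# Illustrative assumptions only: suppose GPT-4 answers in 3.0s
# and GPT-3.5 in 1.0s for a typical query in your workload.
GPT4_LATENCY = 3.0
GPT35_LATENCY = 1.0
ROUTER_OVERHEAD = 0.010  # the routing decision itself, under 10ms

# With ~50% of queries going to each model:
avg_with_router = 0.5 * GPT35_LATENCY + 0.5 * GPT4_LATENCY + ROUTER_OVERHEAD
print(avg_with_router)                     # 2.01s vs 3.0s for GPT-4-only
print(1 - avg_with_router / GPT4_LATENCY)  # ~33% faster on average
```

The routing overhead is negligible next to model inference time, so the speedup is dominated by how many queries your workload lets you safely send to the faster model.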
We are working to onboard more models as quickly as possible. If you have a model you'd like to request, please email us!
We believe in a multi-model future. The world won't have one single, giant model that everyone sends everything to—instead, there will be many foundation models, millions of fine-tuned variants of those models, and countless custom inference engines running on top of them. We believe this is not only a better future for AI, but a safer one as well. We started Not Diamond to enable this multi-model future, starting with safe and robust infrastructure for routing between models.
Why routing? Over the past months, we’ve talked to hundreds of developers and companies building on top of LLMs, from early-stage startups to Fortune 500 companies. For nearly everyone, model routing is a big, hairy, audacious problem. It sucks. Teams are using heuristics to route deterministically with if/else statements and regular expressions, trying to train their own classifiers to route inputs, A/B testing model selections, or handwriting prompts in an attempt to use slow and bulky agents as routers. When teams do manage to get a router working, it frequently breaks whenever the underlying models update. Meanwhile, those who have managed to build functional routers have seen huge gains in their product quality and margins. We decided there had to be a better way.
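The heuristic status quo described above typically looks something like this. This is a deliberately simplified sketch of the pattern, not anyone's actual production code:

```python
import re

# Deterministic routing with keywords and regexes: every new domain,
# edge case, or model update means another hand-tuned branch.
CODE_PATTERN = re.compile(r"\bdef\b|\bclass\b|\{|\}|Traceback")

def heuristic_route(prompt: str) -> str:
    if CODE_PATTERN.search(prompt):
        return "gpt-4"            # "code looks hard, use the big model"
    elif len(prompt) > 500:
        return "gpt-4"            # "long prompt, probably complicated"
    else:
        return "gpt-3.5-turbo"    # everything else gets the cheap model
```

Rules like these are both over- and under-inclusive: a trivial code snippet gets routed to the expensive model while a short but genuinely hard question does not, and the thresholds silently rot as the underlying models change.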
If you’re using GPT-4, notdiamond-0001 will lead to an immediate and drastic reduction in your inference costs and latency without any degradation in quality. Or, if you’re using GPT-3.5, you can enjoy much higher response quality without significantly increasing your bill. And in either case, our router will protect you from expensive outages: we monitor OpenAI's models 24/7 so that you never experience a gap in service.
But notdiamond-0001 is just the first step in a much more comprehensive product roadmap. Over the coming months, we’ll be releasing a lot more, including the ability to dynamically route to Claude, Llama, Mistral, and many more public models, as well as your own fine-tuned models and custom workflows, agents, RAG applications, and chains.
As a team, we've built venture-scale companies, developed products for billions of users, and published cutting-edge research in top AI journals. We’re excited to be backed by some of the world's best developers, including the founders of companies like Hugging Face, Replicated, Giphy, Indeed, and many more.
We’re actively hiring, so drop us a line if you want to help us build a multi-model future.
1. This value refers to notdiamond-0001's inference speed.