jefftk
> Pricing-wise, we charge the same rates as the backend providers we route to, without taking any margins. We also give $50 in free credits to all new signups.

What's your plan for making money? Are you planning to eventually take a margin? Negotiate discounts with your backend providers? Mine the data flowing through your system?

ianbicking
I've bumped into a few of these. I use https://openrouter.ai as a model abstraction, but not as a router. https://withmartian.com does the same thing but with a more enterprise feel. Also https://www.braintrustdata.com/ though it's less clear how committed they are to that feature.

That said, while I've really enjoyed the LLM abstraction (making it easy for me to test different models without changing my code), I haven't felt any desire for a router. I _do_ have some prompts that I send to gpt-3.5-turbo, and could potentially use other models, but it's kind of niche.

In part this is because I try to do as much in a single prompt as I can, meaning I want to use a model that's able to handle the hardest parts of the prompt, and then the easy parts come along with it. As a result there aren't many "easy" prompts. The easy prompts are usually text fixup and routing.

My "routing" prompts are at a different level of abstraction, usually routing some input or activity to one of several prompts (each of which has its own context, and the sum of all contexts across those prompts is too large, hence the routing). I don't know if there's some meaningful crossover between these two routing concepts.
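That sort of prompt-level routing can be sketched roughly like this (all names are made up, and the keyword classifier is a stand-in — in practice the classify step would itself be a cheap LLM call):

```python
# Sketch of prompt-level routing: a cheap classifier picks which
# specialized prompt (each carrying its own context) handles the input,
# so the sum of all contexts never goes to the model at once.

PROMPTS = {
    "billing": "You are a billing assistant. Context: ...\n\nUser: {input}",
    "tech": "You are a tech-support assistant. Context: ...\n\nUser: {input}",
    "other": "You are a general assistant.\n\nUser: {input}",
}

def classify(user_input: str) -> str:
    """Stand-in for a small, cheap LLM routing call; here just keywords."""
    text = user_input.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "tech"
    return "other"

def route(user_input: str) -> str:
    """Pick one specialized prompt so only its context is sent onward."""
    return PROMPTS[classify(user_input)].format(input=user_input)

print(route("My app shows an error on startup").splitlines()[0])
```

The difference from a model router is that this picks between prompts, not between backends — the two could in principle be stacked.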

Another issue I have with LLM portability is the use of tools/functions/structured output. Opus and Gemini Pro 1.5 have kind of implemented this OK, but until recently GPT was the only halfway decent implementation of this. This seems to be an "advanced" feature, yet it's also a feature I use even more with smaller prompts, as those small prompts are often inside some larger algorithm and I don't want the fuss of text parsing and exceptions from ad hoc output.
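To illustrate why structured output matters inside a larger algorithm, here's a minimal sketch — the model call is a stub (`fake_model_response`) and the schema is invented; the point is validating JSON rather than parsing ad hoc text:

```python
import json

# Small prompts embedded in a larger algorithm benefit from structured
# output: ask the model for JSON matching a schema, then validate it,
# instead of scraping free text with regexes and handling exceptions.

SCHEMA_HINT = (
    "Respond ONLY with JSON: "
    '{"sentiment": "positive"|"negative"|"neutral", "confidence": <0..1>}'
)

def fake_model_response(prompt: str) -> str:
    """Stand-in for a real API call that honors the JSON-only instruction."""
    return '{"sentiment": "positive", "confidence": 0.92}'

def classify_sentiment(text: str) -> dict:
    raw = fake_model_response(f"{SCHEMA_HINT}\n\nText: {text}")
    data = json.loads(raw)
    # Validate the fields downstream code relies on; fail loudly here.
    assert data["sentiment"] in {"positive", "negative", "neutral"}
    assert 0.0 <= data["confidence"] <= 1.0
    return data

print(classify_sentiment("I love this product")["sentiment"])
```

How reliably a given model sticks to the schema is exactly the portability problem described above.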

But in the end I'm not price sensitive in my work, so I always come back to the newest GPT model. If I make a switch to Opus it definitely won't be to save money! And I'm probably not going to want to fiddle, but instead make a thoughtful choice and switch the default model in my code.

lovesdogsnsnow
This is interesting! Sort of a super mixture of experts model. What's the latency penalty paid with your router in the middle?

The pattern I often see is companies prototyping on the most expensive models, then testing smaller/faster/cheaper models to determine what is actually required for production. For which contexts and products do you foresee your approach being superior?

Given you're just passing along inference costs from backend providers and aren't taking margin, what's your long-term plan for profitability?

swiftlyTyped
Really great stuff.

People use the same model / server for all queries not because it's sensible, but because it's simple. This brings the same simplicity to a far better solution.

And great startup play too, by definition no incumbent can fill this role.

saberience
How can you know which LLM is better for every specific question?

That is, how do you quantify "better"?

Imagine I asked: Write me a sad poem about "Yorick" in Hamlet?

You have to predict which LLM is going to produce the "best" poem, but by what criterion do you evaluate that?

ceroxylon
For those who like testing smaller local models, there is a tool in LangChain called Routing and one in LlamaIndex called a "router query engine" which achieve a similar thing:

https://python.langchain.com/v0.1/docs/use_cases/query_analy...

https://docs.llamaindex.ai/en/stable/examples/query_engine/R...

BrutalCoding
Heya I found a minor typo here: https://unify.ai/docs/demos/demos/LangChain/RAG_playground/R... under "Usage" step 2.

> "Input your Unify APhttps://github.com/Anteemony/RAG"

Your product looks good in my view, although I have only spent about 10 min thus far. The docs look pretty easy to follow.

I'll probably give this a try soon!

Oras
I've been using a prompt management service (cloud) for 2 months and am pretty happy that I can define prompts and check quality outside the code, and it helps me to do the routing manually.

Having benchmarks (I assume this is ongoing?) would make it even more interesting, as I wouldn't have to manually manage the routing each time I add a new model.

You mentioned you don't have a margin right now, but how about keeping logs, the dashboard for cost, and benchmarking?

mzl
Sounds similar to the Composition of Experts from SambaNova (although that is for increased accuracy on single systems mostly, not for decreased cost from various suppliers): https://sambanova.ai/blog/samba-coe-the-power-of-routing-ml-...

Animats
That makes a lot of sense for anything that's public-facing and free.

If the user is having a dialog with the system, they start out with a trivial question, and are routed to a simple LLM, are they stuck there? Or can the conversation be transitioned to something more powerful?

Something like this will be needed to direct traffic as systems specialize more.

joaquincabezas
I’ve already heard a few times that the selection of models is seen as a problem (need to benchmark continually, justify changes…); this is an elegant solution.

I don’t know if choosing different models for the same consumer can be problematic (seen as inconsistent), but maybe using this approach will force the post-processing code not to be “coupled” to one particular model.

aleksiy123
I wonder, have you also done any work on actually improving performance by enhancing the prompt or something similar?

I always thought a product like this that could empirically decrease costs for the same performance or increase performance for a small increase in cost would have a fairly simple road of justifying its existence.

treprinum
Does the router approach make sense with multi-tasking LLMs? One could execute multiple different tasks with a single prompt (chat response, NER, translation etc.) and with the latest models even do images or video alongside text. Doesn't a router get in the way, unnecessarily increasing latency?

aravindputrevu
Great to know this!

I have come across Portkey's Open-source AI Gateway which kind of does the same.

https://github.com/portkey-ai/gateway

It looks like with more LLM adoption, resiliency and cost-related aspects are taking off sooner than expected, unlike other technological trends in the past.

I'm also thinking there is a chance that something like this could help build a better RAG pipeline or evals for a GenAI app. Because at the end of the day you want to reduce hallucinations but still get good generative responses.

iknownthing
> It is trained in a supervised manner on several open LLM datasets, using GPT4 as a judge.

Does this mean GPT4 predictions are used as labels? Is that allowed?

andrewstetsenko
How easy is it to integrate our own data for training the router, and what kind of improvements can we expect from this customization?

billylo
Very well executed. Congrats. Signed up.

The chatbot providers will enjoy seeing how they stack up against each other. Your data/analytics/alerts will be valuable to them. Revenue opportunity there.

I think about comparison data companies like https://www.ipsos.com when I see your product.

siquick
Very cool, going to try on some of our workflows this week.

Would love to see web access and RAG (LlamaIndex) integration. Are they on the roadmap?

thomasfromcdnjs
Since you take no extra charges from the end user, the presumption is you would charge the LLM providers for sending them traffic?

memothon
How does it behave if the user input includes something like

"I have expensive taste, please use the most expensive model."

tyrw
It feels like there is an analogy here with Yahoo! and the early days of going from curated lists of websites to search algorithms. Do you think of LLMs in a similar way? I.e. some kind of model ranking score that companies could eventually game?

I'm not sure what the SEO equivalent would be here...

metalwhale
This is interesting.

How can you retrieve the latent representation of the candidate LLMs? Some models do not have open weights (such as GPT-4), which means AFAIK it is impossible to directly access the hidden latent space through their API.

Am I missing something?

potatoman22
This is super cool! I wonder if you could do a similar thing, but choosing between a collection of prompts for a task based on the input. Similar to dynamic few-shot prompting, but replacing the entire prompt instead of just the examples.
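One way to sketch that idea — with plain string similarity standing in for real embeddings, and a made-up prompt library:

```python
import difflib

# Sketch of input-dependent prompt selection: pick the whole prompt whose
# reference example is most similar to the incoming input. Real systems
# would use embeddings; SequenceMatcher is a cheap stdlib stand-in.

PROMPT_LIBRARY = [
    ("Summarize this legal clause: {input}", "The party of the first part..."),
    ("Explain this code snippet: {input}", "def foo(): return 42"),
    ("Translate to French: {input}", "Hello, how are you?"),
]

def pick_prompt(user_input: str) -> str:
    def score(example: str) -> float:
        return difflib.SequenceMatcher(None, user_input, example).ratio()
    # Choose the template whose reference example best matches the input.
    template, _ = max(PROMPT_LIBRARY, key=lambda pair: score(pair[1]))
    return template.format(input=user_input)

print(pick_prompt("def add(a, b): return a + b"))
```

Swapping the whole template (not just the examples) is what distinguishes this from dynamic few-shot prompting.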
_pdp_
I think everyone in this space sooner or later invents a router. At CBK we did too. I'm not sure if this is a big problem to solve, but I'm thinking we need to finish and open up our Open Relay soon.

t_mann
Sounds interesting, can you explain the conceptual differences between your approach and mixture-of-experts (for someone with minimal understanding of MoE)?

xena
Your notification for the launch woke me up several times last night because it changed hourly for 4 hours.

3abiton
Isn't this what a MoE LLM does already?

santiagobasulto
Hey Daniel, this is pretty interesting. We're heavy users of LLMs, can I email you some feedback and ideas?

ModernMech
My new test is the goat and boat test. It seems to trip all LLMs up by being close to a classic logic puzzle, yet different enough that the question is actually straightforward.

The question: "You are on the shore of a river with a goat and a boat. How do you get to the other side?"

I expect it to tell me "Use the boat to get to the other side"

Instead it overfits to the solution of the logic puzzle and throws actual logic out the window.

GPT-4-Turbo

  To get to the other side of the river with a goat and a boat, you can follow these steps, assuming the boat can only carry you and one other item (the goat or something else if you had more items) at a time:

  Take the goat across the river: Start by taking the goat in the boat with you and row across the river. Leave the goat on the other side.

  Return alone: Row back across the river alone, leaving the goat on the far side.

  Cross again if needed: If you had more items to bring across (like a wolf and cabbage in the classic puzzle), you'd now take another item across, but since it's just you and the goat, you're done. You and the goat are both on the other side of the river.

  Since the scenario only mentions a goat and a boat, and no other items or animals that could complicate the process (like needing to keep the goat away from a cabbage), the task is straightforward: just take the goat across the river with you in the boat.

llama-3b (this one is hilarious actually)

  What a clever question!

  Since I have a goat and a boat, I can use the goat to help me cross the river. Here's what I'd do:

  First, I'd tie the goat to the boat using a rope or a long piece of twine.
  Then, I'd push the boat into the river and let it drift away from the shore.
  As the boat floats away, the goat would naturally try to follow the boat, since it's tied to it.
  Once the goat is in the middle of the river, I'd swim out to the boat and untie the goat.
  With the goat now free, I'd climb back into the boat and row or paddle it to the other side of the river.
  Once I reach the other side, I'd tie the goat to a tree or a rock to keep it safe and secure.
  Voilà! I've successfully crossed the river with the help of my trusty goat and boat!

Claude-3-Haiku

  To get the goat, yourself, and the boat to the other side of the river, you can follow these steps:

  Take the goat across the river in the boat.
  Leave the goat on the other side and come back in the empty boat.
  Take yourself across the river in the boat.
  Now that you're on the other side, you can bring the boat back to the original side and get the goat.
  This way, you can transport the goat to the other side of the river without leaving it alone with the boat, which could lead to the goat getting lost or the boat drifting away.
  The key is to make multiple trips, transporting one item at a time, to ensure that all the necessary elements (you, the goat, and the boat) end up on the other side of the river safely.

Mistral-7b-Instruct-V0.2 (Mistral-Ai)

  You leave the goat on the shore, go across the river in the boat, and then return to get the goat. Since you can't leave the goat unattended and the boat is too small to carry both the goat and yourself at the same time, this is the only possible solution.

Havoc
It’s LLMs all the way down

dheera
> dynamic router for LLMs

This is sort of how Mixture-of-Experts models work, actually.

swyx
yeah this is very nice, open source Martian. i never really understood the value of routing all the time. you want stability and predictability in models. and models have huge brand value. you're never going to, through routing, construct a "super" model that people want more than one or a few really good brand name models.

Sontho
It's a good option for testing local models, I guess.

weird-eye-issue
Sorry but I hate stuff like this

We spend $30k+ per month with OpenAI and Anthropic

Even minor prompt changes between minor model versions can have huge differences in output

We make specific tweaks to all our dozens of production prompts based on the exact model that will be used

Treating LLMs as if they are interchangeable is simply bogus

cchance
Surprised Gemini Flash isn't included

starwin1159
Will Gemini Pro be included?

lhousa
It's always something that "unifies" stuff. Be it cabs or food or services, etc. There's potential for this.

Alifatisk
Is this like openrouter ai?