That said, while I've really enjoyed the LLM abstraction (making it easy for me to test different models without changing my code), I haven't felt any desire for a router. I _do_ have some prompts that I send to gpt-3.5-turbo, and could potentially use other models, but it's kind of niche.
In part this is because I try to do as much in a single prompt as I can, meaning I want a model that can handle the hardest parts of the prompt, and then the easy parts come along for free. As a result there aren't many "easy" prompts. The easy ones are usually text fixup and routing.
My "routing" prompts are at a different level of abstraction, usually routing some input or activity to one of several prompts (each of which has its own context, and the sum of all contexts across those prompts is too large, hence the routing). I don't know if there's some meaningful crossover between these two routing concepts.
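A minimal sketch of that kind of prompt-level routing (all names and contexts here are hypothetical): a cheap classification step picks which specialized prompt handles the input, so only that prompt's context goes into the request.

```python
# Hypothetical sketch of prompt-level routing: a cheap classifier call
# picks one specialized prompt; only that prompt's context is sent,
# because the sum of all contexts would be too large for one request.

PROMPTS = {
    "billing": {
        "context": "<billing docs...>",
        "template": "Answer this billing question: {q}",
    },
    "technical": {
        "context": "<API docs...>",
        "template": "Answer this technical question: {q}",
    },
}

def classify(question: str) -> str:
    """Stand-in for a small, cheap LLM call that returns a routing label."""
    return "billing" if "invoice" in question.lower() else "technical"

def route(question: str) -> str:
    label = classify(question)
    p = PROMPTS[label]
    # Only the chosen prompt's context is included, keeping the request small.
    return p["context"] + "\n" + p["template"].format(q=question)

print(route("Why was my invoice doubled?"))
```

In practice `classify` would itself be one of those "easy" prompts sent to a cheap model.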
Another issue I have with LLM portability is the use of tools/functions/structured output. Opus and Gemini 1.5 Pro have implemented this passably, but until recently GPT had the only halfway decent implementation. This seems like an "advanced" feature, yet it's one I use even more with small prompts, since those small prompts are often embedded in some larger algorithm and I don't want the fuss of text parsing and exceptions from ad hoc output.
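For contrast, this is the kind of defensive parsing that decent tool/function support lets you skip entirely. A stdlib-only sketch (the reply text is made up):

```python
# Sketch of the "text parsing and exceptions" problem: without structured
# output support, you end up fishing a JSON object out of chatty replies.
import json
import re

def parse_structured(reply: str, required: set) -> dict:
    """Defensively extract a JSON object from a free-text model reply,
    tolerating the code fences and chatter ad hoc output often includes."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

reply = 'Sure! Here is the result:\n```json\n{"label": "spam", "score": 0.93}\n```'
print(parse_structured(reply, {"label", "score"}))
```

Proper tool calling moves all of this into the API contract, which is exactly why it matters most for small prompts buried inside larger code.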
But in the end I'm not price sensitive in my work, so I always come back to the newest GPT model. If I make a switch to Opus it definitely won't be to save money! And I'm probably not going to want to fiddle, but instead make a thoughtful choice and switch the default model in my code.
The pattern I often see is companies prototyping on the most expensive models, then testing smaller/faster/cheaper models to determine what is actually required for production. For which contexts and products do you foresee your approach being superior?
Given you're just passing along inference costs from backend providers and aren't taking margin, what's your long-term plan for profitability?
People use the same model/server for all queries not because it's sensible, but because it's simple. This brings the same simplicity to a far more efficient solution.
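The appeal is that the caller's code stays as simple as a single hardcoded model. A toy sketch (model names, thresholds, and the difficulty heuristic are all illustrative, not real benchmarks):

```python
# Hypothetical router sketch: one entry point, the router picks the model.
# Tiers and the difficulty estimate are made up for illustration.

MODELS = [
    (0.3, "small-fast-model"),
    (0.7, "mid-tier-model"),
    (1.0, "frontier-model"),
]

def estimate_difficulty(prompt: str) -> float:
    """Stand-in for a learned difficulty estimator; here, length-based."""
    return min(len(prompt) / 2000, 1.0)

def pick_model(prompt: str) -> str:
    d = estimate_difficulty(prompt)
    for threshold, name in MODELS:
        if d <= threshold:
            return name
    return MODELS[-1][1]

print(pick_model("What is 2+2?"))  # a trivial prompt lands on the cheap tier
```

The caller still makes one call; the model choice becomes the router's problem.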
And it's a great startup play too: by definition, no incumbent can fill this role.
That is, how do you quantify "better"?
Imagine I asked: "Write me a sad poem about Yorick in Hamlet."
You have to predict which LLM is going to produce the "best" poem, but by what criterion do you evaluate that?
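One common answer is pairwise human preference aggregated into ratings, Chatbot-Arena style: show two poems, record which one people prefer, and turn the votes into scores. A minimal Elo sketch over made-up votes:

```python
# Elo-style ratings from pairwise preference votes -- the mechanism
# leaderboards like Chatbot Arena use to quantify "better" for
# subjective tasks. The votes below are invented for illustration.

K = 32  # standard Elo update factor

def expected(r_a: float, r_b: float) -> float:
    """Expected win probability of A against B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str) -> None:
    """Transfer rating points from loser to winner, scaled by surprise."""
    e = expected(ratings[winner], ratings[loser])
    delta = K * (1 - e)
    ratings[winner] += delta
    ratings[loser] -= delta

ratings = {"model-a": 1000.0, "model-b": 1000.0}
votes = [("model-a", "model-b")] * 3 + [("model-b", "model-a")]
for winner, loser in votes:
    update(ratings, winner, loser)
print(ratings)
```

It sidesteps defining "sad poem quality" directly: the criterion is whatever raters preferred on similar prompts.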
https://python.langchain.com/v0.1/docs/use_cases/query_analy...
https://docs.llamaindex.ai/en/stable/examples/query_engine/R...
> "Input your Unify AP…"

https://github.com/Anteemony/RAG
Your product looks good in my view, although I have only spent about 10 minutes with it so far. The docs look pretty easy to follow.
I'll probably give this a try soon!
Having benchmarks (I assume this is ongoing?) would make it even more interesting, as I wouldn't have to manually manage the routing each time I add a new model.
You mentioned you don't take a margin right now, but what about logging, the cost dashboard, and benchmarking?
If a user having a dialog with the system starts out with a trivial question and is routed to a simple LLM, are they stuck there? Or can the conversation be transitioned to something more powerful?
Something like this will be needed to direct traffic as systems specialize more.
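One plausible design (purely a sketch, not how any particular router works): track a tier per conversation and ratchet it up when a turn looks harder, so early routing to a cheap model doesn't trap the whole dialog there.

```python
# Hypothetical escalation sketch: a conversation starts on a cheap model
# and is promoted to a stronger tier when a later turn looks harder.
# The tier never goes back down, so context quality only improves.

TIERS = ["cheap-model", "mid-model", "frontier-model"]

class Conversation:
    def __init__(self) -> None:
        self.tier = 0  # start on the cheapest model

    def model_for(self, user_turn: str) -> str:
        # Stand-in heuristic; a real router would score the full history.
        if "prove" in user_turn.lower() or len(user_turn) > 500:
            self.tier = min(self.tier + 1, len(TIERS) - 1)
        return TIERS[self.tier]

convo = Conversation()
print(convo.model_for("hi"))
print(convo.model_for("prove this algorithm halts"))
```

The open question is whether the stronger model gets the full transcript or a summary when it takes over.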
I don't know if choosing different models for the same consumer could be problematic (it might be seen as inconsistent), but maybe this approach will force the post-processing code not to be coupled to one particular model.
I always thought a product like this, one that could empirically decrease costs at the same performance or increase performance for a small increase in cost, would have a fairly easy time justifying its existence.
I have come across Portkey's Open-source AI Gateway which kind of does the same.
https://github.com/portkey-ai/gateway
It looks like, with broader LLM adoption, resiliency and cost concerns are taking off sooner than they did in past technology trends.
I'm also wondering whether something like this could help build a better RAG pipeline or evals for a GenAI app, since at the end of the day you want to reduce hallucinations while still getting good generative responses.
Does this mean GPT4 predictions are used as labels? Is that allowed?
The chatbot providers will enjoy seeing how they stack up against each other. Your data/analytics/alerts will be valuable to them. Revenue opportunity there.
I think about comparison data companies like https://www.ipsos.com when I see your product.
Would love to see web access and RAG (LlamaIndex) integration. Are they on the roadmap?
"I have expensive taste, please use the most expensive model."
I'm not sure what the SEO equivalent would be here...
How can you retrieve the latent representation of the candidate LLMs? Some models do not have open weights (such as GPT-4), which means AFAIK it is impossible to directly access the hidden latent space through their API.
Am I missing something?
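One answer (an assumption about how such routers could work, not a claim about this product): the router doesn't need the candidate LLM's hidden states at all. It can embed the *prompt* with any accessible encoder and learn, from labeled examples of which model won on similar prompts, where to route. A toy sketch with a bag-of-words stand-in for a real embedding model:

```python
# Toy prompt-embedding router: route by similarity to past prompts whose
# winning model is known. A real system would use a proper text encoder;
# the bag-of-words "embedding" here is just for illustration.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# (prompt, best model) pairs, e.g. mined from past preference data.
LABELED = [
    ("write a short poem about autumn", "creative-model"),
    ("fix this python traceback", "code-model"),
]

def route(prompt: str) -> str:
    q = embed(prompt)
    return max(LABELED, key=lambda ex: cosine(q, embed(ex[0])))[1]

print(route("write a poem about winter"))
```

So only the router's own encoder needs open weights; the candidates (GPT-4 included) stay black boxes scored by their outputs.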
The question: "You are on the shore of a river with a goat and a boat. How do you get to the other side?"
I expect it to tell me "Use the boat to get to the other side"
Instead it overfits to the solution of the logic puzzle and throws actual logic out the window.
GPT-4-Turbo
To get to the other side of the river with a goat and a boat, you can follow these steps, assuming the boat can only carry you and one other item (the goat or something else if you had more items) at a time:
Take the goat across the river: Start by taking the goat in the boat with you and row across the river. Leave the goat on the other side.
Return alone: Row back across the river alone, leaving the goat on the far side.
Cross again if needed: If you had more items to bring across (like a wolf and cabbage in the classic puzzle), you'd now take another item across, but since it's just you and the goat, you're done. You and the goat are both on the other side of the river.
Since the scenario only mentions a goat and a boat, and no other items or animals that could complicate the process (like needing to keep the goat away from a cabbage), the task is straightforward: just take the goat across the river with you in the boat.
llama-3b (this one is hilarious actually)
What a clever question!
Since I have a goat and a boat, I can use the goat to help me cross the river. Here's what I'd do:
First, I'd tie the goat to the boat using a rope or a long piece of twine.
Then, I'd push the boat into the river and let it drift away from the shore.
As the boat floats away, the goat would naturally try to follow the boat, since it's tied to it.
Once the goat is in the middle of the river, I'd swim out to the boat and untie the goat.
With the goat now free, I'd climb back into the boat and row or paddle it to the other side of the river.
Once I reach the other side, I'd tie the goat to a tree or a rock to keep it safe and secure.
Voilà! I've successfully crossed the river with the help of my trusty goat and boat!
Claude-3-Haiku
To get the goat, yourself, and the boat to the other side of the river, you can follow these steps:
Take the goat across the river in the boat.
Leave the goat on the other side and come back in the empty boat.
Take yourself across the river in the boat.
Now that you're on the other side, you can bring the boat back to the original side and get the goat.
This way, you can transport the goat to the other side of the river without leaving it alone with the boat, which could lead to the goat getting lost or the boat drifting away.
The key is to make multiple trips, transporting one item at a time, to ensure that all the necessary elements (you, the goat, and the boat) end up on the other side of the river safely.
Mistral-7b-Instruct-V0.2 (Mistral-Ai)
You leave the goat on the shore, go across the river in the boat, and then return to get the goat. Since you can't leave the goat unattended and the boat is too small to carry both the goat and yourself at the same time, this is the only possible solution.
This is sort of how Mixture-of-Experts models work, actually.
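A toy sketch of that gating mechanism, assuming a simple top-k softmax gate over precomputed expert outputs (numbers invented for illustration):

```python
# Mixture-of-Experts gating in miniature: a learned gate scores each
# expert, and the output is the gate-weighted mix of the top-k experts.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_output(gate_logits, expert_outputs, top_k=1):
    """Weight the top-k experts' outputs by their renormalized gate scores."""
    weights = softmax(gate_logits)
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in top)
    return sum(weights[i] / total * expert_outputs[i] for i in top)

# Expert 1 has the highest gate logit, so with top_k=1 its output wins.
print(moe_output([0.1, 2.0, -1.0], [10.0, 20.0, 30.0]))
```

The difference is that MoE gating happens per token inside one model, while an LLM router gates per request across whole models.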
We spend $30k+ per month with OpenAI and Anthropic.
Even minor prompt changes between minor model versions can produce huge differences in output.
We make specific tweaks to all our dozens of production prompts based on the exact model that will be used.
Treating LLMs as if they are interchangeable is simply bogus.
What's your plan for making money? Are you planning to eventually take a margin? Negotiate discounts with your backend providers? Mine the data flowing through your system?