mythz
Claude 3.5 Sonnet still holds the LLM crown for code, and it's what I use when I want to check the output of the best LLM. However, my Continue Dev, Aider and Claude Dev plugins are currently configured to use DeepSeek Coder V2 236B (with local Ollama DeepSeek Coder V2 for tab completions), as it offers the best value at $0.14/$0.28 per million input/output tokens, which sits just below Claude 3.5 Sonnet on Aider's leaderboard [1] whilst being 43x cheaper.

[1] https://aider.chat/docs/leaderboards/
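
For anyone curious about the tab-completion half of that setup: in Continue it's just a small entry in config.json, roughly like the sketch below (from memory, so treat the field names and the deepseek-coder-v2 Ollama tag as assumptions and use whatever `ollama list` shows on your machine):

  {
    "tabAutocompleteModel": {
      "title": "DeepSeek Coder V2 (local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2"
    }
  }

The chat side (DeepSeek's hosted API) goes in the "models" list in the same file.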

anotherpaulg
Yi-Coder scored below GPT-3.5 on aider's code editing benchmark. GitHub user cheahjs recently submitted the results for the 9b model and a q4_0 version.

Yi-Coder results, with Sonnet and GPT-3.5 for scale:

  77% Sonnet
  58% GPT-3.5
  54% Yi-Coder-9b-Chat
  45% Yi-Coder-9b-Chat-q4_0

Full leaderboard:

https://aider.chat/docs/leaderboards/

Palmik
The difference between (A) software engineers reacting to AI models and systems for programming and (B) artists (whether it's painters, musicians or otherwise) reacting to AI models for generating images, music, etc. is very interesting.

I wonder what the reason is.

theshrike79
> Continue pretrained on 2.4 Trillion high-quality tokens over 52 major programming languages.

I'm still waiting for a model that's highly specialised for a single language only - one that's either a lot smaller than these jack-of-all-trades ones or VERY good at that specific language's nuances + libraries.

JediPig
I tested this out on my workload (SRE/DevOps/C#/Golang/C++). It started responding with nonsense to a simple "write me a boto Python script that changes x, y, z values" request.

Then I tried other questions from my past to compare... However, I believe the engineers who built the LLM just used the questions from the benchmarks.

In one instance after an hour of use (I stopped then), it answered one question with 4 different programming languages, and with answers that were in no way related to the question.

mtrovo
I'm new to this whole area and feeling a bit lost. How are people setting up these small LLMs like Yi-Coder locally for tab completion? Does it work natively in VS Code?

Also for the cloud models apart from GitHub Copilot, what tools or steps are you all using to get them working on your projects? Any tips or resources would be super helpful!

smcleod
Weird that they're comparing it to the really old DeepSeek V1 models; even V2 has been out for a long time now.
kleiba
What is the recommended hardware to run a model like that locally on a desktop PC?
NKosmatos
It would be good if LLMs were somehow packaged in an easy way/format for us "novice" (ok I mean lazy) users to try them out.

I'm not so interested in the response time (anyone have a couple of spare A100s?), but it would be good to be able to try out different LLMs locally.

gloosx
Can someone explain these Aider benchmarks to me? They pass the same 113 tests through the LLM every time. Why do they then extrapolate from an LLM's ability to pass these 113 basic Python challenges to a general ability to produce/edit code? To me it sounds like this or that model is 70% accurate at solving the same hundred Python training tasks, but why does that mean it's good at other languages and at arbitrary, private tasks as well? Has anyone ever tried changing the test cases or wiggling the conditions a bit to see if it still hits 70%?
smokel
Does anyone know why the sizes of these models are typically expressed as a number of weights (i.e. 1.5B and 9B in this case), without mentioning the weight size in bytes?

For practical reasons, I often like to know how much GPU RAM is required to run these models locally. The actual number of weights seems to only express some kind of relative power, which I doubt is relevant to most users.
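
To be clear, the conversion itself is simple - weights-only memory is roughly parameter count times bytes per weight, before KV cache and runtime overhead - I just wish it were stated up front. A rough sketch, not an exact sizing:

  # Rough GPU memory for the weights alone (excludes KV cache and overhead).
  params = 9e9  # e.g. Yi-Coder 9B
  for fmt, bytes_per_weight in {"fp16": 2, "q8_0": 1, "q4_0": 0.5}.items():
      print(f"{fmt}: ~{params * bytes_per_weight / 1e9:.1f} GB")
  # -> fp16: ~18.0 GB, q8_0: ~9.0 GB, q4_0: ~4.5 GB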

Edit: reformulated to sound like a genuine question instead of a complaint.

nathan_tarbert
This sounds really cool! I found this Reddit discussion... https://www.reddit.com/r/ArtificialInteligence/comments/1f9m...
Tepix
Sounds very promising!

I hope that Yi-Coder 9B FP16 and Q8 will be available soon for Ollama; right now I only see the 4-bit quantized 9B model.

I'm assuming that those versions will be quite a bit better than the 4-bit model.

patrick-fitz
I'd be interested to see how it performs on https://www.swebench.com/

Using SWE-agent + Yi-Coder-9B-Chat.

cassianoleal
Is there an LLM that's useful for Terraform? Something that understands HCL and has been trained on the providers, I imagine.
Havoc
Beats DeepSeek Coder 33B. That's impressive
lasermike026
First look seems good. I'll keep hacking with it.
ziofill
Are coding LLMs trained with the help of interpreters?
zeroq
Every time someone tells me how AI 10x'd their programming capabilities, I'm like "tell me you're bad at coding without telling me".