Gemini is laughably behind the others, though slightly better than Meta's current offering.
I find GPT-4 and Claude roughly on par, with a slight edge to Claude. But when Claude fails, it does so catastrophically, whereas GPT-4 normally gives you at least something approaching a reasonable answer.
Another clear difference between them is that GPT-4 has a built-in code interpreter, and Claude doesn't. Some of the questions that GPT-4 gets correct and Claude doesn't come down to GPT-4 being able to actually write code, run it, analyze the results, and self-correct, sometimes to a very impressive degree.
Also, Claude artifacts can display MermaidJS. You can give it a whole codebase and ask something like "generate a sequence diagram of this process" and it does a remarkable job of doing so, displaying results in realtime.
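For instance, asking for a diagram of a login flow might produce something like the following (a hypothetical output, the actual diagram obviously depends on your codebase):

```mermaid
sequenceDiagram
    Client->>API: POST /login (credentials)
    API->>AuthService: validate(credentials)
    AuthService-->>API: session token
    API-->>Client: 200 OK + token
```

Artifacts renders this as an actual diagram next to the chat, and updates it as you refine the prompt.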
As a simple example from yesterday, we were generating a GitHub Action using GPT-4o and running into an issue.
The GitHub action does a diff of your code against `main` and then passes that diff to an LLM for a simple vulnerability analysis, returning a severity of high|medium|low plus a description.
Both GPT-4o and Claude 3.5 Sonnet kept getting caught up in the same problem: the workflow they generated would log the contents of the PR, which itself contained said workflow as YAML. This then caused the YAML to parse incorrectly.
A small example, but the issue was obvious to a developer reading the output. Both LLMs struggled to “understand” it, even when directly told what to do (they kept logging the diff for debugging, which would continue to break the parse).
We got to a solution with both LLMs, but it required us to be quite explicit in our prompting.
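A minimal sketch of the kind of workflow we ended up with (the step names and the `llm_scan.sh` script are hypothetical; the key fix was writing the diff to a file instead of echoing it into the log):

```yaml
name: diff-vuln-scan
on: pull_request

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # need full history to diff against main
      - name: Diff against main
        run: |
          # Write the diff to a file; do NOT echo it. The diff can contain
          # this very workflow's YAML, which garbles the output when logged.
          git diff origin/main...HEAD > pr.diff
      - name: Analyze with LLM (hypothetical step)
        # Expected to return severity (high|medium|low) plus a description.
        run: ./scripts/llm_scan.sh pr.diff
```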
Caveat: both GPT-4o and Claude 3.5 Sonnet were accessed via a self-hosted interface that hits the API directly, so your experience through the UIs may be different.
You can see examples of the prompts it generates in the repository linked below, including a customized version of their prompt generator[1]. This feature significantly improved my experience with LLMs in general, both Claude and local ones, and I now use Claude 3.5 Sonnet for pretty much all my coding, mostly in Go and, lately, Rust.
I mostly use it to improve existing code or to get started, not to generate entire code bases, so I can't say much about that.
I found it lacking in shell scripting, though. Nine times out of ten I need to fix the shell script it generates, to the point where I just gave up trying and went back to writing them from scratch.
Can't wait to see how much better Claude 3.5 Opus is, though.
[1]: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...
But when you have complex code or a very long module, yes, Claude 3.5 performs a lot better.
ChatGPT is much more likely to miss small but important details that can break the code in unexpected ways. And even when you point out the error, ChatGPT will double down, confidently outputting exactly the same code again.
I've got to think Claude 3.5 had much better training data and training methodologies for code because it's much "smarter" at being logical and fitting pieces together into one coherent whole. For any random task, I'd trust Claude at coding 20-30% more than ChatGPT.
(BTW I think with some further AI-assisted work and using the tools interface, you could mostly match all the web interface features in an integration like this. And using the API is less brutally rate-limited.)
I only checked HN as I started getting service unavailable error pages in the middle of it helping me rewrite some code.
I've only really tried it with embedded C and Next.js stuff, and it can make a few unsafe mistakes, but when I need a helper function it's much quicker than finding one myself.
The part of the Claude UI that writes code in a sidebar next to the chat is much better than writing code inline like ChatGPT.
Tbh it makes a lot of mistakes when writing against the latest Next.js versions and their new modules.
Maybe because of the knowledge cutoff? Idk
(Or maybe it's a signal to stop coding in JavaScript...)
The way you can upload files to give context to your requests is amazing. Overall, the Claude experience is much better than anything else I've seen so far. I've tried talking to my colleagues about it, but the reception has been cold. I'm surprised to see developers being so uninterested in such an exciting tool.
The only annoying thing is that the UI gets laggy with a long history, even with a powerful machine. I have to bundle everything up and start a new chat, which is easy to do, at least. I also wish there were a better way to "branch" a chat, meaning trying two different approaches from a specific prompt and possibly going back in time.