Ask HN: What is your experience with Claude 3.5?

Zealotux

I am properly blown away by it. I've been using it for some pretty non-trivial coding, like shaders and game-related algorithms, and it's truly amazing. It's very good at breaking things down, explaining concepts (different approaches with pros and cons, for example), and quickly fixing errors with just one or two more prompts.

The way you can upload files to give context to your requests is amazing. Overall, the Claude experience is much better than anything else I've seen so far. I've tried talking to my colleagues about it, but the reception has been cold. I'm surprised to see developers being so uninterested in such an exciting tool.

The only annoying thing is that the UI gets laggy with a long history, even with a powerful machine. I have to bundle everything up and start a new chat, which is easy to do, at least. I also wish there were a better way to "branch" a chat, meaning trying two different approaches from a specific prompt and possibly going back in time.

CSMastermind

I run all my queries through all of the SOTA models so that I can do a direct comparison. Currently, I've run over 600 queries through all the models. Claude has a clearly better answer than GPT-4 around 6% of the time and GPT-4 has a clearly better answer than Claude around 4% of the time.

Gemini is laughably behind the others though slightly better than Meta's current offering.

I find GPT-4 and Claude roughly on par with a slight edge to Claude. But when it fails, it does so catastrophically, whereas GPT-4 normally gives you at least something approaching a reasonable answer.

Another clear difference between them is that GPT-4 has a built-in code interpreter, and Claude doesn't. Some of the questions that GPT-4 gets correct that Claude doesn't are because GPT-4 can actually write code, run it, analyze the results, and self correct. Sometimes to a very impressive degree.

causal

If you haven't yet, pay for the upgrade to use Projects. I've uploaded entire codebases, then opened different conversations about that codebase, generating very useful artifacts.

Also, Claude artifacts can display MermaidJS. You can give it a whole codebase and ask something like "generate a sequence diagram of this process" and it does a remarkable job of doing so, displaying results in realtime.

jumploops

We’ve noticed a similar quality to GPT-4o, with the same level of prompting needed to get the same output for harder tasks.

As a simple example from yesterday, we were generating a GitHub Action using GPT-4o and running into an issue.

The GitHub action does a diff of your code against `main` and then passes that diff to an LLM for a simple vulnerability analysis, returning a severity of high|medium|low plus a description.

Both GPT-4o and Claude 3.5 Sonnet kept getting caught up in the same problem, which is that the workflow they generated would log the contents of PR, which itself contained said workflow as YAML. This then caused the YAML to parse incorrectly.

A small example, but the issue was obvious to a developer reading the output. Both LLMs struggled to “understand” the issue, even when they were directly told what to do (they kept logging the diff for debug, which would continue to break it).

We got to a solution with both LLMs, but it required us to be quite explicit in our prompting.

Caveat: both GPT-4o and Claude 3.5 Sonnet were accessed via a self-hosted interface that hits the API directly, so your experience through the UIs may be different.

jamesponddotco

Pretty positive, much better than my experience with OpenAI's models. The killer feature for me is their prompt generator[1], which you can use to create system prompts or improve user prompts. As I said in another thread, the generator is tuned to generate prompts better suitable for Claude, which improves responses.

You can see examples of the prompts it generates in the repository linked below, including a customized version of their prompt generator[2]. This feature significantly improved my experience with LLMs in general, both Claude and local ones, and I now use Claude 3.5 Sonnet for pretty much all my coding, mostly in Go and, lately, Rust.

I mostly use it to improve existing code or to get started, not to generate entire code bases, so I can't say much about that.

I found it lacking in shell scripting though. Nine out of ten times I need to fix the shell script it generates, to a point where I just gave up trying and went back to writing them from scratch.

Can't wait to see how much better Claude 3.5 Opus is, though.

[1]: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...

[2]: https://sr.ht/~jamesponddotco/llm-prompts/

panphora

They're both great for small, highly-focused features.

But when you have complex code or a very long module, yes, Claude 3.5 performs a lot better.

ChatGPT is much more likely to miss small, but important details that can break the code in unexpected ways. And even when you point out the error to ChatGPT, it will double down on being confident in its solution and output exactly the same code again.

I've got to think Claude 3.5 had much better training data and training methodologies for code because it's much "smarter" at being logical and fitting pieces together into one coherent whole. For any random task, I'd trust Claude at coding 20-30% more than ChatGPT.

xpasky

I was so blown away I just wrote a chat-based integration to vim / neovim (https://github.com/pasky/claude.vim) for it - actually, 95% of the 600 LoC is written by Claude. It is finally on a level where "pair programmer" becomes a way better UX than the "code completion" paradigm.

(BTW I think with some further AI-assisted work and using the tools interface, you could mostly match all the web interface features in an integration like this. And using API is less brutally rate limited.)

MightySCollins

I have really been enjoying using it. It seems very good at handling simple but time consuming problems although I find longer chats it seems to forget about the previous code and gets muddled. I am using it in Zed and also with a Claude account as I find I can hit the daily limit when using it for a few hours.

I only checked HN as I started getting service unavailable error pages in the middle of it helping me rewrite some code.

Only really tried it with embedded C and Next.js stuff and it can make a few unsafe mistakes but when I need a helper function it is much quicker than me finding one myself.

makk

I haven’t been blown away by its coding as compared to 4o. It feels similar. I use it via the chat interface and copy/paste. I’d prefer to use it directly in IntelliJ but haven’t found a plugin that doesn’t get in my way to the point of being too distracting.

The part of the Claude UI that writes code in a sidebar next to the chat is much better than writing code inline like ChatGPT.

TradingPlaces

Pro user. I use it primarily for stock analysis. I feel like it is a step up from 3.0 Opus in “intelligence,” but a massive improvement on speed when I fill up the context window with documents, which is often. Looking forward to the Opus variant. FWIW it writes R and pandas code pretty well, but I haven’t used that extensively.

namanyayg

I've been programming for 10y+ but now I completely rely on AI for coding (using the Cursor IDE). It's made me easily 4-10x faster.

Tbh it's making a lot of mistakes when writing with the latest Next.js versions of the new modules.

Maybe because of the knowledge cutoff? Idk

(Or maybe it's a signal to stop coding in JavaScript...)

glimshe

It depends. I use the free models and most tasks are better on 4o, but I fairly often get better results with Claude. I often try the same task on both when I'm not satisfied with the first answer I get.

replete

My account got banned after using it a couple of times. Not sure why, as I tried a few code related prompts. I intended to upgrade but I can't login and support isn't going anywhere.

vithalreddy

This 100% true, specially for programming related code it works on first try without any special prompt.

low_tech_punk

Any one has compared it with gpt-4o specifically? And what language(s) have you tried?

marvstazar

How does it compare to Github Copilot?

runjake

How is it with Python in comparison?

DataDaemon

Claude is not as lazy as GPT.

beaugunderson

i'll keep trying new models as they're updated but 3.5 sonnet is hot garbage. not worth the power to train and run it.