jchw
This is my personal experience, despite everyone swearing that it's a game changer. I've tried a fair few times now, because people swear everything is revolutionary, but I find these tools almost as annoying as they are helpful. As many others have noticed, you really have to be careful before accepting the code they output as correct; the subtly incorrect bits are extremely insidious. An example: in one case I was using an AI tool to write some reasonably tricky validation logic, and at some point it got something very close to right but flipped part of a conditional. It took me probably 30 minutes to notice, even though it should have been pretty obvious.
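To make the failure mode concrete, here is a minimal invented sketch (not the actual code from the anecdote) of a validator where one flipped operator turns correct logic into something that silently accepts bad input:

```python
# Hypothetical illustration of a "flipped conditional" bug: the two
# functions look nearly identical, but 'and' silently became 'or'.

def in_range_correct(value, lo, hi):
    """Accept value only when lo <= value <= hi."""
    return lo <= value and value <= hi

def in_range_flipped(value, lo, hi):
    """Subtly wrong: for any lo <= hi this accepts *every* value,
    because at least one of the two comparisons always holds."""
    return lo <= value or value <= hi

print(in_range_correct(150, 0, 100))  # False -- correctly rejected
print(in_range_flipped(150, 0, 100))  # True  -- the insidious bug
```

Both versions pass a quick glance in review, which is exactly why this class of error can soak up half an hour before anyone notices.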

The best I can say is that the implementation in JetBrains IntelliJ IDEA is pretty good. It's basically only useful for some repetitive Java boilerplate, but actually that's perfect: it's mindless enough, yet easy to validate. It makes me dislike programming in Java a little bit less.

nbbnbb
I don't have any formal data I could share without losing anonymity (and probably getting sued by my employer), but the introduction of these tools at my organisation correlates directly with a measurable rise in bugs and incidents. From causal analysis, the tools themselves are not directly responsible as such, despite their limited veracity; rather, people trust them and do not do their jobs properly. There is also a mystique around them being the solution for all validation processes, which leads to suboptimal attention at the validation stage, in the hope that some vendor we already have is going to magically make a problem go away, like they said they would at the last conference. I figure that, from a social and human perspective, the net gain may have turned negative the moment the idea was commercialised.

Urgh. I can't wait to retire.

wanderingbit
This finding bewilders me, because my copilot (I use Sourcegraph’s Cody) has become an essential part of my dev productivity toolset. Being able to get answers to questions that would normally break me out of flow mode by simply Option + C’ing to open up a New Chat has been a productivity boost for me. Getting it to give me little snippets of code that I can use helps keep me in flow mode. Getting it to do a first pass on function comments, which I then edit, has made it much easier to get over the activation energy barrier that usually holds me back from doing full commenting.

I can’t say if the bug count is higher or not. Maybe it is higher in terms of the total number of bugs I write throughout a coding session. But if the bug count goes up 10% while the speed with which I fix those bugs and get to a final edit of my code is 30% or 40% faster, then bug count is not the right metric.

Maybe the differentiator is that I am a solo-dev for all this work, and so the negative effects of the copilot are only experienced by me. If I were in a 10 person team, the bugs and the weird out of context code snippets would be magnified by the 9 other people, and the negative effects would be strong. But I don’t know.

thepuppet33r
Genuinely thought this was an article about copilots in planes and was terrified that airlines were going to cut back to one pilot in the cockpit to save a little more money.
pfisherman
Had the chance to watch some non programmers use copilot for data science (using pandas) and it was an eye opening experience. I came away with the feeling that the tool landed in a sort of “uncanny valley” of productivity. If you can’t write the code without copilot then you won’t be able to debug the errors it makes. And if you know enough to spot and debug the errors, then copilot just gets in the way.
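An invented example of the kind of pandas mistake that sits in that uncanny valley: code that runs without any error but silently does nothing, which a non-programmer cannot spot and an experienced pandas user would rarely write in the first place (the data here is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"score": [1.0, None, 3.0]})

# Plausible-looking suggestion: "drop the missing rows".
# But dropna() returns a *new* frame; df itself is unchanged.
df.dropna()
print(len(df))   # 3 -- the missing value is still there

# The fix: keep the returned frame.
df = df.dropna()
print(len(df))   # 2
```

No exception, no warning, just wrong downstream statistics. Debugging this requires knowing that most pandas methods return copies rather than mutating in place, which is exactly the knowledge the copilot was supposed to substitute for.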
taftster
I think an interesting use for copilot would be to ask it to find a bug given the description of an observed behavior. Let's say you're not super familiar with a code base, but you have found a bug (or "feature") that should be addressed. Having copilot home in on the likely code locations to address the issue would be invaluable.

Additionally, I find the copilot code suggestions during code reviews / pull requests sometimes useful. At times, it can offer some insightful bits about a code segment, such as potential exception-handling fixes.

I'd like to explore having copilot write unit tests, including representative test data, that exercise edge-case code paths. I haven't done this yet, but this seems exactly the type of thing that a "copilot" would do for me (not unlike pair programming, maybe).
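As a sketch of what that might look like: here is an invented helper function, `chunk`, together with the kind of edge-path tests one would hope a copilot drafts for human review (both the function and the test cases are hypothetical):

```python
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]

# Representative data plus the edge paths a human might forget:
assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]  # even split
assert chunk([1, 2, 3], 2) == [[1, 2], [3]]        # ragged tail
assert chunk([], 3) == []                          # empty input
try:
    chunk([1], 0)                                  # invalid size
except ValueError:
    pass
```

The value here is less the happy-path cases than the empty-input and invalid-argument ones, which are tedious to enumerate by hand and cheap to verify once written down.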

Having a copilot completely write my code base, that's another thing entirely. There would be too much going back and verifying that it got it right. I've also seen it completely conjure up bogus solutions. For example, I've had copilot offer a configuration change that was completely fabricated; it looked legitimate enough that a senior systems engineer attempted to install/deliver the "fix" it offered when the suggestion was completely made up.

Overall, I guess my experience with copilot is not much different than working with any human. Trust but verify.

mxxx
The thing that I’ve seen my team use it most for is explaining blocks of code. We maintain a bunch of legacy systems that don’t get touched often and are written in stacks that our engineers aren’t completely fluent with, and it can be helpful when they’ve traced an issue to a particular function but the original intent or purpose of the code is obtuse.
marcinzm
Cursor IDE with Claude 3.5 has been very beneficial for me in terms of productivity. Others a lot less so.
Eisenstein
> “Using LLMs to improve your productivity requires both the LLM to be competitive with an actual human in its abilities

No it does not. Does an assistant have to be as qualified as their boss?

> “The LLM does not possess critical thinking, self-awareness, or the ability to think.”

This is completely irrelevant. The LLM can understand your instructions and it can type 30,000 times faster than you.

gtvwill
Eh, common theme amongst coders, but I feel like it's less the LLM and more PEBKAC. You have a new tool, and it can be hugely productive; you just need to use it right. Stop expecting it to write your whole app or to invent new formulas for hyper-complex problems that haven't yet been solved or aren't common. It's a reference tool that's better than reading the docs or browsing Stack Overflow. Ask it for snippets of code, break up your tasks, use it to compare a number of methods that achieve the same result, and discuss with it different approaches to the same problem.

Much like how a nailgun won't just magically build you a house; it'll just let you build one quicker.

I get great benefit out of LLMs for coding. I'm not a good coder, but I am decent at planning and understanding what I want. LLMs get me there 100x quicker than not using them. I don't need 4 years of CS to learn all the tedious algorithms for sorting or searching; I just ask an AI for a bunch of examples, assess them for what they are, and get on with it. It can tell me the common pros and cons of each, and much like any other decision in business, I make my best judgement and go with it.

Need to sort a heap of data into x, y, z, or convert it from x to y? An LLM will show me the way; now I don't need to hire someone to do it for me.
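A minimal sketch of the kind of conversion task described, using only the standard library (the sample data and field names are invented):

```python
import csv
import io
import json

# Sample CSV input, invented for illustration.
raw = "name,qty\nwidget,3\ngadget,5\n"

# CSV -> list of dicts, one per row.
rows = list(csv.DictReader(io.StringIO(raw)))

# ...and on to JSON if that's the target format.
as_json = json.dumps(rows)

print(rows[0]["name"])   # widget
```

This is exactly the "show me the way" category: a few well-known stdlib calls that are quicker to ask for than to rediscover in the docs, and easy to sanity-check once you see them.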

But alas, so many seem to think a language-interpretation tool is actually a do-it-all, one-stop shop of production. PEBKAC: you're using the tool wrong.