Jensson
Many didn't get it last time: this model is a generic voice-to-voice model. It doesn't have a fixed set of voices it can do; it can do all sorts of voices and background noises.

That makes it entirely different from the text-to-speech models we had previously; uncensored, this model could do all sorts of voice acting for games and the like. But this example shows why they try so hard to neuter it: in its raw state it would spook a ton of people.
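To make the difference concrete, here's a toy sketch in Python, under the common assumption that the voice mode is one autoregressive model over discrete audio codec tokens. Every name here is illustrative; none of it is OpenAI's actual code:

    VOICE_PRESETS = {"alloy", "nova"}  # classic TTS: a closed set of voices

    def tts(text: str, voice: str) -> str:
        """Text-to-speech: the output can only ever be a preset voice."""
        if voice not in VOICE_PRESETS:
            raise ValueError("unknown voice")
        return f"[{voice} reading: {text!r}]"

    def voice_to_voice(context_tokens: list) -> list:
        """Audio LM: there is no 'voice' argument at all. Timbre, accent,
        and background noise are just patterns in the token stream, so
        whatever is likely given the context (including the user's own
        voice) can show up in the continuation."""
        return context_tokens + ["<continuation in whatever style fits>"]

    print(tts("hello", "alloy"))
    print(voice_to_voice(["<user audio tokens>"]))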

skybrian
It’s “unexpected” because their early training didn’t get rid of it as well as they hoped. LLMs are good at detecting patterns and like to continue the pattern. They’re starting with autocomplete for voice and training it to do something else.
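To see why "autocomplete likes to continue the pattern" includes continuing the user's side of a conversation, here is a deliberately tiny toy (a bigram model, nothing like the real thing):

    import random
    from collections import defaultdict

    # "Train" a bigram model on a two-speaker transcript.
    transcript = (
        "USER: hello there ASSISTANT: hi how can i help "
        "USER: tell me a joke ASSISTANT: why did the model cross the road"
    ).split()

    bigrams = defaultdict(list)
    for a, b in zip(transcript, transcript[1:]):
        bigrams[a].append(b)

    # Continue from an ASSISTANT turn. The model has no concept of "whose
    # turn it is" and can wander straight into generating USER: turns.
    random.seed(0)
    word, out = "ASSISTANT:", []
    for _ in range(12):
        out.append(word)
        word = random.choice(bigrams[word]) if bigrams[word] else "USER:"
    print(" ".join(out))

The production model is that idea scaled up over audio tokens; the assistant persona is training and scaffolding on top, and sometimes the underlying autocomplete leaks through.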

For now, it’s fairly harmless since it’s only a blooper in a lab, but there will likely be open-weights versions of this sort of thing eventually. And there will probably be people who argue that it’s a good thing, somehow.

sigmoid10
This problem appeared during pre-release testing and has since been solved post-generation using an output classifier that verifies responses, according to the system card release. It was predictable that someone would spin this into a Black Mirror-esque clickbait story.
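The system card doesn't spell out the mechanics, but a post-generation guard of that kind plausibly looks something like this sketch (my guess, with made-up speaker embeddings): compare the generated audio's speaker embedding against the intended assistant voice and block anything that drifts off it.

    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    ASSISTANT_VOICE = [0.9, 0.1]  # stand-in embedding of the preset voice

    def allow_response(generated_embedding, threshold=0.85):
        """Release the audio only if it still sounds like the assistant."""
        return cosine(generated_embedding, ASSISTANT_VOICE) >= threshold

    print(allow_response([0.88, 0.15]))  # on-voice -> True
    print(allow_response([0.10, 0.95]))  # sounds like someone else -> False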
golergka
This looks extremely similar to what often happened with chat implementations in the GPT-3 and 3.5 era: GPT would generate its answer and then go on and generate the next input by the user as well.
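That's exactly why completion-era chat wrappers leaned on stop sequences (the legacy Completions API exposed this as a stop parameter): the model just continues the transcript, so you cut the output the moment it starts writing the user's next line. A minimal version of that guard:

    def truncate_at_stop(completion: str, stops=("\nUser:",)) -> str:
        """Drop everything from the first stop sequence onward."""
        for s in stops:
            idx = completion.find(s)
            if idx != -1:
                completion = completion[:idx]
        return completion

    raw = "Sure, here's a haiku.\nUser: now write another one\nAssistant:"
    print(truncate_at_stop(raw))  # -> "Sure, here's a haiku."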
sweca
Despite it being creepy, this is super cool from a technical standpoint.
peddling-brink
So OpenAI has the capability to deepfake any of its users who use the voice chat feature? Yikes.
iJohnDoe
This isn’t an example of general intelligence. However, it is an example of the complexity of these systems. As they get more complex (and advanced), it’s going to be pretty scary (and creepy) what can happen. I predict there will be moments when we’re truly convinced some independent autonomy has taken over.
superultra
Early on in the public voice release, I asked my 15-year-old to give it a shot. They (they’re NB) had a long, meandering conversation about their favorite books, the Percy Jackson series.

About 15 minutes into the conversation between my kiddo and ChatGPT, the model started to take on my kiddo’s vocal mannerisms. It started using more “umms” and “you knows.”

At first this felt creepy, but as I explained to my kid, it’s because their own speech had become weighted heavily enough in the token count for the LLM to start incorporating it, and/or somewhere in the embedded prompts is something like “empathize with the user and emphasize clarity,” and that prompting meant mirroring back speech styles.

This is exactly the same as that only with audio.
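A back-of-envelope version of the "weighted enough in the token count" point (all numbers invented): over a long chat, the user's own turns become a big fraction of everything the model conditions on.

    system_prompt_tokens = 400
    turns = 30                      # roughly 15 minutes of back-and-forth
    user_tokens_per_turn = 120      # long, meandering answers
    assistant_tokens_per_turn = 80

    user_total = turns * user_tokens_per_turn
    total = system_prompt_tokens + user_total + turns * assistant_tokens_per_turn
    print(f"user share of context: {user_total / total:.0%}")  # ~56%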

system2
Other than happening randomly, it is not a threat. ElevenLabs already does this in a few seconds. If you are using a mic while using AI, expect companies like OpenAI to steal every bit of your soul. Nothing new here.
AISnakeOil
So basically the model took user data and used it as training data in real time? Big if true.
bjt12345
Interesting that they seem to think it was caused by glitch tokens in the audio.
woodpanel
It’s worth noting that any hiccup even remotely resembling this would have been a GDPR clusterf-ck for a normal web app.

Software is auditable; with AI models, it seems, nobody even attempts to hold them accountable.

xyst
Now all we need is a 3D printer that can print a mask of anybody’s face, and we can have Mission: Impossible-style operations.

https://youtu.be/v1Y4CubBi60 (5:30)

Imagine a world where dictators are replaced rather than killed. Roll back the dictatorship over years, install a democratic process, then have the stand-in magically commit seppuku in a plane crash.

Brilliant. What could go wrong? /s

surfingdino
Someone needs to tell OpenAI marketing that the mood is changing and creepy features like this one may be used against AI.