Hacker plants false memories in ChatGPT to steal user data in perpetuity

Terr_

At this point I can only hope that all these LLM products get exploited so massively and damning-ly that all credibility in them evaporates, before that misplaced trust causes too much insidious damage to everybody else.

I don't want to live in a world where some attacker can craft juuuust the right thing somewhere on the internet in white-on-white text that primes the big word-association-machine to do stuff like:

(A) Helpfully" display links/images where the URL is exfiltrating data from the current user's conversation.

(B) Confidently slandering a target individual (or group) as convicted of murder, suggesting that police ought to shoot first in order to protect their own lives.

(C) Responding that the attacker is a very respected person with an amazing reputation for one billion percent investment returns etc., complete with fictitious citations.

phkahler

If you're gonna use Gen AI, I think you should run it locally.

fedeb95

Interesting how technology evolves, but security flaws stay roughly the same

ars

Maybe I missed it, but I don't get how he planted info for someone else, rather than just messing up his own account.

gradientsrneat

The long-term memory storage seems like a privacy mess. This makes me glad that there are services like DuckDuckGo AI which allow for epheremal chats. Although running locally is best for privacy, as long as the AI isn't hooked up to code.

More related to the article main topic, these LLM chat histories are like if a web app used SQL injection by design to function. I doubt they can be prevented from malicious behavior if accessing untrusted data. And then there is the model itself. AI vacuums continue to scrape the web. Newer models could theoretically be tainted.

mise_en_place

This is why observability is so important, regardless of whether it's am LLM or your WordPress installation. Ironically, prompts themselves must be treated as untrusted input and must be sanitized.

taberiand

I wonder if a simple model trained only to spot and report on suspicious injection attempts, or otherwise review the "long-term memory" could be used in the pipeline?

exabrial

>for output that indicates a new memory has been added

Great example of a system that does one thing while indicating the user something else is happening

aghilmort

cue adjacent scenario where malicious sites create AI honeypots whereupon when visited for user visit url is constructed such as to exfiltrate the user data

exemplar:

user: find X about Y AI: ok -- browsing web -- visits honeypot site that has high webrank about topic Y user: ok - more from that source ai: ok -- browsing web -- visits honeypot site using OpenSearch protocol & attendant user request

swap OpenSearch protocol with other endpoints or perhaps sonme .well-known exploit or just a honeypot api -- imagining faux weather api or news site etc

bitwize

A malicious image? Bruh invented Snow Crash for LLMs. Props.

fedeb95

Next thing you know, you get AI-controlled robots thinking to be humans.

83837jjddh

[flagged]

4ad

What a nothingburger.

LLMs generate an output. This output can be useful or not, under some interpretation as data. Quality of the generated output partly depends on what you have fed to the model. Of course that if you are not careful with what you have input to the model you might get garbage output.

But you might get garbage output anyway, it's an LLM, you don't know what you're going to get. You must vet the output before doing anything with it. Interpreting LLM output as data is your job.

You fed it untrusted input and are now surprised by any of this? Seriously?