Zero-click GenAI worm spreads malware, poisoning models
A worm that uses clever prompt engineering and injection is able to trick generative AI (GenAI) apps like ChatGPT into propagating malware and more.
In a laboratory setting, three Israeli researchers demonstrated how an attacker could design “adversarial self-replicating prompts” that convince a generative model into replicating input as output – if a malicious prompt comes in, the model will turn around and push it back out, allowing it to spread to further AI agents. The prompts can be used for stealing information, spreading spam, poisoning models, and more.
They’ve named it “Morris II,” after the infamous 99-line self-propagating malware which took out a tenth of the entire Internet back in 1988.
“ComPromptMized” AI Apps
To demonstrate how self-replicating AI malware could work, the researchers created an email system capable of receiving and sending emails using generative AI.
Next, as a red team, they wrote a prompt-laced email which takes advantage of retrieval-augmented generation (RAG) — a method AI models use to retrieve trusted external data — to contaminate the receiving email assistant’s database. When the email is retrieved by the RAG and sent on to the gen AI model, it jailbreaks it, forcing it to exfiltrate sensitive data and replicate its input as output, thereby passing on the same instructions to further hosts down the line.
The researchers also demonstrated how an adversarial prompt can be encoded in an image to similar effect, coercing the email assistant into forwarding the poisoned image to new hosts. By either of these methods, an attacker could automatically propagate spam, propaganda, malware payloads, and further malicious instructions through a continuous chain of AI-integrated systems.
To read the complete article, visit Dark Reading.