Another AI pitfall: Digital mirroring opens new cyberattack vector
“Digital twins” — AI assistants trained to service our many needs by learning about and in some ways mimicking us — turn out to have myriad ways they might be turned against us.
Ben Sawyer, a professor at the University of Central Florida, and Matthew Canham, CEO of Beyond Layer Seven, note that despite the furor over how large language models (LLMs) will enable hackers to design more, better phishing emails, vishing calls, and bots, that kind of thing is old hat.
“It’s not in the future, it’s in the past,” Sawyer explains. “Can an LLM write a phishing email? Yes, and it’s been able to since before ChatGPT took the world’s attention. Can it do a lot more? That’s what we’re really interested in.”
Sawyer and Canham will be doing a deeper dive on AI exploitation of humans and their data during Black Hat USA next month in Las Vegas.
How LLMs Can Be Hacked
Already there is plenty of discourse surrounding the insecurity of LLMs, as researchers and attackers alike experiment with how they can be broken and manipulated.
“There are a number of layers at which you can attack the technology,” Sawyer explains. “It can be impacted during the process through which it’s trained, by playing with the data that feeds it. And it can be impacted afterwards by other types of later training, and prompts,” using the AI’s own powers against itself.
By contrast, defending against LLM compromise — or even finding out that something is wrong in the first place — is far more difficult to imagine. “The problem is it’s too complex to audit the entire space. Nobody can go through everything ChatGPT might say and check it,” Sawyer says.
An attacker might use a compromised LLM to access sensitive data about its users, or write more convincing phishing emails. But Sawyer and Canham are already looking past those kinds of use cases.
To read the complete article, visit Dark Reading.