ChatGPT spills secrets in novel PoC attack

Written by Jai Vijayan / Dark Reading
14th March 2024

A team of researchers from Google DeepMind, OpenAI, ETH Zurich, McGill University, and the University of Washington has developed a new attack for extracting key architectural information from proprietary large language models (LLMs) such as ChatGPT and Google PaLM-2.

The research shows how adversaries can extract supposedly hidden data from an LLM-enabled chatbot so they can duplicate or steal its functionality entirely. The attack, described in a technical report released this week, is one of several over the past year that have highlighted weaknesses that makers of AI tools still need to address in their technologies even as adoption of their products soars.

Extracting Hidden Data

As the researchers behind the new attack note, little is known publicly of how large language models such as GPT-4, Gemini, and Claude 2 work. The developers of these technologies have deliberately chosen to withhold key details about the training data, training method, and decision logic in their models for competitive and safety reasons.

“Nevertheless, while these models’ weights and internal details are not publicly accessible, the models themselves are exposed via APIs,” the researchers noted in their paper. Application programming interfaces allow developers to integrate AI-enabled tools such as ChatGPT into their own applications, products, and services. The APIs allow developers to harness AI models such as GPT-4, GPT-3, and PaLM-2 for several use cases such as building virtual assistants and chatbots, automating business process workflows, generating content, and responding to domain-specific content.

The researchers from DeepMind, OpenAI, and the other institutions wanted to find out what information they could extract from AI models by making queries via their APIs. Unlike a previous attack in 2016, in which researchers showed how they could extract model data by running specific prompts at the first, or input, layer, the researchers opted for what they described as a “top-down” attack model. The goal was to see what they could extract by running targeted queries against the last, or final, layer of the neural network architecture, which is responsible for generating output predictions based on input data.
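To give a rough sense of why the final layer can leak architectural details, consider a toy illustration. It is not the researchers' code, and it rests on stated assumptions: that the final layer is a plain linear projection from a hidden vector to one score (logit) per vocabulary token, and that an attacker can observe the full logit vector for each query, which production APIs generally do not expose directly. Under those assumptions, every logit vector lies in a subspace whose dimension equals the hidden size, so the rank of a stack of observed outputs reveals that size. The names `toy_logits`, `simulate`, and `numerical_rank` are hypothetical, and Gaussian elimination is used here only as a simple, dependency-free way to estimate rank.

```python
import random


def numerical_rank(rows, tol=1e-9):
    """Estimate matrix rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column is numerically zero below the current rank
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def toy_logits(W, hidden):
    """Final-layer projection of a toy model: one logit per vocabulary token."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in W]


def simulate(hidden_dim=4, vocab_size=12, n_queries=10, seed=0):
    """Collect logit vectors from a toy model with a secret hidden size.

    Each query stands in for a prompt; its hidden vector is random here
    (an assumption made purely for illustration).
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(vocab_size)]
    observed = []
    for _ in range(n_queries):
        hidden = [rng.gauss(0, 1) for _ in range(hidden_dim)]
        observed.append(toy_logits(W, hidden))
    return observed


if __name__ == "__main__":
    # The attacker sees only `observed`, not W or hidden_dim.
    observed = simulate()
    print("estimated hidden size:", numerical_rank(observed))
```

The estimate is only meaningful when the number of queries exceeds the hidden size; with fewer queries, the rank is capped by the number of rows collected.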

To read the complete article, visit Dark Reading.
