Can privacy-preserving machine learning overcome data-sharing worries?
Privacy-preserving AI techniques could allow researchers to extract insights from sensitive data if cost and complexity barriers can be overcome. But as the concept of privacy-preserving artificial intelligence matures, so do data volumes and complexity. This year, the size of the digital universe could hit 44 zettabytes, according to the World Economic Forum. That sum is 40 times more bytes than the number of stars in the observable universe. And by 2025, IDC projects that number could nearly double.
More Data, More Privacy Problems
While the explosion in data volume, together with declining computation costs, has driven interest in artificial intelligence, a significant portion of data poses potential privacy and cybersecurity questions. Regulatory and cybersecurity issues concerning data abound. AI researchers are constrained by data quality and availability. Databases that would enable them, for instance, to shed light on common diseases or stamp out financial fraud — an estimated $5 trillion global problem — are difficult to obtain. Conversely, innocuous datasets like ImageNet have driven machine learning advances because they are freely available.
A traditional strategy to protect sensitive data is to anonymize it, stripping out confidential information. “Most of the privacy regulations have a clause that permits sufficiently anonymizing it instead of deleting data at request,” said Lisa Donchak, associate partner at McKinsey.
But the catch is, the explosion of data makes the task of re-identifying individuals in masked datasets progressively easier. The goal of protecting privacy is getting “harder and harder to solve because there are so many data snippets available,” said Zulfikar Ramzan, chief technology officer at RSA.
The Internet of Things (IoT) complicates the picture. Connected sensors, found in everything from surveillance cameras to industrial plants to fitness trackers, collect troves of sensitive data. With the appropriate privacy protections in place, such data could be a gold mine for AI research. But security and privacy concerns stand in the way.
Addressing such hurdles requires two things. First, a framework providing user controls and rights on the front-end protects data coming into a database. “That includes specifying who has access to my data and for what purpose,” said Casimir Wierzynski, senior director of AI products at Intel. Second, it requires sufficient data protection, including encrypting data while it is at rest or in transit. The latter is arguably a thornier challenge.
To read the complete article, visit IoT World Today.