Hadoop Databases Expose 5 Petabytes Of Data To The Internet

Matt Broersma is a long-standing tech freelancer who has worked for Ziff-Davis, ZDNet and other leading publications

Unsecured Hadoop databases are less numerous than MongoDB instances, but expose 200 times more data, research finds

Unsecured Hadoop databases are exposing a massive 5 petabytes (PB) of data to the Internet, putting it at risk from ransom attacks, according to a researcher.

The findings follow a spate of ransom attacks that began in January, when hackers discovered they could steal exposed data and demand payment for its return.

Exposed data

Those attacks affected tens of thousands of databases, with most targeting MongoDB, Elasticsearch and Redis instances because of their popularity.

But John Matherly, creator of the Shodan search engine, said that while fewer Hadoop instances are exposed, the amount of data those databases contain is far greater than that found on MongoDB.

Shodan found 4,487 exposed databases using the Hadoop Distributed File System (HDFS), about one-tenth of the 47,820 exposed MongoDB instances.

But those Hadoop instances expose more than 200 times the amount of data found on the MongoDB instances, at 5,120 terabytes (or 5.1 PB) compared to 25 TB, Matherly said.

“In terms of data volume it turns out that HDFS is the real juggernaut,” he wrote in a blog post.
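
The comparison can be checked with a few lines of arithmetic. The sketch below uses only the instance counts and data volumes quoted above. The per-instance averages are an illustrative derivation, not figures from Matherly's post, and the helper name `average_tb_per_instance` is an assumption introduced here.

```python
# Figures quoted from Matherly's Shodan findings in this article.
HDFS_INSTANCES = 4_487
HDFS_DATA_TB = 5_120
MONGODB_INSTANCES = 47_820
MONGODB_DATA_TB = 25


def average_tb_per_instance(total_tb: float, instances: int) -> float:
    """Mean exposed data per instance, in terabytes (illustrative only)."""
    return total_tb / instances


# How much more data HDFS exposes, and how its instance count compares.
volume_ratio = HDFS_DATA_TB / MONGODB_DATA_TB
instance_ratio = HDFS_INSTANCES / MONGODB_INSTANCES

hdfs_avg_tb = average_tb_per_instance(HDFS_DATA_TB, HDFS_INSTANCES)
mongo_avg_tb = average_tb_per_instance(MONGODB_DATA_TB, MONGODB_INSTANCES)

print(f"HDFS exposes {volume_ratio:.1f}x the data found on MongoDB")
print(f"HDFS instance count is {instance_ratio:.1%} of MongoDB's")
print(f"Average per HDFS instance: {hdfs_avg_tb:.2f} TB")
print(f"Average per MongoDB instance: {mongo_avg_tb * 1000:.2f} GB")
```

The averages assume data is spread evenly across instances, which the research does not claim. They only show why a smaller number of Hadoop clusters can account for far more exposed data.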

No authentication

The findings are consistent with figures that predate the ransom attacks, with BinaryEdge finding in 2015 that Redis, MongoDB, Memcached and Elasticsearch database instances together exposed only about 1.1 PB of data.

The ransom attacks initially focused on the more numerous servers as hackers looked to amass a large number of ransom payments, with different groups competing to extort payments from the same compromised server, researchers said.

They later moved on to hit hundreds of Hadoop databases as well.

Matherly found the disparity continues today, with “most” of the MongoDB instances appearing to have been compromised, while ransom notes were found on only 207 Hadoop clusters.

Most of the Hadoop instances are located in the US (1,900) and China (1,426), with nearly all being hosted in the cloud – the top providers being Amazon, which hosts 1,059 of the databases, and Alibaba, which hosts 507.

The exposed servers are vulnerable because, due to misconfiguration or other issues, they’re accessible from the Internet without any authentication enabled, Matherly said.
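
The article does not say which settings were misconfigured on the exposed clusters. As one illustration, Hadoop's `hadoop.security.authentication` property in `core-site.xml` defaults to `simple`, which does not verify user identity, and can be set to `kerberos`. The sketch below reads that property from a configuration file's text. The helper name `hadoop_auth_mode` and the sample configuration are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

AUTH_PROPERTY = "hadoop.security.authentication"


def hadoop_auth_mode(core_site_xml: str) -> str:
    """Return the configured authentication mode, or Hadoop's default 'simple'."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == AUTH_PROPERTY:
            value = (prop.findtext("value") or "simple").strip()
            return value.lower()
    # Property absent: Hadoop falls back to simple authentication.
    return "simple"


# Hypothetical core-site.xml content used only for demonstration.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""

mode = hadoop_auth_mode(SAMPLE)
print(f"Authentication mode: {mode}")
if mode == "simple":
    print("Warning: user identity is not verified")
```

A check like this covers only one setting. Whether a cluster is reachable from the Internet also depends on network controls such as firewalls, which this sketch does not examine.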

Shodan is better known for its use in locating unsecured Internet-connected devices such as webcams, routers and set-top boxes.

The large numbers of such devices pose a security risk, since they can be hijacked and used to carry out disruptive denial-of-service attacks.
