Hadoop Databases Expose 5 Petabytes Of Data To The Internet

Unsecured Hadoop databases are exposing a massive 5 petabytes (PB) of data to the Internet, putting it at risk of ransom attacks, according to a researcher.

The findings follow a spate of ransom attacks that began in January, when hackers discovered they could steal exposed data and demand payment for its return.

Exposed data

Those attacks affected tens of thousands of databases and most focused on MongoDB, as well as Elasticsearch and Redis instances, due to their popularity.

But John Matherly, creator of the Shodan search engine, said that while fewer Hadoop instances are exposed, the amount of data those databases contain is far greater than that found on MongoDB.


Shodan found only 4,487 exposed databases using Hadoop’s HDFS file system, about one-tenth of the 47,820 exposed MongoDB instances.

But those Hadoop instances expose more than 200 times the amount of data found on the MongoDB instances, at 5,120 terabytes (or 5.1 PB) compared to 25 TB, Matherly said.
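
The proportions in those figures can be checked with simple arithmetic. The Python sketch below uses only the numbers Matherly reported; the variable names are illustrative, and the per-instance averages are a derived figure that assumes data is spread evenly across instances, which the findings do not claim.

```python
# Figures as quoted from Matherly's Shodan findings.
hdfs_instances = 4_487
mongodb_instances = 47_820
hdfs_data_tb = 5_120
mongodb_data_tb = 25

# Share of exposed instances: HDFS relative to MongoDB (roughly one-tenth).
instance_ratio = hdfs_instances / mongodb_instances

# Share of exposed data: HDFS relative to MongoDB (more than 200 times).
data_ratio = hdfs_data_tb / mongodb_data_tb  # 204.8

# Derived averages, assuming an even spread of data (an assumption made here).
avg_hdfs_tb = hdfs_data_tb / hdfs_instances
avg_mongodb_tb = mongodb_data_tb / mongodb_instances

print(f"HDFS instances as a fraction of MongoDB instances: {instance_ratio:.3f}")
print(f"HDFS data as a multiple of MongoDB data: {data_ratio:.1f}")
print(f"Average per HDFS instance: {avg_hdfs_tb:.2f} TB")
print(f"Average per MongoDB instance: {avg_mongodb_tb * 1000:.2f} GB")
```

On these numbers, the average exposed Hadoop instance would hold on the order of a terabyte, against roughly half a gigabyte for the average exposed MongoDB instance.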

“In terms of data volume it turns out that HDFS is the real juggernaut,” he wrote in a blog post.

No authentication

The findings are consistent with figures that predate the ransom attacks, with BinaryEdge finding in 2015 that Redis, MongoDB, Memcached and Elasticsearch database instances together exposed only about 1.1 PB of data.

The ransom attacks initially focused on the more numerous servers as hackers looked to amass a large number of ransom payments, with different groups competing to extort payments from the same compromised server, researchers said.

They later moved on to hit hundreds of Hadoop databases as well.

Matherly found the disparity continues today, with “most” of the MongoDB instances appearing to have been compromised, while ransom notes were found on only 207 Hadoop clusters.

Most of the Hadoop instances are located in the US (1,900) and China (1,426), with nearly all being hosted in the cloud – the top providers being Amazon, which hosts 1,059 of the databases, and Alibaba, which hosts 507.

The exposed servers are vulnerable because, due to misconfiguration or other issues, they’re accessible from the Internet without any authentication enabled, Matherly said.

Shodan is better known for its use in locating unsecured Internet-connected devices such as webcams, routers and set-top boxes.

The large number of such devices poses a security risk, since they can be hijacked and used to carry out disruptive denial-of-service attacks.

Matthew Broersma

Matt Broersma is a long-standing tech freelance who has worked for Ziff-Davis, ZDNet and other leading publications.
