
Hadoop Databases Expose 5 Petabytes Of Data To The Internet

Unsecured Hadoop databases are exposing a massive 5 petabytes (PB) of data to the Internet, putting it at risk from ransom attacks, according to a researcher.

The findings follow a spate of ransom attacks that began in January, when hackers discovered they could steal exposed data and demand payment for its return.

Exposed data

Those attacks affected tens of thousands of databases. Most targeted MongoDB, along with Elasticsearch and Redis instances, because of their popularity.

But John Matherly, creator of the Shodan search engine, said that while fewer Hadoop instances are exposed, the amount of data those databases contain is far greater than that found on MongoDB.


Shodan found only 4,487 exposed databases using Hadoop’s HDFS file system, about one-tenth of the number of MongoDB instances, which stood at 47,820.

But those Hadoop instances expose more than 200 times the amount of data found on the MongoDB instances, at 5,120 terabytes (or 5.1 PB) compared to 25 TB, Matherly said.
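The comparison can be checked with some quick arithmetic. The short Python sketch below uses only the figures quoted above; the variable names are illustrative, and the per-instance averages are a rough derivation that assumes data is spread evenly, which it almost certainly is not.

```python
# Figures as reported by Shodan in the article.
hadoop_instances = 4_487
mongodb_instances = 47_820
hadoop_tb = 5_120    # terabytes exposed via HDFS
mongodb_tb = 25      # terabytes exposed via MongoDB

# Hadoop has roughly one-tenth as many exposed instances...
instance_ratio = hadoop_instances / mongodb_instances

# ...but exposes more than 200 times as much data.
data_ratio = hadoop_tb / mongodb_tb

# Rough average per instance (assumes an even spread, for illustration only).
avg_hadoop_tb = hadoop_tb / hadoop_instances
avg_mongodb_tb = mongodb_tb / mongodb_instances

print(f"Instance ratio (Hadoop/MongoDB): {instance_ratio:.3f}")
print(f"Data ratio (Hadoop/MongoDB): {data_ratio:.1f}")
print(f"Average per Hadoop instance: {avg_hadoop_tb:.2f} TB")
print(f"Average per MongoDB instance: {avg_mongodb_tb:.5f} TB")
```

On these numbers the average exposed Hadoop instance holds on the order of a terabyte, against well under a gigabyte for the average MongoDB instance.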

“In terms of data volume it turns out that HDFS is the real juggernaut,” he wrote in a blog post.

No authentication

The findings are consistent with figures that predate the ransom attacks, with BinaryEdge finding in 2015 that Redis, MongoDB, Memcached and Elasticsearch database instances together exposed only about 1.1 PB of data.

The ransom attacks initially focused on the more numerous servers as hackers looked to amass a large number of ransom payments, with different groups competing to extort payments from the same compromised server, researchers said.

They later moved on to hit hundreds of Hadoop databases as well.

Matherly found the disparity continues today, with “most” of the MongoDB instances appearing to have been compromised, while ransom notes were found on only 207 Hadoop clusters.

Most of the Hadoop instances are located in the US (1,900) and China (1,426), with nearly all being hosted in the cloud – the top providers being Amazon, which hosts 1,059 of the databases, and Alibaba, which hosts 507.

The exposed servers are vulnerable because, due to misconfiguration or other issues, they’re accessible from the Internet without any authentication enabled, Matherly said.
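As an illustration of what "no authentication enabled" can look like in practice, the Python sketch below reads a Hadoop `core-site.xml` and reports the value of `hadoop.security.authentication`, which Hadoop sets to `simple` (no real authentication) by default and which must be `kerberos` for authenticated access. The sample configuration and function names are hypothetical, and the check is only one part of the picture: Matherly's findings do not detail each server's configuration, and whether a cluster is reachable from the Internet depends separately on its network and firewall settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content, for illustration only.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
</configuration>
"""

AUTH_PROPERTY = "hadoop.security.authentication"


def authentication_mode(core_site_xml: str) -> str:
    """Return the configured authentication mode.

    Falls back to "simple", Hadoop's default, when the property is absent.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == AUTH_PROPERTY:
            return prop.findtext("value", "simple").strip()
    return "simple"


def requires_authentication(core_site_xml: str) -> bool:
    """True only when the cluster is configured for Kerberos."""
    return authentication_mode(core_site_xml).lower() == "kerberos"


if __name__ == "__main__":
    mode = authentication_mode(SAMPLE_CORE_SITE)
    print(f"Authentication mode: {mode}")
    if not requires_authentication(SAMPLE_CORE_SITE):
        print("Warning: clients are not authenticated.")
```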

Shodan is better known for its use in locating unsecured Internet-connected devices such as webcams, routers and set-top boxes.

The large numbers of such devices pose a security risk, since they can be hijacked and used to carry out disruptive denial-of-service attacks.


Matthew Broersma

Matt Broersma is a long-standing tech freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
