Cray Adds Intel Hadoop To Supercomputer Clusters


Cray will use Intel’s Hadoop distribution for its turnkey big data computing infrastructure

Cray officials are adding Intel’s Hadoop distribution to their growing list of supercomputing solutions for the burgeoning big data market.

Cray later this month will launch cluster supercomputers for Hadoop applications that will combine the vendor’s CS300 supercomputers with Intel’s Hadoop distribution, a Linux operating system and Cray’s Advanced Cluster Engine (ACE) management software, according to company officials.

Hadoop expansion

The result will be a turnkey computing infrastructure that will enable organisations to better leverage Hadoop, according to Bill Blake, senior vice president and CTO at Cray.

“More and more organisations are expanding their usage of Hadoop software beyond just basic storage and reporting,” Blake said in a statement. “But while they’re developing increasingly complex algorithms and becoming more dependent on getting value out of Hadoop systems, they are also pushing the limits of their architectures.”

The Cray Hadoop supercomputer clusters, which will be integrated, optimised, validated and supported by the systems maker, will enable organisations to scale their Hadoop software, he said.

“Organisations can now focus on scaling their use of platform-independent Hadoop software, while gaining the benefits of important underlying architectural advantages from Cray and Intel,” Blake said.

Big data is a growing trend in the business world, with massive amounts of data being created from the wide range of connected devices, machines and sensors.

Intel officials have said that every 11 seconds, a petabyte of data is created around the world.

Big data

Hadoop, which includes about a dozen open-source projects, is designed to enable businesses to more easily store huge amounts of data, analyse it and leverage it in ways that benefit both the organisations and their users. For example, businesses can use it to gain a better understanding of what their customers want, while medical researchers can more quickly discover life-saving drugs and communities can improve their environments by better managing traffic patterns.

Intel in February unveiled the Intel Distribution for Apache Hadoop, its own distribution of the open-source technology.

The giant chip maker had been working with Hadoop since 2009, but officials said it was important to offer a Hadoop distribution optimised to work with features on its processors, for example by building Advanced Encryption Standard New Instructions (AES-NI), which accelerate encryption, into the Hadoop Distributed File System. It’s also part of a larger effort by Intel to grow its role in the data centre beyond server chips.

Intel has been building up its software capabilities via in-house development and acquisitions. While it is keeping parts of its Hadoop distribution open, so that they interoperate with other Hadoop distributions, the company will keep some features, including management and monitoring capabilities, to itself.

Intel will not open source such software as Intel Manager for Apache Hadoop – for configuration and deployment – or Active Tuner for Apache Hadoop, a tool for improving the performance of compute clusters running the distribution.

Cray officials, in announcing their new Hadoop clusters, noted the strengths in Intel’s distribution, including greater security, improved real-time handling of data, and enhanced performance throughout the storage architecture. Cray is including support for InfiniBand and improved resource management, officials said.

Appro technology

The CS300 series of supercomputers – which Cray inherited when it bought rival Appro for $25 million (£16m) in November 2012 – comes with an integrated high-performance computing (HPC) software stack and software tools that are compatible with most open-source and commercial compilers. That will enable organisations to leverage Intel’s Hadoop distribution, according to Girish Juneja, CTO and general manager of Intel’s Big Data Software unit.

“Combining these features with the highly innovative HPC technologies in Cray systems will create a compelling solution for organisations with the most demanding Hadoop requirements,” Juneja said in a statement.

Cray’s Hadoop supercomputer clusters, which offer energy-efficient air- or liquid-cooled architectures, are the latest move by the systems vendor to build out its portfolio of products for big data.

The company also offers Cray Sonexion storage systems and YarcData’s Urika appliance for graph analytics.


Originally published on eWeek.