Cloudera Adopts Apache Spark As MapReduce Big Data Alternative

Hadoop creator Doug Cutting says use of the MapReduce engine for Big Data projects will decline as Apache Spark replaces it

Hadoop-based Big Data start-up Cloudera has announced commercial support for a new technology – Apache Spark – and has at the same time begun re-branding its products, a move designed to push confusingly named technologies into the background.

Cloudera says Apache Spark can handle a wider variety of workloads and runs faster than the MapReduce engine, which has so far been at the heart of all Hadoop implementations. The company will continue to offer MapReduce, but customers can now get support for Spark – yet another open source project in the constellation of efforts around Hadoop. Perhaps sensing a weariness with the number of Hadoop-related technologies, Cloudera has also begun branding its offering as an “Enterprise Data Hub”.

Moving Hadoop onto a new platform

“Spark has been in development for four years, and is ready for use by the general public,” Doug Cutting, chief architect at Cloudera and creator of Hadoop, told TechWeek. “It’s easy to program, and it uses memory more efficiently, as not all intermediate data goes to disk.”

For many problems – for instance, those involving repeated calculations – Spark will perform much faster. “But it doesn’t make MapReduce obsolete overnight,” said Cutting. “Over time, fewer projects will use MapReduce, and more will use Spark.”

Projects already under way will carry on with MapReduce, and new projects that perform “one-shot” calculations may find MapReduce a better fit: “That is a large class of problems,” said Cutting.

Stone Age software?

MapReduce has come under criticism from older software players. “It’s Stone Age software,” said Steve Shine, CEO of open source giant Actian, on a visit to London earlier this month. As the home of the venerable Ingres relational database, Actian has its own Big Data offering, and Shine also criticised the Hadoop ecosystem and the vendors within it, describing their stacks as “too brittle and fragile”.

Cutting has no time for criticism of Hadoop’s maturity: “Vendors would not be wise to rely on this for too long,” he said. “Established data warehouse vendors have products that are 20-30 years old. They have been polishing them for that long and that is worth a lot. But Hadoop is catching up pretty quickly.” People are moving projects from the old ETL (extract, transform and load) data warehouse world to a more flexible Hadoop approach, he said.

But this sort of criticism of Hadoop as an esoteric set of shifting, related projects may be behind Cloudera’s new branding as a provider of the “Enterprise Data Hub”. “It’s not just a storage system and a MapReduce engine,” Cutting said. “It’s a much more general platform architecture.”

Another benefit might be that the EDH label gets Cloudera into the boring, dependable-sounding world of three-letter acronyms, but Cutting says “it’s not just a change in marketing speak, it helps people to understand what we are talking about.”

The name is also impervious to any future changes in the underlying technology.

