IBM Invests Heavily In ‘Important’ Open Source Apache Spark Big Data Project

Steve McCaskill is editor of TechWeekEurope and ChannelBiz. He joined as a reporter in 2011 and covers all areas of IT, with a particular interest in telecommunications, mobile and networking, along with sports technology.

IBM pledges technology and 3,500 developers to Apache Spark, which will be used by its analytics and commerce platforms as well as Watson Health Cloud

IBM has pledged significant resources and up to 3,500 researchers and developers to the Apache Spark big data platform, which the company calls “the most important new open source project in a decade that is being defined by data.”

Spark was originally developed in 2009 at the AMPLab at UC Berkeley, of which IBM is a founding partner, and has gained popularity because of its perceived ease of use and efficient memory management.

Its supporters claim Spark can analyse data up to 100 times faster than Hadoop’s MapReduce when working in memory, and ten times faster when working from disk. Spark had 465 contributors as of 2014, making it the most active project in the Apache Software Foundation and the most active open source big data project.

IBM Spark

IBM says this commitment from the open source community means Spark is in a constant state of improvement, and the company wants to aid the development of the platform with its own contributions.

“IBM has been a decades long leader in open source innovation. We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, general manager of IBM’s analytics platform. “Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”

Spark is to be built into IBM’s analytics and commerce platforms, and the company will offer Spark as a cloud service through Bluemix. Watson Health Cloud will use the engine to help medical researchers analyse population health data, and IBM’s own SystemML machine learning technology is to be open sourced to aid Spark’s development.

Up to 3,500 researchers and developers will work on Spark-related projects across the world and IBM has committed to educating more than one million data scientists and data engineers about the platform.

High-profile users of Spark include NASA and the SETI Institute, which are analysing terabytes of complex deep space radio signals for evidence of extra-terrestrial life.
