EMC To Acquire Data Specialist Greenplum

Data storage specialist EMC plans to acquire parallel processing specialist Greenplum

Storage giant EMC revealed on 6 July that it is acquiring privately held Greenplum, which provides next-generation data warehousing software and self-service, cloud-based analytics for enterprises.

Terms of the all-cash transaction were not disclosed, but EMC did say it expects the deal to close in September.

Greenplum, whose main market is enterprises with large amounts of data to store in cloud deployments, will form the foundation of a new data computing product division within EMC’s Information Infrastructure business, Chuck Hollis, an EMC vice president and the global marketing CTO, told eWEEK.

Greenplum’s MPP (massively parallel processing) SG Streaming (Scatter/Gather Streaming) “secret sauce” is designed to eliminate the bottlenecks associated with other approaches to data loading.

The company follows a parallel-everywhere approach to loading, in which data flows from one or more source systems to every node of the database.

10 To 100 Times Performance

Greenplum’s software is capable of delivering 10 to 100 times the performance of traditional database software, EMC said. Data-driven businesses that include NASDAQ OMX, NYSE Euronext, Skype, Equifax, T-Mobile and Fox Interactive Media currently use Greenplum for its cloud-based high-performance data analytics service.

Greenplum’s approach differs from the traditional bulk loading technologies used by most mainstream database and MPP appliance vendors, which push data from a single source, often over a single channel or a small number of parallel channels.

That single-source approach can, and often does, result in bottlenecks and longer load times.
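The contrast between the two loading styles can be sketched in a few lines of Python. This is only an illustration of the general scatter/gather idea described above, not Greenplum's implementation: the node count, the hash-based distribution rule and every function name here are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical number of database nodes


def node_for(key, num_nodes=NUM_NODES):
    """Pick a target node by hashing the row key (assumed distribution rule)."""
    return crc32(key.encode("utf-8")) % num_nodes


def scatter(rows, num_nodes=NUM_NODES):
    """Scatter step: split one source's rows into one batch per node."""
    batches = [[] for _ in range(num_nodes)]
    for key, value in rows:
        batches[node_for(key, num_nodes)].append((key, value))
    return batches


def parallel_load(sources, num_nodes=NUM_NODES):
    """Each source scatters its rows; all nodes gather their batches concurrently."""
    scattered = [scatter(rows, num_nodes) for rows in sources]

    def gather(node_id):
        # Gather step: a node collects its own batch from every source.
        collected = []
        for per_source in scattered:
            collected.extend(per_source[node_id])
        return collected

    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(gather, range(num_nodes)))


def single_channel_load(sources, num_nodes=NUM_NODES):
    """Baseline: every row passes through one sequential channel."""
    nodes = [[] for _ in range(num_nodes)]
    for rows in sources:
        for key, value in rows:
            nodes[node_for(key, num_nodes)].append((key, value))
    return nodes
```

Both functions place the same rows on the same nodes. The difference is that the parallel version lets every node pull its share at once, so no single channel has to carry the whole load.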

“There’s always a bottleneck in those data warehouses, whether it’s in the database, the servers, or the storage,” analyst Brian Babineau of Enterprise Strategy Group told eWEEK. “Everybody tries to solve those bottlenecks in a different way. And it’s easy to blame the storage, because disk drives tend to be the slowest part of the bottleneck.

“The reality is that EMC does not want to give up that business [storage and database optimization software] to the likes of Oracle or other folks because it’s just a storage player. Now they have Greenplum, ideally suited for x86 environments, and which distributes workloads very well among shared storage resources.”

Greenplum, which works only on x86 open systems, fits right into EMC’s overall “big data” plans, Babineau said.

“The second angle here is that EMC has deployments on the back ends of a lot of data warehousing systems,” Babineau said.

Change Data Warehousing

Greenplum has challenged established vendors such as Oracle, Teradata and Netezza, and has been successful in only seven years of existence.

“The data warehousing world is about to change,” said Pat Gelsinger, president and chief operating officer of EMC Information Infrastructure Products. “Greenplum’s massively parallel, scale-out architecture, along with its self-service consumption model, has enabled it to separate itself from the incumbent players and emerge as the leader in this industry shift toward ‘big data’ analytics.”

In acquiring Greenplum, EMC saw an opportunity for the storage market to evolve, EMC’s Hollis told eWEEK.

“Put it all together: big data, billions of records, the new mandate to make real-time analytics a weapon, the advent of fully virtualized environments, self-serve analytics and people who are good knowledge workers,” Hollis said.

“This is not about doing what was done previously, better. This is about entirely new use cases for big data. We’re betting on the future rather than trying to monetize the past.”

‘Good Synergy’ Developed Over Time

The two companies kept running into each other in various deployments during the last two years or so, and eventually a good synergy developed, Greenplum co-founder and President Scott Yara told eWEEK.

“The alignment was so close in a number of ways: in terms of how we viewed the importance of data, the idea of moving processing closer to where the data lives, and the role that virtualization and private cloud computing is going to play in data analytics,” Yara said. “The idea came that maybe we should join forces. We decided that it was either going to happen very quickly, or that we would just keep going, because it was going very well.”

Greenplum employs about 140 people in the San Francisco Bay Area.

“We believe so much in this idea [of moving processing and data closer together for performance efficiency], that Greenplum will be the nucleus of a whole new EMC products group,” Hollis said. “Much like the way Data Domain [2009] and RSA [2006] came in, when we built entire product divisions around them, we’re going to ask the Greenplum leadership team to do the exact same thing for us.”

Babineau said 2010 could be a breakout year in data warehousing.

“This is a very interesting space,” Babineau said. “The two biggest companies in it, Teradata and Netezza, are totaling about $2 billion in trailing 12-month revenue … Teradata about $1.7 billion and Netezza about $203 million.

“There is clearly a lot of money being spent in this area, and EMC wants its fair share of this stuff.”

Editor’s Note: eWEEK Senior Writer Brian Prince contributed to this report.