VMware Helps Java Developers Handle Big Data With Spring Hadoop

VMware’s Spring Hadoop targets enterprise Java developers who want to create Big Data analysis applications

VMware has introduced Spring Hadoop, a new project that will make it easier for enterprise Java developers to use the familiar Spring Framework to build products around the Apache Hadoop platform.

This is the latest addition to the VMware Spring Data family of projects and integrates the Spring Framework and the Apache Hadoop platform. Spring Hadoop provides support for writing Apache Hadoop applications that can access the features of Spring, Spring Batch and Spring Integration. VMware introduced the Spring Hadoop project at the O’Reilly Strata Conference in California.

Streamlined model

“VMware is committed to helping developers build, deploy, manage and scale the new wave of data-driven applications,” said Adrian Colyer, CTO for Cloud and Application Services at VMware, in a statement. “By building upon Spring’s strong and versatile foundation of simplifying data access, and leveraging the depth of the Hadoop platform, VMware is delivering a streamlined programming model that makes Spring the natural way to integrate Hadoop systems into the enterprise application landscape.”

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free licence. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.
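To make the MapReduce model concrete, here is a minimal sketch of the classic word-count job written in plain Java with no libraries beyond the JDK. It is not Hadoop's API: real Hadoop jobs implement Hadoop's own `Mapper` and `Reducer` classes and spread the work across many nodes. The class and method names here (`WordCountSketch`, `map`, `reduce`, `run`) are hypothetical and chosen for illustration. The sketch runs the three phases (map, group by key, reduce) in a single JVM.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-JVM illustration of the MapReduce model (not the Hadoop API).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values emitted for a single key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Apply map to every line, group ("shuffle") the pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Big data with Hadoop", "Spring and Hadoop");
        // Keys come out sorted because both maps are TreeMaps.
        System.out.println(run(input)); // {and=1, big=1, data=1, hadoop=2, spring=1, with=1}
    }
}
```

In a real cluster, the map calls run in parallel on the nodes that hold each block of input data, and the framework performs the grouping step over the network before the reducers run.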

Enterprises interested in Hadoop have noted the need for tools to make dealing with Hadoop easier for developers as well as business users. The VMware move with Spring Hadoop is aimed at helping with the former, particularly for enterprise Java developers.

Spring Hadoop brings the benefits of Spring – simplicity and ease-of-use – to Hadoop by providing a comprehensive, lightweight framework that will allow developers to easily build products around the Hadoop platform, VMware officials said. As data volumes and data access choices in enterprise applications have grown exponentially, Spring continues to focus on enabling enterprise Java developers to incorporate new data access patterns into their applications through the Spring Data projects.

In a blog post, Costin Leau, a staff engineer in the SpringSource unit of VMware, said:

“Part of the Spring Data umbrella, Spring Hadoop provides support for developing applications based on Hadoop technologies by leveraging the capabilities of the Spring ecosystem. Whether one is writing standalone, vanilla MapReduce applications, interacting with data from multiple data stores across the enterprise, or coordinating a complex workflow of HDFS, Pig, or Hive jobs, or anything in between, Spring Hadoop stays true to the Spring philosophy, offering a simplified programming model and addressing ‘accidental complexity’ caused by the infrastructure. Spring Hadoop provides a powerful tool in the developer arsenal for dealing with big data volumes.”

Spring Hadoop is free to download and available now under the open source Apache 2.0 licence. As indicated by Leau, key aspects of Spring Hadoop include:

  • Support for configuration, creation, and execution of MapReduce, Streaming, Hive, Pig, and Cascading jobs via the Spring container
  • Comprehensive HDFS data access support through JVM scripting languages (Groovy, JRuby, Jython, Rhino, etc.)
  • Declarative configuration support for HBase
  • Dedicated Spring Batch support for developing powerful workflow products incorporating HDFS operations and all types of Hadoop jobs
  • Support for Spring Integration, which provides easy access to a wide range of existing systems using an extensible event-driven pipes and filters architecture
  • Powerful Hadoop configuration options and templating mechanism for client connections to Hadoop
  • Declarative and programmatic support for Hadoop Tools, including FsShell and DistCp
