SAS Adds Hadoop Support For Big Data Customers

SAS has increased access to big data sources for its customers after adding Hadoop integration

Analytics software provider SAS has given its customers increased access to big data sources by adding Hadoop support to its updated SAS Enterprise Data Integration Server.

By employing the popular open-source data framework, customers using SAS analytics can extract more value from their big data assets.

Big Data

Hadoop joins more than three dozen supported data sources in SAS Enterprise Data Integration Server, including Oracle, DB2, SQL Server, Teradata (including Teradata Aster), Sybase, Netezza, EMC Greenplum and MySQL. SAS support for Hadoop access is a key requirement for many organisations that are adding Hadoop to their environment.

Sponsored by the Apache Software Foundation, Hadoop is an open-source Java-based framework for processing large data sets in a distributed computing environment. SAS integrates with the Apache Hadoop distribution.
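For readers unfamiliar with the programming model, the canonical Hadoop example is a word count written against the MapReduce Java API. The sketch below is purely illustrative and is not SAS code; the input and output paths are placeholders supplied on the command line.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reduce phase: sum the counts for each word across all mappers.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory (placeholder)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework splits the input across the cluster's data nodes, runs the map tasks in parallel where the data lives, and shuffles the intermediate results to the reducers, which is the distributed processing model the SAS integration builds on.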

SAS’ deep integration with Hadoop applies the parallelism of MapReduce, the distributed computing framework commonly associated with Hadoop. SAS, Hadoop and the Hive data warehouse infrastructure work well together when analysing large data sets, simplifying the most common big data analytics use cases, the company said.
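Hive exposes a SQL-like query language that the cluster compiles into MapReduce jobs. As a rough illustration of the kind of query such integrations build on – not SAS’ own interface – the following Java sketch runs a Hive query over JDBC. It assumes a HiveServer2 endpoint and the Hive JDBC driver on the classpath; the host, credentials and table name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint; host, port, user and table are placeholders.
    String url = "jdbc:hive2://hadoop-gateway.example.com:10000/default";

    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(url, "analyst", "");
         Statement stmt = conn.createStatement()) {

      // Hive compiles this SQL-like query into MapReduce jobs that run
      // in parallel across the cluster's data nodes.
      ResultSet rs = stmt.executeQuery(
          "SELECT region, COUNT(*) AS orders " +
          "FROM web_orders GROUP BY region");

      while (rs.next()) {
        System.out.printf("%s\t%d%n", rs.getString("region"), rs.getLong("orders"));
      }
    }
  }
}
```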

The SAS Hadoop integration means the SAS “write-once, run-anywhere” approach extends to Hadoop deployments. Also, SAS features – such as the job flow builder, visual editor, syntax checker and others – are extended to Hive, Pig, MapReduce and Hadoop Distributed File System (HDFS) commands. In addition, SAS augments native Hadoop security with SAS data security provisions, including authorisation and data lineage. And SAS supports popular Hadoop distributions, such as Cloudera, Hortonworks and EMC Greenplum.
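HDFS commands such as creating directories, loading files and listing contents can also be driven programmatically. The sketch below uses the standard Hadoop FileSystem Java API to show what those operations look like; it is a generic illustration rather than SAS functionality, and the cluster configuration and paths are assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCommandsExample {
  public static void main(String[] args) throws Exception {
    // Assumes core-site.xml / hdfs-site.xml on the classpath point at the cluster;
    // the directories and file names below are placeholders.
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {

      Path stagingDir = new Path("/user/analyst/staging");
      fs.mkdirs(stagingDir);                                // equivalent of `hdfs dfs -mkdir -p`

      fs.copyFromLocalFile(new Path("/tmp/orders.csv"),     // equivalent of `hdfs dfs -put`
                           new Path(stagingDir, "orders.csv"));

      for (FileStatus status : fs.listStatus(stagingDir)) { // equivalent of `hdfs dfs -ls`
        System.out.printf("%s\t%d bytes%n", status.getPath(), status.getLen());
      }
    }
  }
}
```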

Moreover, SAS data quality and profiling cover data moving in or out of Hadoop. SAS access extends SAS capabilities – such as the visual analytics explorer, text mining and analytics – to Hadoop data. And Hadoop data can be federated with data from other sources, including the ability to embed the federated query in a data management job flow.

Growing Importance

“Hadoop is becoming more important as more organisations evaluate its capabilities and plan for increased deployment,” said Jim Davis, senior vice president and chief marketing officer at SAS, in a statement. “Bringing powerful SAS Analytics to Hadoop takes advantage of its distributed processing capabilities and helps effectively manage Hadoop deployments.

“Hadoop lacks good tools to develop and manage complex deployments. SAS’ extensive data and analytics management software helps enterprises pull value from Hadoop deployments using minimal resources,” added Davis.

“Hadoop’s value is in taking very large data collections – from simple, regular data to complex, unstructured data – and [processing] it quickly,” Carl Olofson, IDC research vice president for application development and deployment, said in a statement. “IDC expects commercial use of Hadoop to accelerate as more established enterprise software providers such as SAS make Hadoop accessible and easy to use.”

Meanwhile, SAS Information Management will deliver greater support for big data, data governance, master data management and decision management this year. Advanced analytic enablement will grow as analytic processing increasingly moves into databases, SAS officials said.

“SAS Information Management enables customers to exploit and govern information assets, resulting in competitive differentiation and sustained business success,” said Mark Troester, a SAS IT/CIO strategist. “SAS Information Management uniquely integrates management of data, analytics and decision processes across the entire information continuum.”

SAS Information Management delivers data management – including data governance, data integration, data quality and master data management. It also delivers analytics management and decision management.