Amazon Web Services has introduced an affordable data warehouse in the cloud, dubbed Redshift.
Amazon said its new petabyte-scale data warehouse service in the cloud is a fast and powerful solution that speeds up queries when users are analysing data sets of virtually any size, using the same SQL-based business intelligence tools they use today.
Cheap Data Warehouse
With just a few clicks in the Amazon Web Services (AWS) Management Console, customers can launch a Redshift cluster, starting with a few hundred gigabytes and scaling to a petabyte or more, for less than $1,000 (£624) per terabyte per year – one tenth the price of most data warehousing solutions available to customers today, AWS officials said.
AWS announced Redshift on 28 November at its re:Invent user conference in Las Vegas. The re:Invent conference is AWS’ first-ever user conference. The event, which runs from 27 to 29 November, is a gathering of customers, partners, developers and others who make up the AWS ecosystem.
“Over the past two years, one of the most frequent requests we’ve heard from customers is for AWS to build a data warehouse service,” Raju Gulabani, vice president of database services at AWS, said in a statement. “Enterprises are tired of paying such high prices for their data warehouses, and smaller companies can’t afford to analyse the vast amount of data they collect – often throwing away 95 percent of their data. This frustrates customers as they know the cloud has made it easier and less expensive than ever to collect, store, and analyse data.
“Amazon Redshift not only significantly lowers the cost of a data warehouse, but also makes it easy to analyse large amounts of data very quickly,” Gulabani said. “While actual performance will vary based on each customer’s specific query requirements, our internal tests have shown over 10 times performance improvement when compared to standard relational data warehouses. Having the ability to quickly analyse petabytes of data at a low cost changes the game for our customers.”
AWS officials noted that self-managed, on-premises data warehouses require significant time and resources to administer, especially for large data sets. Loading, monitoring, tuning, taking backups and recovering from faults are complex and time-consuming tasks. And the financial cost associated with building, maintaining, and growing traditional data warehouses is high.
Larger companies have resigned themselves to paying such a high cost for data warehousing, while smaller companies often find the hardware and software costs prohibitively expensive, leaving most of these organisations without a data warehousing capability. Amazon Redshift is aimed at all these users, the company said. Amazon Redshift manages all the work needed to set up, operate and scale a data warehouse, from provisioning capacity to monitoring and backing up the cluster, to applying patches and upgrades. Scaling a cluster to improve performance or increase capacity on Amazon Redshift is simple and incurs no downtime, while the service continuously monitors the health of the cluster and automatically replaces any component as needed.
Amazon Redshift is also priced so that larger companies can reduce their costs substantially and smaller companies can take advantage of the analytic insights that come from using a powerful data warehouse.
Amazon Redshift uses a number of techniques, including columnar data storage, advanced compression, and high-performance IO and network, to achieve significantly higher performance than traditional databases for data warehousing and analytics workloads. By distributing and parallelising queries across a cluster of inexpensive nodes, Amazon Redshift makes it easy to obtain high performance without requiring customers to hand-tune queries, maintain indices or pre-compute results.
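To make those techniques concrete, here is a minimal Python sketch of the general ideas: storing a table column by column, compressing a repetitive column, and splitting an aggregate across several nodes. It is an illustration only and does not reflect how Amazon Redshift is actually implemented. The sample table, the function names, and the choice of run-length encoding as the compression scheme are all assumptions made for the example.

```python
from itertools import groupby

# Hypothetical sample table in row-oriented form (one dict per record).
rows = [
    {"region": "EU", "product": "book", "amount": 12.0},
    {"region": "EU", "product": "film", "amount": 8.0},
    {"region": "US", "product": "book", "amount": 20.0},
    {"region": "US", "product": "game", "amount": 5.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def distributed_sum(values, node_count):
    """Split a column into one slice per node, sum each slice, then combine.

    The "nodes" here are simulated one after another in a single process;
    a real cluster would compute the partial sums concurrently.
    """
    slices = [values[i::node_count] for i in range(node_count)]
    partial_sums = [sum(node_slice) for node_slice in slices]
    return sum(partial_sums)


columns = to_columns(rows)

# An aggregate over "amount" reads only that column, not whole records.
total = distributed_sum(columns["amount"], node_count=2)

# A sorted, low-cardinality column compresses well under run-length encoding.
encoded_region = run_length_encode(columns["region"])

print(total)
print(encoded_region)
```

In a column layout, a query such as a total over `amount` never touches the `region` or `product` values, and similar values stored together tend to compress well, which is the intuition behind the performance claims above.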
In addition, Amazon Redshift is certified by popular business intelligence tools, including Jaspersoft and MicroStrategy. More than 20 customers, including Flipboard, NASA/JPL, Netflix and Schumacher Group, are in the Amazon Redshift private beta program.
Amazon demonstrated that it uses its own technology internally to run its retail operation.
“The Amazon Enterprise Data Warehouse manages petabytes of data for every group at Amazon,” said Erik Selberg, manager of the Amazon.com data warehouse team. “We are seeing significant performance improvements leveraging Amazon Redshift over our current multimillion-dollar data warehouse.
“Some multi-hour queries finish in under an hour, and some queries that took five to 10 minutes on our current data warehouse are now returning in seconds with Amazon Redshift. Early estimates show the cost of Amazon Redshift will be well under one-tenth the cost of our existing solution. Amazon Redshift is providing us with a cost-effective way to scale with our growing data analysis needs,” he said.
Amazon Redshift includes technology components licensed from ParAccel and is available with two underlying node types, holding either 2 terabytes or 16 terabytes of compressed customer data per node. One cluster can scale up to 100 nodes, and on-demand pricing starts at just $0.85 (£0.53) per hour for a 2-terabyte data warehouse, scaling linearly up to a petabyte and more. Reserved-instance pricing lowers the effective price to $0.228 (£0.14) per hour, or under $1,000 (£624) per terabyte per year, the company said.
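The per-terabyte figure can be checked with some quick arithmetic. The sketch below assumes that both hourly rates quoted above apply to the 2-terabyte configuration and that a year has 8,760 hours; the function and variable names are made up for the example.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_cost_per_terabyte(hourly_rate_usd, terabytes):
    """Yearly cost in US dollars of one terabyte at a given hourly rate."""
    return hourly_rate_usd * HOURS_PER_YEAR / terabytes


# Rates quoted in the article, assumed to cover a 2-terabyte warehouse.
on_demand = annual_cost_per_terabyte(0.85, terabytes=2)
reserved = annual_cost_per_terabyte(0.228, terabytes=2)

# Largest cluster described: 100 nodes of 16 terabytes each.
max_cluster_terabytes = 100 * 16

print(f"On-demand: ${on_demand:,.2f} per terabyte per year")
print(f"Reserved:  ${reserved:,.2f} per terabyte per year")
print(f"Maximum cluster size: {max_cluster_terabytes:,} terabytes")
```

Under those assumptions the reserved rate works out to roughly $999 per terabyte per year, consistent with the "under $1,000" claim, while the on-demand rate comes to about $3,723 per terabyte per year. A full 100-node cluster of 16-terabyte nodes holds 1,600 terabytes, or 1.6 petabytes.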
Originally published on eWeek.