Microsoft Azure Blasts Into Research Field

Microsoft shows Windows Azure running NCBI Blast medical search tool at the Supercomputing 2010 conference

Microsoft announced the release of the National Center for Biotechnology Information Basic Local Alignment Search Tool (NCBI Blast) on Windows Azure at Supercomputing 2010.

The new application enables a broader community of scientists to combine desktop resources with the power of cloud computing for biological research. Microsoft showcased the scale of the application on Windows Azure, demonstrating its use for 100 billion comparisons of protein sequences in a database managed by the NCBI.

Search Engine Zips Through Data

NCBI Blast on Azure enables researchers to take advantage of the scalability of the platform to perform analysis of vast proteomics and genomic data in the cloud.

Blast is a suite of programs designed to search all available sequence databases for similarities between a protein or DNA query and known sequences. It allows quick matching of near and distant sequence relationships, providing scores that allow the user to distinguish real matches from background hits with a high degree of statistical accuracy.
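To give a feel for what searching for similarities between a query and known sequences involves, here is a minimal sketch in Python. It is not Blast's own algorithm: Blast uses faster heuristics, while this sketch uses the classic Smith-Waterman local alignment score that such heuristics approximate. The function names, the scoring values (match 2, mismatch -1, gap -2) and the tiny example database are all assumptions made for illustration.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score.

    The scoring values are illustrative assumptions, not Blast defaults.
    """
    prev = [0] * (len(subject) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always zero
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # local alignment never goes below zero
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def rank_hits(query, database):
    """Score the query against every database entry, best hit first."""
    scored = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; real sequence collections hold millions of entries.
    toy_db = {"seq_a": "TTTT", "seq_b": "GGACGTGG"}
    for name, score in rank_hits("ACGT", toy_db):
        print(name, score)
```

Even this toy version compares every query position with every subject position, which hints at why comparing millions of sequences against each other calls for cloud-scale resources.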

The power of the Blast suite can be harnessed by allowing researchers to rent processing time on the Azure cloud platform, Microsoft said. The availability of these programs over the cloud allows laboratories, or even individuals, to have large-scale computational resources at their disposal at a very low cost per run, the company said. For researchers who do not have access to large computer resources, this greatly increases the options to analyse their data. They can now undertake more complex analyses or try different approaches that were simply not feasible before.

“NCBI Blast on Windows Azure gives all research organisations the same computing resources that traditionally only the largest labs have been able to afford,” said Bob Muglia, president of the Server and Tools business at Microsoft, in a statement. “It shows how Windows Azure provides the genuine platform-as-a-service capabilities that technical computing applications need to extract insights from massive data, in order to help solve some of the world’s biggest challenges across science, business and government.”

Researchers in bioinformatics, energy, drug research and many other fields use Blast to sift through large databases for purposes that include identifying new animal species, improving drug effectiveness and producing biofuels.

NCBI Blast on Windows Azure provides a user-friendly web interface and access to cloud computing for very large computations, as well as smaller-scale operations. The application will allow scientists to use, and collaborate with, their private data collections, as well as data hosted on Windows Azure, including NCBI public protein data collections and the results of Microsoft’s large protein comparison.

Free Access For Qualifying Research Teams

The NCBI Blast software is available from Microsoft at no cost and Windows Azure resources are available at no charge to many researchers through Microsoft’s Global Cloud Research Engagement Initiative.

In an interview with eWEEK, Kyril Faenov, general manager of HPC, who is leading the Technical Computing Group at Microsoft, said, “We expect a large number of bioinformatics researchers to take advantage of this.”

Faenov said researchers at Seattle Children’s Hospital were able to solve a six-year problem in one week using the cloud platform. According to a Microsoft Research article, at Seattle Children’s Hospital, researchers interested in protein interactions wanted to know more about the interrelationships of known protein sequences. Due to the sheer number of known proteins – nearly 10 million – this would have been a very difficult problem for even the most state-of-the-art computer to solve.

When the researchers first approached the Microsoft Extreme Computing Group (XCG) to see if Azure Blast could help solve this problem, initial estimates indicated that it would take a single computer more than six years to find the results. By leveraging the power of the cloud, they could cut the computing time substantially.

The researchers were able to split millions of protein sequences into groups and distribute them to data centres in multiple countries (spanning two continents) for analysis. By using the cloud, the researchers obtained results in about a week. This has been the largest research project to date run on Windows Azure, Microsoft said.
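The split-and-distribute approach described above, and the rough arithmetic behind the six-years-to-one-week claim, can be sketched in a few lines of Python. This is an illustration only, not Microsoft's implementation: the function names, the group size and the data centre labels are hypothetical, and the speedup figure simply takes the article's "more than six years" and "about a week" at face value.

```python
def split_into_groups(sequences, group_size):
    """Cut a list of sequences into consecutive groups of at most group_size."""
    return [sequences[i:i + group_size]
            for i in range(0, len(sequences), group_size)]


def assign_round_robin(groups, workers):
    """Hand the groups out to workers in turn (a simple assumed policy)."""
    assignment = {worker: [] for worker in workers}
    for index, group in enumerate(groups):
        assignment[workers[index % len(workers)]].append(group)
    return assignment


def required_speedup(serial_years, parallel_weeks, weeks_per_year=52):
    """How many times faster the parallel run must be than a single machine."""
    return serial_years * weeks_per_year / parallel_weeks


if __name__ == "__main__":
    # Hypothetical stand-ins for protein sequences and data centres.
    sequences = [f"protein_{n}" for n in range(10)]
    groups = split_into_groups(sequences, group_size=4)
    plan = assign_round_robin(groups, ["datacentre_a", "datacentre_b"])
    for worker, assigned in plan.items():
        print(worker, assigned)
    print(required_speedup(serial_years=6, parallel_weeks=1))
```

On those figures, the work would need to run roughly 312 times faster than on a single computer, which is only plausible when the groups can be processed independently and in parallel.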