C3.ai, a leading enterprise AI software provider for accelerating digital transformation, today announced the general public release of the C3.ai COVID-19 Data Lake™. The data lake is a unique, unified source of comprehensive COVID-19 data and knowledge graph that is structurally different from other COVID-19 data collections, enabling researchers to accelerate efforts to mitigate the spread of the pandemic.
C3.ai COVID-19 Data Lake Accelerates Researcher Productivity
The C3.ai COVID-19 Data Lake uniquely interconnects the elements of all the data sources into a single, unified federated data model that is immediately available for researchers to access through any utility that offers RESTful data access. Most importantly, the data lake pre-establishes essential links in those complex data sets so that researchers can easily navigate and explore all of the associations within and across the data sets through a knowledge graph and then apply advanced data science methods. By unifying the data sets, the C3.ai COVID-19 Data Lake helps researchers and developers generate insights faster and more easily than is possible with other data collections.
Other COVID-19 data collections are limited in that they provide only lists of URLs that link to individual data sets in different locations and in different formats, requiring extensive data wrangling and integration efforts to be useful. In addition, a few providers offer digital libraries, collections of data sources that are stored in one place, but the data are neither pre-integrated nor federated.
The C3.ai COVID-19 Data Lake is also unique in its ability to provide analysis-ready data and accelerate researcher productivity. It solves the key challenge researchers face in working with disparate data sources – i.e., having to spend up to 90 percent of their time “wrangling” data into a usable form.
Connor Makowski, a research associate at MIT Center for Transportation & Logistics and project manager at MIT’s Computational and Visual Education (CAVE) Lab, received early access to the data lake. “I was expecting something similar to many of the other COVID data resources out there. Instead of getting a list of URLs or folders full of CSVs, this data lake provides a comprehensive interconnected data model. Previously disconnected data sources can easily be integrated – including time series data – with a single, simple API request.”
Makowski added, “This is incredibly valuable when time is critical, and breakthroughs can mean thousands of lives saved. This is why I created an open-source Python connector for the data lake to provide even more access to this information.”
“Having access to an integrated set of diverse COVID-19 data sources with a common data model can help accelerate analysis of critical supply chain issues in our work with FEMA and other agencies,” said Tim Russell, Research Engineer at the MIT Humanitarian Supply Chain Lab, MIT Center for Transportation & Logistics. “For example, as we look to understand the distribution and availability of COVID-19 testing equipment and materials – or the pandemic’s impact on freight flows throughout the country – the C3.ai COVID-19 Data Lake provides a valuable resource in unifying and simplifying access to the necessary data without having to waste time on finding, cleaning, and preparing the data for analysis.”
The C3.ai COVID-19 Data Lake, which includes data from a number of critical COVID data sources, is now publicly available at no cost to the global research community and is accessible at: https://c3.ai/covid
Amazon Web Services (AWS) is co-sponsor of the open data initiative and is providing cloud infrastructure services in support of this initiative.
C3.ai COVID-19 Data Lake data sets include:
- Johns Hopkins University: COVID-19 Data Repository
- The COVID Tracking Project
- MOBS Lab: COVID-19 Situation Report
- nCoV-2019 Data Working Group: Epidemiology Data
- European Centre for Disease Prevention and Control: Worldwide Situation Updates
- COVID-19 Open Research Dataset (CORD-19)
- National Center for Biotechnology Information Virus Database
- World Health Organization: Daily Situation Reports
- Milken Institute COVID-19 Treatment and Vaccine Tracker
- World Health Organization COVID-19 R&D
- The New York Times: COVID-19 Data in the United States
Additional datasets to be published May 15, 2020, will include:
- University of Montreal: COVID-19 Image Data Collection
- Carbon Health & Braid Health: COVID-19 Clinical Data Repository
- Kaiser Health News: US Hospital ICU Beds
- US Census Bureau: Demographic & Housing Estimates
- Apple: COVID-19 Mobility Trends
- Kaiser Family Foundation: Social Distancing Policies
- University of Washington – Institute for Health Metrics and Evaluation: COVID-19 Projections
- Data Science for COVID-19: South Korea Dataset
- Indian Ministry of Health & Family Welfare: COVID-19 India
- Sito del Dipartimento della Protezione Civile – Emergenza Coronavirus
- Environmental Protection Agency: US Air Quality
- The World Bank – Global Health Statistics
Powered by C3 AI Suite
The C3.ai COVID-19 Data Lake is made possible through the capabilities of the C3 AI™ Suite, which provides access to continuously updated COVID-19 data that is normalized and ready for analysis, as well as tools to build, train, and deploy AI models.
Aggregated into a unified, federated image, the diverse structured and unstructured data are easily accessible to researchers through a RESTful API, using common tools such as Python, R, Ex Machina, and Microsoft Power BI.
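As a rough illustration of what RESTful access from Python could look like, the sketch below builds a JSON POST request using only the standard library. This release does not specify the API, so the base URL, the `outbreaklocation` type name, the `fetch` action, and the `spec` fields shown here are all assumptions made for illustration; consult the documentation at https://c3.ai/covid for the actual endpoints and parameters.

```python
import json
import urllib.request

# Assumption: this base URL and path layout are illustrative only;
# they are not given in this announcement.
BASE_URL = "https://api.c3.ai/covid/api/1"


def build_fetch_request(type_name, spec, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a hypothetical
    `fetch` action on one data type in the data lake."""
    url = f"{base_url}/{type_name}/fetch"
    payload = json.dumps({"spec": spec}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical type name and filter expression, shown only to
# illustrate the shape of a request. Nothing is sent here.
example_request = build_fetch_request(
    "outbreaklocation",
    {"filter": "id == 'California_UnitedStates'", "limit": 1},
)
```

Calling `send(example_request)` would perform the network call and return the parsed JSON. Keeping request construction separate from sending makes it possible to inspect the URL and payload before contacting the service.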
The global research and developer communities are invited to help expand the scale of the C3.ai COVID-19 Data Lake by enhancing its functionality, developing analytics and predictive models, and contributing additional COVID-related data sets through crowdsourcing.
“It is our hope that by solving the problem of data aggregation, data integrity, secure accessibility, and connectivity, we will have made a positive, long-lasting contribution to addressing this unprecedented pandemic,” said Thomas M. Siebel, CEO of C3.ai. “If this is our chief accomplishment at C3.ai, it will have been well worth the effort.”
“Our goal is to make it easier for researchers, data scientists, and developers to build, train, and run custom machine learning models on massive amounts of COVID-19 data for greater and faster insights,” said Mike Clayville, Vice President, Worldwide Commercial Sales and Business Development, Amazon Web Services, Inc. “The COVID-19 Data Lake has the potential to globally impact research efforts and speed breakthroughs to come.”
This news follows the initial COVID-19 Data Lake announcement on April 1 and the March 26 launch of C3.ai Digital Transformation Institute (C3.ai DTI), a public-private research consortium dedicated to accelerating the application of artificial intelligence to speed the pace of digital transformation in business, government, and society. C3.ai DTI member institutions include UC Berkeley, University of Illinois at Urbana-Champaign, MIT, Princeton University, Carnegie Mellon University, and the University of Chicago.
Join Live Virtual Conference for Demonstration
C3.ai will conduct a virtual conference with a live demonstration of the COVID-19 Data Lake on Wednesday, April 29, at 9 a.m. Pacific Time.
For additional information about the C3.ai COVID-19 Data Lake, please visit: https://c3.ai/covid
To learn more about C3.ai DTI’s program, award opportunities, and the first call for research proposals, focusing on AI techniques to mitigate COVID-19 and future pandemics, please visit C3DTI.ai.
C3.ai is a leading AI software provider for accelerating digital transformation. C3.ai delivers the C3 AI Suite for developing, deploying, and operating large-scale AI, predictive analytics, and IoT applications in addition to an increasingly broad portfolio of turn-key AI applications. The core of the C3.ai offering is a revolutionary, model-driven AI architecture that dramatically enhances data science and application development. Learn more at: www.c3.ai.