We invited Dr. Guha from Google to talk about Google's latest effort on information management based on knowledge graphs.

Event details: link.


Abstract:

Statistical data is vital to understanding the world around us, especially for tackling the impact of crises ranging from the pandemic to climate change. This requires a deep understanding of not just the virus or the climate, but of the various stresses that will impact access to food, shelter, healthcare, and other aspects of life. Unfortunately, the data required for this understanding is fragmented across thousands of organizations, in different schemas and a multitude of databases, making it very expensive, if not impossible, to use.

Google has ‘organized and made easily accessible’ many kinds of information, including web pages, images, and maps. One of our contributions to tackling these crises is to organize this statistical data and make it easily accessible to consumers, journalists, policy makers, and researchers through our Data Commons effort. The Data Commons approach is to do the data wrangling once and make the processed data widely available. It is a single unified knowledge graph, with over 250 billion data points from hundreds of sources, created by normalizing and aligning the schemas and entity references across data from these sources. The data is available via standard schemas and Cloud APIs and, more recently, via natural language, using developments in language models.
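
To give a flavor of the programmatic access mentioned above, here is a minimal sketch that pulls a population time series from the public Data Commons service using the `datacommons_pandas` Python client. The helper name (`build_time_series`), the place identifier (`country/USA`), and the statistical variable (`Count_Person`) follow the public documentation and are assumptions on our part, not part of the talk itself; check the current Data Commons API docs for the exact interface.

```python
# Minimal sketch: querying Data Commons for a statistical time series.
# Assumes the `datacommons_pandas` client is installed
# (pip install datacommons_pandas); names may differ in newer API versions.
import datacommons_pandas as dcpd

# Build a pandas Series of population observations for the United States.
# "country/USA" is the DCID (Data Commons ID) of the place, and
# "Count_Person" is the statistical variable for total population.
population = dcpd.build_time_series("country/USA", "Count_Person")

# Print the most recent observations, indexed by date.
print(population.tail())
```

The same data is reachable through the REST API and, as noted in the abstract, through natural-language queries; the Python client is just one convenient entry point into the unified knowledge graph.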