Principles of Graph Data Integration (GraphInt)
Start date: Aug 1, 2016
End date: Jul 31, 2021
Project status: Finished
This proposal tackles fundamental problems in data management, leveraging expressive, large-scale and heterogeneous graph structures to integrate both unstructured (e.g., text) and structured (e.g., relational) content. Integrating heterogeneous content has become a key hurdle in the deployment of Big Data applications, due to the meteoric rise of both machine- and user-generated data storing information in a variety of formats. Traditional integration techniques, which clean up, fuse and then map heterogeneous data onto rigid abstractions, fall short of accurately capturing the complexity and wild heterogeneity of today’s information. Having closely followed the emergence of heterogeneous information sources online, I am convinced that only an interdisciplinary approach drawing both from classical data management and from large-scale Web information processing techniques can solve the formidable data integration challenges that they pose. The project proposes an ambitious overhaul of information integration techniques that embraces the scale and heterogeneity of today’s data. I propose the use of expressive and heterogeneous graphs of entities to continuously and dynamically interrelate disparate pieces of content while capturing their idiosyncrasies. The project focuses on three core issues related to large-scale and heterogeneous information graphs: i) the effective extraction of fine-grained information from unstructured sources and its proper integration into large-scale heterogeneous and probabilistic graphs, ii) the creation of novel physical storage structures and primitives to durably and efficiently manage the profusion of data considered by such graphs using clusters of commodity machines, and iii) the development of logical data abstraction mechanisms facilitating the effective and efficient resolution of complex analytic and data integration queries on top of the physical layer.