MESA: In-memory Spatial Analytics Made Scalable

Introduction

We live in the era of big and complex data. Companies, organizations, and individuals routinely collect huge volumes of data of rich nature, including measurements and locations, which are produced by sensing and GPS devices. At the same time, main memories become larger and cheaper, machines become equipped with faster, multi-core processors, and new computational models that facilitate big data management and analytics become available.

In this project, we focus on the effective management of big spatial data, which are available at different forms and volumes, in order to support efficient spatial querying and analytics. Geographic data (e.g., maps) include relatively static spatial objects of varying complexity (e.g., small to medium sized polygons and polylines). Location-based services use maps and also handle huge volumes of continuously produced, dynamic location data (e.g., mobile user positions and their influence/contact regions). Hence, a modern big spatial data system should be able to manage at the same time different object collections of varying types and volatility.

The main goal of this project is to advance the state-of-the-art in big spatial data management, by designing, developing and evaluating novel data partitioning schemes, spatial indices, and algorithms, which (i) exploit the capabilities of modern hardware and (ii) use parallel and/or distributed data management paradigms to scale up out the evaluation of (analytical or transactional) spatial queries.

Project Objectives

O1. Design and evaluate novel partitioning schemes for spatial data, such that the resulting partitions are independent and spatial analysis does not require any synchronization or communication between the processes that process each partition.

O2. Develop novel in-memory spatial indices for each partition to efficiently support the most common spatial query types, which are building blocks in spatial analysis, including range queries, nearest- neighbor queries, spatial intersection joins, and distance joins.

O3. Propose solutions for the effective management of both relatively static and highly dynamic spatial data.

O4. Handle different types of spatial data (e.g., points, polylines, polygons) with customized parti- tioning, indexing, and query evaluation techniques.

O5. Exploit as much as possible the capabilities of modern hardware, especially multi-core parallelism.

O6. Integrate the developed components (partitioning, indexing, handling of different data types, parallelization) into a unified spatial data management system.

Funding

The research project is supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the "2nd Call for H.F.R.I. Research Projects to support Faculty Members & Researchers" (Project Number: 2757).

Timeframe: March 2022 - March 2025

Team

Nikos Mamoulis (PI)
Thanasis Georgiadis (PhD student)
Achilleas Michalopoulos (PhD student)

Publications

  1. T. Georgiadis and N. Mamoulis, "Raster Intervals: An Approximation Technique for Polygon Intersection Joins," Proceedings of the ACM Conference on Management of Data (SIGMOD), Seattle, WA, June 2023.