MESA: In-memory Spatial Analytics Made Scalable

Introduction

We live in the era of big and complex data. Companies, organizations, and individuals routinely collect huge volumes of data of rich nature, including measurements and locations, which are produced by sensing and GPS devices. At the same time, main memories become larger and cheaper, machines become equipped with faster, multi-core processors, and new computational models that facilitate big data management and analytics become available.

In this project, we focus on the effective management of big spatial data, which are available at different forms and volumes, in order to support efficient spatial querying and analytics. Geographic data (e.g., maps) include relatively static spatial objects of varying complexity (e.g., small to medium sized polygons and polylines). Location-based services use maps and also handle huge volumes of continuously produced, dynamic location data (e.g., mobile user positions and their influence/contact regions). Hence, a modern big spatial data system should be able to manage at the same time different object collections of varying types and volatility.

The main goal of this project is to advance the state-of-the-art in big spatial data management, by designing, developing and evaluating novel data partitioning schemes, spatial indices, and algorithms, which (i) exploit the capabilities of modern hardware and (ii) use parallel and/or distributed data management paradigms to scale up out the evaluation of (analytical or transactional) spatial queries.

Project Objectives

O1. Design and evaluate novel partitioning schemes for spatial data, such that the resulting partitions are independent and spatial analysis does not require any synchronization or communication between the processes that process each partition.

O2. Develop novel in-memory spatial indices for each partition to efficiently support the most common spatial query types, which are building blocks in spatial analysis, including range queries, nearest- neighbor queries, spatial intersection joins, and distance joins.

O3. Propose solutions for the effective management of both relatively static and highly dynamic spatial data.

O4. Handle different types of spatial data (e.g., points, polylines, polygons) with customized partitioning, indexing, and query evaluation techniques.

O5. Exploit as much as possible the capabilities of modern hardware, especially multi-core parallelism.

O6. Integrate the developed components (partitioning, indexing, handling of different data types, parallelization) into a unified spatial data management system.

Funding

The research project is supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the "2nd Call for H.F.R.I. Research Projects to support Faculty Members & Researchers" (Project Number: 2757).

Timeframe: March 2022 - March 2025

Team

Nikos Mamoulis (PI)
Thanasis Georgiadis (PhD student)
Achilleas Michalopoulos (PhD student)
Dimitrios Tsitsigkos (PhD student)

Publications

  1. G. Christodoulou, P. Bouros and N. Mamoulis, "LIT: Lightning-fast In-memory Temporal Indexing," Proceedings of the ACM Conference on Management of Data (SIGMOD), Santiago de Chile, June 2024.
  2. P. Bouros, A. Titkov, G. Christodoulou, C. Rauch, and N. Mamoulis, "HINT on Steroids: Batch Query Processing for Interval Data," Proceedings of the 27th International Conference on Extending Database Technology (EDBT), Paestum, Italy, March 2024 (short paper).
  3. A. Michalopoulos, K. Lampropoulos, G. Kelantonakis, C. Zeginis, K. Magoutis, and N. Mamoulis, "Similarity Search based on Geo-footprints," Proceedings of the 27th International Conference on Extending Database Technology (EDBT), Paestum, Italy, March 2024 (short paper).
  4. D. Tsitsigkos, P. Bouros, K. Lampropoulos, N. Mamoulis, and M. Terrovitis, "Two-layer Space-oriented Partitioning for Non-point Data," IEEE Transactions on Knowledge and Data Engineering (TKDE), 36(3): 1341-1355, March 2024.
  5. G. Christodoulou, P. Bouros, and N. Mamoulis, "HINT: A Hierarchical Interval Index for Allen Relationships," The VLDB Journal, 33(1): 73-100, January 2024.
  6. K. Lampropoulos, F. Zardbani, N. Mamoulis, and P. Karras, "Adaptive Indexing in High-Dimensional Metric Spaces," Proceedings of the VLDB Endowment (PVLDB), 16(10): 2525-2537, 2023.
  7. F. Zardbani, N. Mamoulis, S. Idreos, and P. Karras, "Adaptive Indexing of Objects with Spatial Extent," Proceedings of the VLDB Endowment (PVLDB), 16(9): 2248-2260, 2023.
  8. L. Qiu, G. Kellaris, N. Mamoulis, K. Nissim, and G. Kollios, "Doquet: Differentially Oblivious Range and Join Queries with Private Data Structures," Proceedings of the VLDB Endowment (PVLDB), 16(13): 4160-4173, 2023.
  9. M. D. Siampou, G. Papadakis, N. Mamoulis, and M. Koubarakis, "Supervised Scheduling for Geospatial Interlinking," Proceedings of the International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL), Hamburg, Germany, November 2023.
  10. A. Michalopoulos, D. Tsitsigkos, P. Bouros, N. Mamoulis, and M. Terrovitis, "Efficient Nearest Neighbor Queries on Non-point Data," Proceedings of the International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL), Hamburg, Germany, November 2023.
  11. Thanasis Georgiadis, Eleni Tzirita Zacharatou, and Nikos Mamoulis, "APRIL: Approximating Polygons as Raster Interval Lists," CoRR abs/2307.01716, July 2023.
  12. T. Georgiadis and N. Mamoulis, "Raster Intervals: An Approximation Technique for Polygon Intersection Joins," Proceedings of the ACM Conference on Management of Data (SIGMOD), Seattle, WA, June 2023.