PLOIGIA: Navigation and Visual Analytics on Raw Datasets

The present research project, PLOIGIA, concerns the development and implementation of innovative methods to support interactive navigation and visual analytical processing of large volumes of multidimensional scientific data, directly on the raw data and without the support of database management systems.

The main objective of the project is to develop innovative methods for the interactive exploration of large multidimensional data files that (a) provide basic data navigation functions and (b) support knowledge extraction through visualizations and automated analyses of the data, (c) without relying on a database management system, working instead directly on the raw source data.

The main research questions are:

Can we efficiently support a set of navigation operations over the data, such as drilling into details, filtering information, aggregating where necessary, or comparing with similar data, without using a database management system and without altering the raw data in their source file format?
Can we support navigation with visualizations and analytical processing of the data, such as presenting their standard statistical profile, automatically locating anomalies, comparing with more general or more specific data subsets of interest, and recommending regions of the navigation space that appear to be of interest to the user, in a way that is (a) progressively extensible and customizable to the user's needs and (b) efficient?

The core of the proposal is a new indexing structure based on the following main features: (a) lightweight footprint, (b) incremental and adaptive construction, and (c) support of navigation transitions, visualization and analytical data processing.

The proposed structure is a multidimensional, tile-based index. It resides primarily in main memory and groups data objects into tiles according to the values of their attributes. For each data object, the structure stores its location in the source file.
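The idea of grouping objects into tiles while keeping only their file locations in memory can be illustrated with a minimal sketch. It assumes a comma-separated file with a header row, two numeric attributes used as dimensions, and a fixed square tile size; the function names build_tile_index and read_row are hypothetical and are not taken from the project itself.

```python
from collections import defaultdict


def build_tile_index(raw, x_col, y_col, tile_size):
    """Scan a binary CSV stream once and map each grid tile to row offsets.

    Only byte offsets are kept in memory; the rows stay in the raw file.
    Assumption: the first line is a header and fields contain no quoted commas.
    """
    index = defaultdict(list)
    raw.seek(0)
    raw.readline()  # skip the header line
    while True:
        offset = raw.tell()  # byte position where this row starts
        line = raw.readline()
        if not line:
            break  # end of file
        if not line.strip():
            continue  # ignore blank lines
        fields = line.decode("utf-8").rstrip("\r\n").split(",")
        x, y = float(fields[x_col]), float(fields[y_col])
        tile = (int(x // tile_size), int(y // tile_size))
        index[tile].append(offset)
    return index


def read_row(raw, offset):
    """Fetch one row from the raw file using a stored byte offset."""
    raw.seek(offset)
    return raw.readline().decode("utf-8").rstrip("\r\n").split(",")
```

The file must be opened in binary mode (for example open(path, "rb")) so that tell and seek operate on plain byte offsets. A fixed grid is used here only for brevity; the tiles of the actual structure need not be uniform.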

A key feature of the proposed indexing structure is its support for incremental and adaptive construction: the structure is built gradually, driven by user interaction. At any point in time, only the part of the structure needed to serve the user's actions has been constructed. The rationale is a typical characteristic of data exploration: much of the data is never examined by the user, so indexing and processing the entire dataset in detail is very often wasted effort.
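One way to picture incremental, adaptive construction is a tile that is split only when a user query touches it. The sketch below is an illustration under stated assumptions, not the project's algorithm: it uses two dimensions, a four-way split at the midpoint, and hypothetical parameters max_entries and min_size to decide when a tile is refined. Entries are assumed to be (x, y, offset) triples lying inside the root tile's bounds.

```python
class Tile:
    """A rectangular tile holding (x, y, file_offset) entries."""

    def __init__(self, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.entries = []
        self.children = None  # created lazily, only when a query needs it

    def intersects(self, qx0, qy0, qx1, qy1):
        x0, y0, x1, y1 = self.bounds
        return not (qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1)

    def split(self):
        """Replace this tile's entries with four child tiles."""
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [
            Tile(x0, y0, mx, my), Tile(mx, y0, x1, my),
            Tile(x0, my, mx, y1), Tile(mx, my, x1, y1),
        ]
        for x, y, off in self.entries:
            pos = (1 if x >= mx else 0) + (2 if y >= my else 0)
            self.children[pos].entries.append((x, y, off))
        self.entries = []


def query(tile, window, max_entries=2, min_size=1.0):
    """Return offsets inside the window, refining only the tiles it touches."""
    if not tile.intersects(*window):
        return []  # untouched regions keep their coarse tiles
    width = tile.bounds[2] - tile.bounds[0]
    if tile.children is None and len(tile.entries) > max_entries and width > min_size:
        tile.split()
    if tile.children is not None:
        found = []
        for child in tile.children:
            found.extend(query(child, window, max_entries, min_size))
        return found
    qx0, qy0, qx1, qy1 = window
    return [off for x, y, off in tile.entries
            if qx0 <= x <= qx1 and qy0 <= y <= qy1]
```

After a query over a small window, only the tiles overlapping that window have been subdivided; the rest of the data space remains indexed at its initial coarse level until the user navigates there.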

The proposed indexing structure uses caching techniques to keep frequently used data available. Additionally, a second component records which data have already been visualized on the user's screen. In subsequent user actions, only data not already on screen are retrieved and rendered, reducing both retrieval time and visualization time.
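The interplay between a cache and the record of what is already on screen can be sketched as follows. The text does not specify the eviction policy or the unit of caching, so this example assumes a least-recently-used policy over whole tiles; TileCache, Viewport, and the loader callback are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class TileCache:
    """A small LRU cache of tile contents, filled by a loader callback."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader  # reads a tile's data from the raw file
        self.store = OrderedDict()

    def get(self, tile_id):
        if tile_id in self.store:
            self.store.move_to_end(tile_id)  # mark as most recently used
            return self.store[tile_id]
        data = self.loader(tile_id)
        self.store[tile_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        return data


class Viewport:
    """Tracks which tiles are on screen so only the difference is fetched."""

    def __init__(self, cache):
        self.cache = cache
        self.on_screen = set()

    def move_to(self, tile_ids):
        wanted = set(tile_ids)
        to_draw = wanted - self.on_screen   # newly visible tiles
        to_erase = self.on_screen - wanted  # tiles that left the screen
        drawn = {t: self.cache.get(t) for t in sorted(to_draw)}
        self.on_screen = wanted
        return drawn, to_erase
```

When the user pans from tiles {1, 2} to tiles {2, 3}, only tile 3 is fetched and drawn; if the user then pans back, tile 1 is served from the cache without touching the raw file, provided it has not been evicted.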

Combined with these retrieval and caching techniques, the indexing structure aims to support interactive user operations efficiently. In a hierarchical data exploration and interactive visualization scenario, the supported operations include filter, drill-down, and roll-up, which mimic OLAP transitions (without necessarily applying aggregation). In addition, the statistical properties of the data being processed are visualized efficiently (for example, by presenting the statistical profile of the data, or by automatically detecting anomalies through comparison with more general or more specific data subsets of interest). These comparisons in turn drive the recommendation of regions of the data space that appear to be of interest to the user.
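A roll-up over tiles and a comparison of a subset against its more general parent can be sketched with mergeable summary statistics. This is only an illustration under assumptions: the text does not say which statistics or which anomaly criterion are used, so the example keeps a count, sum, and sum of squares per tile and flags a tile when its mean lies more than k parent standard deviations from the parent mean. The names Stats, roll_up, and flag_interesting are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Mergeable summary of one numeric attribute within a tile."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value):
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def merge(self, other):
        return Stats(self.n + other.n,
                     self.total + other.total,
                     self.total_sq + other.total_sq)

    @property
    def mean(self):
        return self.total / self.n if self.n else 0.0

    @property
    def std(self):
        """Population standard deviation derived from the running sums."""
        if not self.n:
            return 0.0
        variance = self.total_sq / self.n - self.mean ** 2
        return math.sqrt(max(variance, 0.0))


def roll_up(children):
    """Combine child-tile summaries into the parent's summary."""
    parent = Stats()
    for stats in children.values():
        parent = parent.merge(stats)
    return parent


def flag_interesting(children, k=1.0):
    """Name the child tiles whose mean deviates strongly from the parent's."""
    parent = roll_up(children)
    if parent.std == 0.0:
        return []
    return [name for name, stats in children.items()
            if abs(stats.mean - parent.mean) > k * parent.std]
```

Because the summaries merge by simple addition, a roll-up can be answered from the tiles' stored statistics without rereading the raw file, and the flagged tiles are natural candidates for regions to recommend to the user.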