Schema Evolution and Schema Biographies: Studying the lives of schemata

 

A very large study on schema evolution with taxa of schema evolution

In a paper in ICDE 2021, we present the findings of a large study of the evolution of the schemata of 195 Free Open Source Software (FOSS) projects. We identify families of evolutionary behaviors, or taxa, in FOSS projects. A major finding is that the absence of schema evolution is more prevalent than its presence: a large percentage of the projects demonstrate very few, if any, actions of schema evolution. Two other taxa involve evolution via focused actions, with either a single focused maintenance action or a large percentage of the evolution activity grouped in no more than a couple of interventions. The remaining taxa involve moderate and active evolution, with very different volumes of updates to the schema. To the best of our knowledge, this is the first study of this kind in the area of schema evolution, both in terms of presenting profiles of how schemata evolve and in terms of the dataset magnitude and the generalizability of the findings.

In a follow-up paper in Information Systems, 2022, we expand the aforementioned contributions. We investigate how the different taxa relate to measurable properties of schema evolution, specifically, duration of schema and project updates, activity volume, and heartbeat. We show that although the different taxa have practically very similar durations, their evolutionary characteristics differ in analogy to the "active" character of each taxon. Moreover, by observing certain similarities in the measurable properties of the taxa, we take the opportunity to introduce super taxa, which complement the previous taxonomy by grouping the aforementioned taxa in terms of overall profile similarity. The result is a more concise and intuitive taxonomy that provides a cleaner separation of evolution measures. Finally, we show that schema evolution is frequently a time-concentrated activity.
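To make the measures above more concrete, here is a minimal sketch of how activity volume, a heartbeat, and time concentration could be computed from a schema history. It is not the method of the papers: the `Schema` representation, the function names, the definition of activity as the number of attributes added or removed between consecutive versions, and the "share of activity in the top-k transitions" measure are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical representation: table name -> set of attribute names.
Schema = Dict[str, Set[str]]


def transition_activity(old: Schema, new: Schema) -> int:
    """Count attributes added or removed between two schema versions.

    Assumption: activity is the number of attribute births and deletions;
    type or key changes are ignored in this simplified sketch.
    """
    activity = 0
    for table in old.keys() | new.keys():
        old_attrs = old.get(table, set())
        new_attrs = new.get(table, set())
        # Symmetric difference: attributes present in exactly one version.
        activity += len(old_attrs ^ new_attrs)
    return activity


def heartbeat(history: List[Schema]) -> List[int]:
    """Activity of every transition between consecutive versions."""
    return [transition_activity(a, b) for a, b in zip(history, history[1:])]


def top_k_share(beats: List[int], k: int = 2) -> float:
    """Fraction of the total activity found in the k busiest transitions."""
    total = sum(beats)
    if total == 0:
        return 0.0  # a schema that never changed has nothing to concentrate
    return sum(sorted(beats, reverse=True)[:k]) / total


# Toy history with five versions of a two-table schema.
history: List[Schema] = [
    {"users": {"id", "name"}},
    {"users": {"id", "name", "email"}},
    {"users": {"id", "name", "email"}},
    {"users": {"id", "name", "email"}, "orders": {"id", "user_id", "total"}},
    {"users": {"id", "email"}, "orders": {"id", "user_id", "total"}},
]

beats = heartbeat(history)
total_activity = sum(beats)
active_transitions = sum(1 for b in beats if b > 0)
concentration = top_k_share(beats, k=2)
print(beats, total_activity, active_transitions, concentration)
```

In this toy history, a high `top_k_share` with few active transitions would correspond to evolution concentrated in a couple of interventions, while a total activity of zero would correspond to an absence of schema evolution. Any thresholds for assigning a project to a taxon would have to come from the papers themselves.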

Panos Vassiliadis, George Kalampokis. Taxa and super taxa of schema evolution and their relationship to activity, heartbeat and duration. Information Systems, Volume 110, 2022, 102109, ISSN 0306-4379, doi:10.1016/j.is.2022.102109.
[Official page at Elsevier] [Local folder of the paper]

Panos Vassiliadis. Profiles of Schema Evolution in Free Open Source Software Projects. 37th IEEE International Conference on Data Engineering (ICDE '21), Chania, Crete, Greece, 19-22 April 2021.
Local page of the paper

A first attempt at understanding the life of a schema

In this line of research, we have performed a thorough study of the evolution of databases that are part of larger open source projects, publicly available through open source repositories. Our first attempt at unveiling how schemata evolve used Lehman's laws of software evolution, a well-established set of observations on how typical software systems evolve (matured during the last forty years), as our guide towards providing insights into the mechanisms that govern schema evolution.

Our findings indicate that the schemata of open source databases expand over time, with long periods of calmness connected via bursts of maintenance effort focused in time, and with significant effort towards the perfective maintenance of the schema that appears to result in an unexpected lack of complexity increase. At the same time, unlike typical software systems, the incremental growth of the schema is typically low and its volume follows a Zipfian distribution. Still, although the technical assessment of Lehman's laws shows that typical software systems evolve quite differently from database schemata, the essence of the laws is preserved: evolution is not about uncontrolled expansion; on the contrary, there appears to be a stabilization mechanism that employs perfective maintenance to control the otherwise growing information capacity of the database.

Ioannis Skoulis, Panos Vassiliadis, Apostolos V. Zarras. Growing up with stability: How open-source relational databases evolve. Information Systems, Volume 53, October - November 2015, Pages 363 - 385. doi:10.1016/j.is.2015.03.009.

Long version of the CAiSE 2014 paper [Local page with highlights, papers, presentations, data, code and results]

Ioannis Skoulis, Panos Vassiliadis, Apostolos Zarras. Open-Source Databases: Within, Outside, or Beyond Lehman's Laws of Software Evolution?. 26th International Conference on Advanced Information Systems Engineering (CAiSE 2014). 16-20 June 2014, Thessaloniki, Hellas.

Macroscopic study of schema behavior [Local page with highlights, papers, presentations, data, code and results]

Watchlist

Schema Evolution and Gravitation to Rigidity: a tale of calmness in the lives of structured data: a keynote talk at the 7th International Conference on Model and Data Engineering (MEDI 2017), October 4-6, 2017, Barcelona. [Click here for a 5-page paper]

Schema evolution for relational databases: a keynote talk at the 5th International Conference on Data Management Technologies and Applications (DATA 2016), July 24-26, Lisbon, Portugal. [Click here to watch the video of the talk on Vimeo]
