In a paper at ICDE 2021, we present the findings of a large study of schema evolution in 195 Free Open Source Software (FOSS) projects. We identify families of evolutionary behaviors, or taxa, in FOSS projects. A large percentage of the projects demonstrate very few, if any, actions of schema evolution. Two other taxa involve evolution via focused actions, with either a single focused maintenance action or a large percentage of the evolution activity grouped in no more than a couple of interventions. Other taxa involve moderate and active evolution, with very different volumes of updates to the schema. To the best of our knowledge, this is the first study of this kind in the area of schema evolution, both in terms of presenting profiles of how schemata evolve and in terms of the dataset magnitude and the generalizability of the findings.
Panos Vassiliadis. Profiles of Schema Evolution in Free Open Source Software Projects. 37th IEEE International Conference on Data Engineering (ICDE '21), Chania, Crete, Greece, 19-22 April 2021.
Local page of the paper
In this line of research, we have performed a thorough study on the evolution of databases that are part of larger open source projects, publicly available through open source repositories. Our first attempt towards unveiling how schemata evolve used Lehman's laws of software evolution, a well-established set of observations on how typical software systems evolve (matured during the last forty years), as our guide towards providing insights on the mechanisms that govern schema evolution.
Our findings indicate that the schemata of open source databases expand over time, with long periods of calmness connected via bursts of maintenance effort focused in time, and with significant effort towards the perfective maintenance of the schema that appears to result in an unexpected lack of complexity increase. At the same time, unlike typical software systems, the incremental growth of the schema is typically low and its volume follows a Zipfian distribution. Still, although the technical assessment of Lehman's laws shows that typical software systems evolve quite differently from database schemata, the essence of the laws is preserved: evolution is not about uncontrolled expansion; on the contrary, there appears to be a stabilization mechanism that employs perfective maintenance to control the otherwise growing trend in the information capacity of the database.
Ioannis Skoulis, Panos Vassiliadis, Apostolos V. Zarras. Growing up with stability: How open-source relational databases evolve. Information Systems, Volume 53, October - November 2015, Pages 363 - 385. doi:10.1016/j.is.2015.03.009.
Long version of CAiSE 2014 [Local page with highlights, papers, presentations, data, code and results]
Ioannis Skoulis, Panos Vassiliadis, Apostolos Zarras. Open-Source Databases: Within, Outside, or Beyond Lehman's Laws of Software Evolution? 26th International Conference on Advanced Information Systems Engineering (CAiSE 2014). 16-20 June 2014, Thessaloniki, Hellas.
Macroscopic study of schema behavior [Local page with highlights, papers, presentations, data, code and results]
Schema Evolution and Gravitation to Rigidity: a tale of calmness in the lives of structured data: a keynote talk at the 7th International Conference on Model and Data Engineering (MEDI 2017), October 4-6, 2017, Barcelona. [Click here for a 5-page paper]
Schema evolution for relational databases: a keynote talk at the 5th International Conference on Data Management Technologies and Applications (DATA 2016), July 24-26, 2016, Lisbon, Portugal. [Click here to watch the video of the talk on Vimeo]
Keynote talk at DATA 2016