Open-Source Databases: Within, Outside, or Beyond Lehman's Laws of Software Evolution?

Ioannis Skoulis, Panos Vassiliadis, Apostolos Zarras

Summary

Like all software systems, databases are subject to evolution as time passes. The impact of this evolution can be vast, as a change to the schema of a database can affect the syntactic correctness and the semantic validity of all the surrounding applications. In this line of research, we have performed a thorough, large-scale study on the evolution of databases that are part of larger open-source projects, publicly available through open-source repositories. Lehman's laws of software evolution, a well-established set of observations on how typical software systems evolve (matured during the last forty years), have served as our guide towards providing insights on the mechanisms that govern schema evolution. We found that, much like software systems, schemata expand over time, under a stabilization mechanism that constrains uncontrolled expansion with perfective maintenance. At the same time, unlike typical software systems, the growth is typically low, with long periods of calmness interrupted by bursts of maintenance and a surprising lack of complexity increase.
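As a rough illustration of the kind of measurement behind these observations, the sketch below counts the tables in each version of a schema and reports the change in size between consecutive versions. It is not Hecate and not the method used in the study: the sample DDL, the regular expression, and the function names (table_names, schema_sizes, growth) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical schema history: one DDL string per committed version.
# These strings are made up for illustration; they are not data from the study.
VERSIONS = [
    "CREATE TABLE users (id INT, name TEXT);",
    "CREATE TABLE users (id INT, name TEXT);",
    "CREATE TABLE users (id INT, name TEXT); CREATE TABLE posts (id INT, body TEXT);",
    "CREATE TABLE users (id INT, name TEXT); CREATE TABLE posts (id INT, body TEXT);",
]

# Simplifying assumption: plain, unquoted table names and no "IF NOT EXISTS".
# Real DDL needs a proper SQL parser rather than a regular expression.
CREATE_TABLE = re.compile(r"\bCREATE\s+TABLE\s+(\w+)", re.IGNORECASE)


def table_names(ddl):
    """Return the set of table names declared in one schema version."""
    return set(CREATE_TABLE.findall(ddl))


def schema_sizes(versions):
    """Schema size per version, measured here as the number of tables."""
    return [len(table_names(v)) for v in versions]


def growth(sizes):
    """Change in size between each pair of consecutive versions."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


sizes = schema_sizes(VERSIONS)
deltas = growth(sizes)
print("sizes:", sizes)
print("growth per transition:", deltas)
```

In this toy history most transitions leave the size unchanged and a single transition adds a table, which is the shape of "calm periods interrupted by bursts" that the summary describes, although on a far smaller scale than any real dataset.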

Please refer to our Schema biographies page for a general overview of our research program.

Highlights

We believe that we can indeed claim that schema evolution is guided by a feedback-based mechanism.

Publications

Presentations

Experimental Resources

The following code and data are presented online to allow the reproduction of results by others. We would like to clearly state that we simply cannot support any requests for the maintenance of the code, or for clarifications, explanations, etc. Moreover, we do not assume any responsibility for any side effects of the code (although we cannot think of, nor have we ever encountered, any). You are free to reuse the following code and data for academic purposes, provided you give the appropriate citation:

Ioannis Skoulis, Panos Vassiliadis, Apostolos Zarras. Open-Source Databases: Within, Outside, or Beyond Lehman's Laws of Software Evolution? 26th International Conference on Advanced Information Systems Engineering (CAiSE 2014), 16-20 June 2014, Thessaloniki, Hellas. Source code, datasets, presentations available at http://www.cs.uoi.gr/~pvassil/publications/2014_CAiSE/

(and, yes, academic honesty rules impose that this includes student projects too ;) )

Input: Evolution Datasets

Code: Source code for Hecate. Requires Java 7 and Eclipse.

Results concerning growth and size of schemata: here (xlsx)