Anastasios Karagiannis, Panos Vassiliadis, Alkis Simitsis
9th International Workshop on Quality in Databases (QDB 2011). In conjunction with VLDB 2011, August 29th, 2011, Seattle, USA
Extract-Transform-Load (ETL) workflows (a) extract data from various sources, (b) transform, cleanse, and homogenize these data, and (c) populate a target data store (e.g., a data warehouse). Typically, such processes should terminate within strict time windows; thus, ETL workflow optimization is of significant interest. In this paper, we deal with the problem of scheduling the execution of ETL activities, with the goal of minimizing ETL execution time and allocated memory. Apart from a simple, fair scheduling policy, we also experiment with two other policies: the first aims to empty the largest input queue of the workflow, and the second activates the activity with the maximum tuple consumption rate. We experimentally show that an appropriate choice of scheduling policy can improve ETL performance in terms of memory consumption and execution time.
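
To make the three policies concrete, the following is a minimal sketch of how each one might select the next activity to run. It is an illustration only, not the implementation evaluated in the paper: the `Activity` class, its `rate` field, the function names, and the reading of "fair" as round-robin are all assumptions introduced here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Activity:
    """Hypothetical stand-in for an ETL activity (not the paper's model)."""
    name: str
    rate: float  # assumed tuple consumption rate (tuples per time unit)
    queue: Deque[int] = field(default_factory=deque)  # pending input tuples


def pick_round_robin(activities: List[Activity], last_index: int) -> Optional[int]:
    """Fair policy, assumed here to be round-robin: take the next activity
    after last_index (wrapping around) that has pending input."""
    n = len(activities)
    for step in range(1, n + 1):
        i = (last_index + step) % n
        if activities[i].queue:
            return i
    return None  # nothing to schedule


def pick_largest_queue(activities: List[Activity]) -> Optional[int]:
    """Pick the activity whose input queue currently holds the most tuples."""
    candidates = [i for i, a in enumerate(activities) if a.queue]
    if not candidates:
        return None
    return max(candidates, key=lambda i: len(activities[i].queue))


def pick_max_rate(activities: List[Activity]) -> Optional[int]:
    """Pick the activity with the highest assumed consumption rate
    among those that have pending input."""
    candidates = [i for i, a in enumerate(activities) if a.queue]
    if not candidates:
        return None
    return max(candidates, key=lambda i: activities[i].rate)


if __name__ == "__main__":
    # Made-up workflow state, for illustration only.
    workflow = [
        Activity("extract", rate=9.0, queue=deque([1, 2])),
        Activity("filter", rate=5.0, queue=deque([1, 2, 3, 4])),
        Activity("load", rate=2.0),
    ]
    print(pick_round_robin(workflow, last_index=1))
    print(pick_largest_queue(workflow))
    print(pick_max_rate(workflow))
```

Each function returns the index of the activity to activate, or `None` when no activity has pending input. A scheduler loop would call one of them repeatedly, let the chosen activity consume tuples from its queue, and push the results onto the queues of its consumers.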