Biomedical Data Classification
- Evanthia Tripoliti, Dimitrios Fotiadis, and George Manis, “Automated diagnosis of diseases based on classification: dynamic determination of the number of trees in random forests algorithm,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 4, pp. 615–622, Jul. 2012 [link]
Abstract: The accurate diagnosis of diseases with a high prevalence rate, such as Alzheimer’s disease, Parkinson’s disease, diabetes, breast cancer, and heart disease, is one of the most important biomedical problems whose administration is imperative. In this paper, we present a new method for the automated diagnosis of diseases based on an improvement of the random forests classification algorithm. More specifically, the dynamic determination of the optimum number of base classifiers composing the random forest is addressed. The proposed method differs from most of the methods reported in the literature, which follow an overproduce-and-choose strategy in which the members of the ensemble are selected from a pool of classifiers that is known a priori. In our case, the number of classifiers is determined during the growing procedure of the forest. Additionally, the proposed method produces an ensemble that is not only accurate but also diverse, ensuring the two important properties that should characterize an ensemble classifier. The method is based on an online fitting procedure and is evaluated using eight biomedical datasets and five versions of the random forests algorithm (40 cases). The method correctly determined the number of trees in 90% of the test cases.
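The idea of fixing the ensemble size while the forest is still growing can be illustrated with a minimal sketch. It is not the online fitting procedure of the paper, which the abstract does not spell out; it swaps in a simple plateau test on a running accuracy curve. The callback `add_tree_and_score` and the parameters `window` and `tol` are assumptions made for illustration only.

```python
from typing import Callable, List


def grow_until_plateau(
    add_tree_and_score: Callable[[int], float],
    max_trees: int = 500,
    window: int = 10,
    tol: float = 1e-3,
) -> int:
    """Grow one tree at a time and stop once accuracy stops improving.

    add_tree_and_score(n) is a hypothetical callback: it adds the n-th tree
    to the forest and returns the current ensemble accuracy (for example an
    out-of-bag or validation estimate).
    """
    scores: List[float] = []
    for n in range(1, max_trees + 1):
        scores.append(add_tree_and_score(n))
        if n >= 2 * window:
            # Compare the mean of the last `window` scores with the mean
            # of the `window` scores before them.
            recent = sum(scores[-window:]) / window
            previous = sum(scores[-2 * window:-window]) / window
            if recent - previous < tol:
                return n  # improvement too small: stop growing here
    return max_trees  # no plateau detected within the budget


def saturating_curve(n: int) -> float:
    """Toy accuracy curve (an assumption) that levels off as trees are added."""
    return 0.9 - 0.4 / n


chosen_size = grow_until_plateau(saturating_curve)
```

Unlike an overproduce-and-choose strategy, this loop never builds more trees than the size it finally returns. A diversity criterion, which the abstract also mentions, is not modeled here.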
- Evanthia Tripoliti, Dimitrios Fotiadis, and George Manis, “Modifications of the construction and voting mechanisms of the random forests algorithm,” Data & Knowledge Engineering, Elsevier, vol. 87, pp. 41–65, Sep. 2013 [link]
Abstract: The aim of this work is to propose modifications of the Random Forests algorithm which improve its prediction performance. The suggested modifications intend to increase the strength and decrease the correlation of the individual trees of the forest and to improve the function which determines how the outputs of the base classifiers are combined. This is achieved by modifying the node splitting and the voting procedure. Different approaches concerning the number of predictors and the evaluation measure which determines the impurity of the node are examined. Regarding the voting procedure, modifications based on feature selection, clustering, nearest neighbors and optimization techniques are proposed. The novel feature of the current work is that it proposes modifications not only for the improvement of the construction or the voting mechanisms but also, for the first time, examines the overall improvement of the Random Forests algorithm (a combination of construction and voting). We evaluate the proposed modifications using 24 datasets. The evaluation demonstrates that the proposed modifications have a positive effect on the performance of the Random Forests algorithm and that they provide comparable and, in most cases, better results than the existing approaches.
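The two places the abstract says are modified, the impurity measure used at node splitting and the function that combines the trees' votes, can be sketched as follows. The Gini index and entropy are standard impurity measures; the weighted vote is a generic stand-in, and how the weights would be obtained (feature selection, clustering, nearest neighbors, or optimization in the paper) is not specified here. The labels and weights in the example are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity of the class labels reaching a node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class labels reaching a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_vote(
    predictions: Sequence[Hashable], weights: Sequence[float]
) -> Hashable:
    """Combine per-tree predictions, giving each tree its own weight.

    With all weights equal this reduces to plain majority voting.
    """
    totals: Dict[Hashable, float] = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three trees for one sample.
tree_outputs = ["patient", "patient", "control"]
majority = weighted_vote(tree_outputs, [1.0, 1.0, 1.0])
reweighted = weighted_vote(tree_outputs, [0.2, 0.2, 0.9])
```

In this example the unweighted vote returns "patient", while the reweighted vote returns "control" because the single dissenting tree carries more weight than the other two combined.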
- Evanthia Tripoliti, Dimitrios Fotiadis, Maria Argyropoulou, and George Manis, “A six stage approach for the diagnosis of the Alzheimer’s disease based on fMRI data,” Journal of Biomedical Informatics, Elsevier, vol. 43, no. 2, pp. 307–320, Apr. 2010 [link]
Abstract: The aim of this work is to present an automated method that assists in the diagnosis of Alzheimer’s disease and also supports the monitoring of the progression of the disease. The method is based on features extracted from the data acquired during an fMRI experiment. It consists of six stages: (a) preprocessing of fMRI data, (b) modeling of fMRI voxel time series using a Generalized Linear Model, (c) feature extraction from the fMRI data, (d) feature selection, (e) classification using classical and improved variations of the Random Forests algorithm and Support Vector Machines, and (f) conversion of the trees of the Random Forest to rules which have physical meaning. The method is evaluated using a dataset of 41 subjects. The results indicate the validity of the method in the diagnosis (accuracy 94%) and monitoring (accuracy 97% and 99%) of Alzheimer’s disease.
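Stage (f), turning the trees of a forest into readable rules, can be sketched for a single tree as below. This is a generic path-enumeration sketch, not the paper's conversion procedure. The nested-dict tree layout, the feature names, the thresholds, and the class labels are all hypothetical.

```python
from typing import List, Tuple, Union

# A leaf is a class label (str); an internal node is a dict with the keys
# "feature", "threshold", "left" (feature <= threshold) and "right".
Tree = Union[dict, str]


def tree_to_rules(node: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Return one IF ... THEN rule for every root-to-leaf path."""
    if not isinstance(node, dict):
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node}"]
    feature, threshold = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = tree_to_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


# Hypothetical tree over two made-up features.
example_tree = {
    "feature": "activation",
    "threshold": 0.5,
    "left": "control",
    "right": {
        "feature": "volume",
        "threshold": 2.0,
        "left": "patient",
        "right": "control",
    },
}

rules = tree_to_rules(example_tree)
for rule in rules:
    print(rule)
```

Each rule lists the threshold tests along one path, which is what lets a reader relate a prediction back to the measured features.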