Biomedical Data Classification
Selected Publications
-
Evanthia Tripoliti, Dimitrios Fotiadis, and George Manis,
“Automated diagnosis of diseases based on classification:
dynamic determination of the number of trees in random forests algorithm,” IEEE Transactions on Information Technology in Biomedicine,
vol. 16, no. 4, pp. 615–622, Jul. 2012 [link]
Abstract: The accurate
diagnosis of diseases with a high prevalence rate, such as Alzheimer's disease, Parkinson's disease,
diabetes, breast cancer, and heart disease, is one of the most important
biomedical problems, and one whose management is imperative. In this paper, we
present a new method for the automated diagnosis of diseases based on an
improvement of the random forests classification algorithm. More specifically, the
dynamic determination of the optimum number of base classifiers composing the
random forests is addressed. The proposed method is different from most of the
methods reported in the literature, which follow an overproduce-and-choose
strategy, where the members of the ensemble are selected from a pool of
classifiers, which is known a priori. In our case, the number of classifiers is
determined during the growing procedure of the forest. Additionally, the
proposed method produces an ensemble that is not only accurate but also diverse,
ensuring the two important properties that should characterize an ensemble classifier.
The method is based on an online fitting procedure and is evaluated using
eight biomedical datasets and five versions of the random forests algorithm (40
cases). The method correctly determined the number of trees in 90% of the test
cases.
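
As a rough illustration of deciding the forest size while the forest is being grown, the sketch below applies a simple plateau check to a curve of ensemble error versus number of trees. It is not the online fitting procedure of the paper, whose details are not given in the abstract; the function name `trees_needed`, the `window` and `tol` parameters, and the error values are all assumptions made for the example.

```python
def trees_needed(errors, window=5, tol=1e-3):
    """Return the forest size at which the error curve is judged to level off.

    errors[i] is the ensemble error measured after i + 1 trees have been added.
    Growth stops at the first size n where the mean error of the last `window`
    trees improves on the mean of the preceding `window` trees by less than
    `tol`. If no plateau is found, the full length of the curve is returned.
    """
    for n in range(2 * window, len(errors) + 1):
        recent = sum(errors[n - window:n]) / window
        previous = sum(errors[n - 2 * window:n - window]) / window
        if previous - recent < tol:
            return n
    return len(errors)


# Hypothetical error curve: fast improvement at first, then a plateau.
curve = [0.30, 0.25, 0.22, 0.20, 0.19, 0.19, 0.19, 0.19, 0.19, 0.19]
print(trees_needed(curve, window=2, tol=0.01))  # prints 7
```

In practice the error values would come from an estimate such as the out-of-bag error, recomputed each time a tree is added, so that growth can stop as soon as the check succeeds.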
-
Evanthia Tripoliti, Dimitrios Fotiadis, and George Manis,
“Modifications of the construction and voting
mechanisms of the random forests algorithm,” Data &
Knowledge Engineering, Elsevier, vol. 87, pp. 41–65, Sep. 2013 [link]
Abstract: The aim of this
work is to propose modifications of the Random Forests algorithm which improve
its prediction performance. The suggested modifications intend to increase the
strength and decrease the correlation of individual trees of the forest and to
improve the function which determines how the outputs of the base classifiers
are combined. This is achieved by modifying the node splitting and the voting
procedure. Different approaches to the number of predictors and to the
evaluation measure that determines the impurity of the node are examined.
Regarding the voting procedure, modifications based on feature selection,
clustering, nearest neighbors and optimization techniques are proposed. The
novel feature of the current work is that it not only proposes modifications
that improve the construction or the voting mechanism but also, for
the first time, examines the overall improvement of the Random Forests
algorithm (a combination of construction and voting). We evaluate the proposed
modifications using 24 datasets. The evaluation demonstrates that the proposed
modifications have a positive effect on the performance of the Random Forests
algorithm and provide results comparable to, and in most cases better than,
those of existing approaches.
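
To make the idea of changing how base classifier outputs are combined more concrete, here is a minimal sketch of weighted voting, in which each tree's vote counts in proportion to a per-tree weight instead of equally. This is a generic scheme chosen for illustration, not one of the specific modifications evaluated in the paper; the function `weighted_vote`, the class labels, and the weights are assumptions.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-tree class predictions using one weight per tree."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On a tie, max() returns the label that was inserted first.
    return max(totals, key=totals.get)


# Hypothetical outputs of three trees for a single sample.
votes = ["pos", "neg", "neg"]
print(weighted_vote(votes, [1.0, 1.0, 1.0]))  # plain majority: neg
print(weighted_vote(votes, [0.9, 0.3, 0.4]))  # weighted: pos
```

The weights could be derived from any per-tree quality estimate, for example each tree's accuracy on the samples left out of its bootstrap; with equal weights the function reduces to ordinary majority voting.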
-
Evanthia Tripoliti, Dimitrios Fotiadis, Maria Argyropoulou,
and George Manis, “A six stage approach for the
diagnosis of the Alzheimer’s disease based on fMRI
data,” Journal of Biomedical Informatics, Elsevier,
vol. 43, no. 2, pp. 307–320, Apr. 2010 [link]
Abstract: The aim of this
work is to present an automated method that assists in the diagnosis of
Alzheimer’s disease and also supports the monitoring of the progression of the
disease. The method is based on features extracted from the data acquired
during an fMRI experiment. It consists of six stages:
(a) preprocessing of fMRI data, (b) modeling of fMRI voxel time series using a
Generalized Linear Model, (c) feature extraction from the fMRI
data, (d) feature selection, (e) classification using classical and improved
variations of the Random Forests algorithm and Support Vector Machines, and (f)
conversion of the trees of the Random Forest into rules that have physical
meaning. The method is evaluated using a dataset of 41 subjects. The results
indicate the validity of the method in the diagnosis (accuracy 94%) and
monitoring (accuracy 97% and 99%) of Alzheimer’s disease.
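
Stage (f), turning trees into rules, can be sketched as a walk over every root-to-leaf path of a decision tree, collecting the split conditions along the way. The nested-dictionary tree format, the feature names, the thresholds, and the class labels below are hypothetical and are not taken from the paper.

```python
def tree_to_rules(node, conditions=()):
    """Yield one (conditions, label) pair per root-to-leaf path."""
    if "label" in node:  # leaf node: emit the accumulated conditions
        yield list(conditions), node["label"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from tree_to_rules(
        node["left"], conditions + (f"{feature} <= {threshold}",))
    yield from tree_to_rules(
        node["right"], conditions + (f"{feature} > {threshold}",))


# Hypothetical two-level tree over made-up activation features.
tree = {
    "feature": "activation_region_a", "threshold": 0.5,
    "left": {"label": "control"},
    "right": {
        "feature": "activation_region_b", "threshold": 1.2,
        "left": {"label": "control"},
        "right": {"label": "patient"},
    },
}

for conds, label in tree_to_rules(tree):
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

Applied to every tree of a forest, this yields one rule per leaf; a practical system would likely also merge or prune the overlapping rules that different trees produce.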