Greedy mixture learning for multiple motif discovery in biological sequences
K. Blekas, D. Fotiadis and A. Likas

Bioinformatics, 19: 607-617, 2003 

(Abstract), (Full text)

 

Greedy EM algorithm:

Greedy EM discovers probabilistic motifs in a set of biological sequences by learning a mixture-of-motifs model through likelihood maximization. The algorithm adds motifs to the mixture model one at a time, using a combined global and local search scheme to initialize the parameters of each new component. A kd-tree partitioning scheme is also used to speed up the global search procedure.
Extended experiments with artificial datasets: (PDF), (PS)
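
The greedy scheme described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation, and several details are assumptions made for illustration: a DNA alphabet, a fixed uniform background component, a new component seeded from a single substring with a hypothetical `hit` probability of 0.7, a fixed initial mixing weight `a` for the new component, and an exhaustive scan over candidate seeds in place of the kd-tree speed-up. All function names are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA; protein data would use the 20 amino acids

def substrings(seqs, w):
    """All overlapping windows of width w from every sequence."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]

def pwm_from_seed(seed, hit=0.7):
    """Hypothetical initialization: a position weight matrix peaked on the seed."""
    miss = (1.0 - hit) / (len(ALPHABET) - 1)
    return [{a: (hit if a == c else miss) for a in ALPHABET} for c in seed]

def prob(pwm, x):
    """Probability of window x under a position weight matrix."""
    p = 1.0
    for col, c in zip(pwm, x):
        p *= col[c]
    return p

def log_likelihood(weights, pwms, windows):
    """Log-likelihood of the windows under the mixture of PWMs."""
    return sum(
        math.log(sum(pi * prob(m, x) for pi, m in zip(weights, pwms)))
        for x in windows
    )

def em(weights, pwms, windows, iters=20, pseudo=0.01):
    """Local search: EM updates with pseudocounts; component 0 (background) stays fixed."""
    for _ in range(iters):
        # E-step: responsibility of each component for each window
        resp = []
        for x in windows:
            joint = [pi * prob(m, x) for pi, m in zip(weights, pwms)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate mixing weights and motif columns
        n = len(windows)
        weights = [sum(r[j] for r in resp) / n for j in range(len(pwms))]
        new_pwms = [pwms[0]]
        for j in range(1, len(pwms)):
            cols = []
            for pos in range(len(pwms[j])):
                counts = {a: pseudo for a in ALPHABET}
                for r, x in zip(resp, windows):
                    counts[x[pos]] += r[j]
                z = sum(counts.values())
                cols.append({a: counts[a] / z for a in ALPHABET})
            new_pwms.append(cols)
        pwms = new_pwms
    return weights, pwms

def add_motif(weights, pwms, windows, candidates, a=0.5):
    """Greedy step: global search over candidate seeds, then EM refinement."""
    best = None
    for seed in candidates:
        trial_w = [(1 - a) * pi for pi in weights] + [a]
        trial_m = pwms + [pwm_from_seed(seed)]
        ll = log_likelihood(trial_w, trial_m, windows)
        if best is None or ll > best[0]:
            best = (ll, trial_w, trial_m)
    return em(best[1], best[2], windows)

# Toy usage: three short sequences, motif width 5
seqs = ["ACGTTGACAGGT", "TTTGACATCCGA", "CGTGACAAACTG"]
w = 5
windows = substrings(seqs, w)
background = [{a: 0.25 for a in ALPHABET} for _ in range(w)]
weights, pwms = add_motif([1.0], [background], windows, sorted(set(windows)))
consensus = "".join(max(col, key=col.get) for col in pwms[1])
print(weights, consensus)
```

Calling `add_motif` again on the returned `weights` and `pwms` would add a second motif component in the same way. The exhaustive candidate loop is the part that a partitioning scheme such as the kd-tree mentioned above would shorten, by reducing the set of seeds that need to be scored.
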

Download:

- Matlab source code: GreedyEM.zip

- Fingerprints (groups of motifs) discovered by GreedyEM in 6 PRINTS families:

Family Name    Download file describing motifs of different lengths
PR00058        PR00058.zip    Length = [15,18,20,22,25]
PR00061        PR00061.zip    Length = [15,20,24,28,30]
PR00810        PR00810.zip    Length = [8,10,11,12,15]
PR01266        PR01266.zip    Length = [10,12,15,17,20]
PR01267        PR01267.zip    Length = [10,12,14,18,20]
PR01268        PR01268.zip    Length = [10,12,17,20,25]

 

 

Konstantinos Blekas, Ph.D.

E-mail: kblekas@cs.uoi.gr