CS059 – Data Mining

Fall 2012

Lecture Slides

The slides for this course draw on slides and material from other courses and books. We thank Tan, Steinbach and Kumar; Anand Rajaraman and Jeff Ullman; and Evimaria Terzi for the material from their slides that we have used in this course.


Lecture 1: Introduction to Data Mining (ppt, pdf)


Lecture 2:
Data, pre-processing and post-processing (ppt, pdf)


Lecture 3:
Frequent Itemsets, Association Rules, Apriori algorithm. (ppt, pdf)



Lecture 4: Frequent Itemsets, Association Rules. Evaluation. Beyond Apriori (ppt, pdf)


Lecture 5:
Similarity and Distance. Metrics. Min-wise independent hashing. (ppt,pdf)


Lecture 6:
Min-wise independent hashing. Locality Sensitive Hashing. Clustering, K-means algorithm (ppt,pdf)


Lecture 7:
Hierarchical clustering, DBSCAN, Mixture models and the EM algorithm  (ppt,pdf)


Lecture 8a:
Clustering Validity, Minimum Description Length (MDL), Introduction to Information Theory, Co-clustering using MDL. (ppt,pdf)

  • Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra Modha, Christos Faloutsos, Fully Automatic Cross-Associations, KDD 2004, Seattle, August 2004. [PDF]
  • Some details about MDL and Information Theory can be found in the book Introduction to Data Mining by Tan, Steinbach, Kumar (chapters 2 and 4).


Lecture 8b:
Clustering Validity, Minimum Description Length (MDL), Introduction to Information Theory, Co-clustering using MDL. (ppt,pdf)

  • Chapter 2, Evimaria Terzi, Problems and Algorithms for Sequence Segmentations, Ph.D. Thesis (PDF)


Lecture 9:
Dimensionality Reduction, Singular Value Decomposition (SVD), Principal Component Analysis (PCA). (ppt,pdf)


Lecture 10a:
Classification. Decision Trees. Evaluation. (ppt,pdf)


Lecture 10b: Classification. k-Nearest Neighbor classifier, Logistic Regression, Support Vector Machines (SVM), Naive Bayes (ppt,pdf)


Lecture 11: Naive Bayes classifier. Supervised Learning. Web Search and PageRank (ppt,pdf)


Lecture 12: Link Analysis Ranking: PageRank, HITS, Random Walks  (ppt,pdf)


Lecture 13: Absorbing Random Walks. Coverage Problems (Set Cover, Maximum Coverage)  (ppt,pdf)