CS059 – Data Mining

Fall 2013

Lecture Slides

The slides for this course draw on slides and material from other courses and books. We thank in advance Tan, Steinbach, and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the material from their slides that we have used in this course.

 

Lecture 1: Introduction to Data Mining (ppt, pdf)


Lecture 2: Data, pre-processing and post-processing (ppt, pdf)


Lecture 3: Frequent Itemsets, Association Rules, Apriori algorithm. (ppt, pdf)

 

Lecture 4: Association Rules, Evaluation of Rules. Alternative algorithms for frequent itemsets. (ppt, pdf)

 

Lecture 5: Similarity and Distance. Metrics. Recommender Systems. Document Shingling. (ppt, pdf)

  • Chapter 3 from the book Mining of Massive Datasets by Anand Rajaraman, Jeff Ullman, and Jure Leskovec.
  • Chapter 2 from the book Introduction to Data Mining by Tan, Steinbach, Kumar.
  • Chapter 9 from the book Mining of Massive Datasets by Anand Rajaraman, Jeff Ullman, and Jure Leskovec.

 

Lecture 6: Document Shingling. Min-hashing and Sketching. Locality Sensitive Hashing (LSH). (ppt, pdf)


Lecture 7: Clustering: k-means, hierarchical clustering, DBSCAN. (ppt, pdf)


Lecture 8: Clustering: EM Algorithm, Clustering Evaluation, Sequence Segmentation. (ppt, pdf)

  • Chapters 8 and 9 from the book Introduction to Data Mining by Tan, Steinbach, Kumar.
  • Course Notes on EM from Aris Anagnostopoulos, University of Rome La Sapienza.
  • Chapter 2, Evimaria Terzi, Problems and Algorithms for Sequence Segmentations, Ph.D. Thesis (PDF)


Lecture 9: Minimum Description Length (MDL). Introduction to Information Theory. Co-clustering using MDL. (ppt, pdf)

  • Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra Modha, Christos Faloutsos, Fully Automatic Cross-Associations, KDD 2004, Seattle, August 2004. [PDF]
  • Some details about MDL and Information Theory can be found in the book Introduction to Data Mining by Tan, Steinbach, Kumar (Chapters 2 and 4).

Lecture 10: Introduction to Classification. Decision Trees. Classification Evaluation. Nearest Neighbor Classifier. (ppt, pdf)


Lecture 11: Classification: Support Vector Machines, Logistic Regression, Naive Bayes Classifier. Supervised Learning. (ppt, pdf)


Lecture 12: Link Analysis Ranking: PageRank -- Random Walks. The HITS algorithm. (ppt, pdf)


Lecture 13: Absorbing Random Walks. Coverage. (ppt, pdf)