CSE012/CS059 – Data Mining

Fall 2014

Lecture Slides

The slides for this course use slides and material from other courses and books. We thank Tan, Steinbach and Kumar; Anand Rajaraman, Jeff Ullman and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the slide material we have used in this course.



Lecture 1: Introduction to Data Mining (pptx, pdf)


Lecture 2: Probability Theory. Data, pre-processing and post-processing (ppt, pdf)


Lecture 3: Finding Frequent Itemsets. The A-priori algorithm. Finding Association Rules (ppt, pdf)


Lecture 4: Association Rules, Evaluation of Rules. Alternative algorithms for frequent itemsets. (ppt, pdf)


Lecture 5: Similarity and Distance. Metrics. Recommender Systems. (ppt, pdf)


Lecture 6: Finding Similar Pairs. Min-Hashing. Locality Sensitive Hashing. (ppt, pdf)


Lecture 7: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (ppt, pdf)


Lecture 8: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. (ppt, pdf)


Lecture 9: Mixture models and the EM algorithm. Clustering Evaluation. Sequence Segmentation. (ppt, pdf)

Lecture 10: Minimum Description Length (MDL). Introduction to Information Theory. Co-Clustering. (ppt, pdf)
  • Some information about MDL and Information Theory appears in Chapters 2 and 4 of the book “Introduction to Data Mining” by Tan, Steinbach and Kumar.
  • Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra Modha and Christos Faloutsos. “Fully Automatic Cross-Associations.” KDD 2004, Seattle, August 2004. [PDF]


Lecture 11: Classification. Decision Trees. Evaluation. (ppt, pdf)


Lecture 12: Classification. Nearest Neighbor Classification. Support Vector Machines. Logistic Regression Classification. Naive Bayes Learning. Supervised Learning. (ppt, pdf)


Lecture 13: Link Analysis for Web Ranking. PageRank - Random Walks. The HITS algorithm. (ppt, pdf)


Lecture 14: Absorbing Random Walks. Coverage Problems. (ppt, pdf)


Lecture 15: The Map-Reduce computational paradigm (ppt, pdf)