CSE012/CS059 – Data Mining

Spring 2017

Lecture Slides


The slides for this course draw on slides and material from other courses and books. We thank Tan, Steinbach, and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the material from their slides that we have used in this course.

Introduction: Logistics (in Greek) (pptx, pdf)

Lecture 1: Introduction to Data Mining (pptx, pdf)

Lecture 2: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization (pptx, pdf)

Tutorial 1: Introduction to discrete probabilities. (pdf)

  • Thanks to Aris Anagnostopoulos for the slides.

Lecture 3: Frequent Itemsets and Association Rules (pptx, pdf)

Tutorial 2: Introduction to Python (pptx, pdf), (ipynb, html) and Pandas (pptx, pdf), (ipynb, html)

Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)

Lecture 5: Finding similar pairs. Min-hash signatures. Locality Sensitive Hashing (pptx, pdf)

Lecture 6: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf)

Lecture 7: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. (pptx, pdf)

Lecture 8: Clustering Evaluation. Mixture Models. The EM Algorithm. (pptx, pdf)

Tutorial 3: Introduction to NumPy, SciPy, and SciKit for handling matrices (ipynb, html) and for clustering and feature extraction (ipynb, html)

  • Notes from Mark Crovella on dimensionality reduction

Lecture 9: Classification. Decision Trees. Evaluation. (pptx, pdf)

Lecture 10: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. Supervised Learning. (pptx, pdf)

Lecture 11: Link Analysis Ranking. Web Ranking. PageRank, Random Walks, HITS. Absorbing Random Walks. (pptx, pdf)

Tutorial 4: Introduction to classification with SciKit (ipynb, html). Introduction to network analysis with NetworkX (ipynb, html)

Lecture 12: Absorbing Random Walks. Coverage Problems. (pptx, pdf)