CSE012/CS059 – Data Mining

Spring 2017






Lecture Slides

The slides for this course draw on slides and material from other courses and books. We thank Tan, Steinbach, and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the material from their slides that we have used in this course.

Introduction: Logistics (in Greek) (pptx, pdf)

Lecture 1: Introduction to Data Mining (pptx, pdf)

Tutorial 1: Introduction to discrete probabilities. (pdf)

  • Thanks to Aris Anagnostopoulos for the slides.

Lecture 2: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization (pptx, pdf)

Lecture 3: Frequent Itemsets and Association Rules (pptx, pdf)

Tutorial 2: Introduction to Python. (pptx, pdf, ipynb, html)

  • The file with the image here

Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)

Tutorial 3: Introduction to Pandas. (pptx, pdf, ipynb, html)

Lecture 5: Finding similar pairs. Min-hash signatures. Locality Sensitive Hashing (pptx, pdf)

Lecture 6: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf)

Tutorial 4: Introduction to Numpy, Scipy, SciKit for handling matrices. (ipynb, html)

Lecture 7: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. Clustering Evaluation (pptx, pdf)

Lecture 8: Mixture Models. The EM Algorithm. Sequence Segmentation (pptx, pdf)

Tutorial 5: Introduction to Clustering and Feature Extraction with SciKit-Learn. (ipynb, html)

Lecture 9: Classification. Decision Trees. Evaluation. (pptx, pdf)

Lecture 10: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. Supervised Learning. (pptx, pdf)

Tutorial 6: Introduction to Classification with SciKit. (ipynb, html)

Lecture 11: Link Analysis Ranking. Web Ranking. PageRank, Random Walks, HITS. Absorbing Random Walks. (pptx, pdf)

Lecture 12: Community discovery in graphs. Edge Betweenness Centrality. (pptx, pdf)

Tutorial 7: Introduction to Network analysis with NetworkX. (ipynb, html)

Lecture 13: Absorbing Random Walks. Coverage Problems. (pptx, pdf)