CSE012/CS059 – Data Mining
Fall 2015
Lecture Slides
This course uses slides and material from other courses and books. We thank Tan, Steinbach, and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the slide material used in this course.
Introduction: Logistics (in Greek) (pptx, pdf)
Lecture 1: Introduction to Data Mining (pptx, pdf)
Lecture 2: Probability Theory. What is data? (pptx, pdf)
Lecture 3: The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization (pptx, pdf)
Tutorial 1: Introduction to the Python programming language. (pptx, pdf, ipynb)
Lecture 4: The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization (pptx, pdf)
Tutorial 2: Introduction to Pandas library. (pptx, pdf, ipynb)
Lecture 5: Similarity and Distance. Recommendation Systems (pptx, pdf)
Lecture 6: Finding similar pairs. Min-hash signatures. Locality Sensitive Hashing (pptx, pdf)
Lecture 7: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf)
Lecture 8: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm (pptx, pdf)
Tutorial 3: Introduction to NumPy, SciPy, and SciKit for handling matrix operations. (ipynb)
Lecture 9: Mixture Models. The EM Algorithm. Evaluation (pptx, pdf)
Lecture 10: Classification. Decision Trees. Evaluation. (pptx, pdf)
Tutorial 4: Introduction to Clustering and Text Processing with SciKit. (ipynb)
Lecture 11: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. Supervised Learning. (pptx, pdf)
Tutorial 5: Introduction to Classification with SciKit. (ipynb)
Lecture 12: Link Analysis Ranking. Web Ranking. PageRank, Random Walks, HITS. (pptx, pdf)
Lecture 13: Absorbing Random Walks. Coverage Problems. (pptx, pdf)