CSE012/CS059 – Data Mining
Fall 2020
Lecture Slides
For the slides of this course we use slides and material from other courses and books. We thank Tan, Steinbach and Kumar; Anand Rajaraman, Jeff Ullman and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the slide material we have used in this course.
Introduction: Logistics (in Greek) (pptx, pdf)
Lecture 1: Introduction to Data Mining (pptx, pdf)
Tutorial 1: Introduction to discrete probabilities (pdf)
Lectures 2-3: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization. Data exploration and statistical analysis (pptx, pdf)
Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)
Lecture 5: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf)
Tutorial 2: Introduction to notebooks and the Pandas library
Lecture 6: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. Clustering Evaluation. (pptx, pdf)
Lecture 7: Mixture Models. The EM Algorithm. (pptx, pdf)
Tutorial 3: Introduction to the NumPy library (Notebook: ipynb, html, html slides, pdf)
Lecture 8: Introduction to Supervised Learning. Linear Regression. Classification. Decision Trees. Evaluation. (pptx, pdf)
Tutorial 4: Introduction to the scikit-learn library and its applications to clustering and data processing (Notebook: ipynb, html, html slides, pdf)
Lecture 9: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. The Supervised Learning pipeline. (pptx, pdf)
Tutorial 5: Introduction to the scikit-learn library and its applications to classification and data processing (Notebook: ipynb, html, html slides)
Tutorial 6: Introduction to the NetworkX library (Notebook: ipynb, html, html slides, pdf)