CSE012/CS059 – Data Mining
Fall 2019
Lecture Slides
The slides for this course use material from other courses and books. We thank Tan, Steinbach and Kumar; Anand Rajaraman, Jeff Ullman and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the slide material we have used in this course.
Introduction: Logistics (in Greek) (pptx, pdf)
Lecture 1: Introduction to Data Mining (pptx, pdf)
Tutorial 1: Introduction to discrete probabilities (pdf)
Lectures 2-3: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization (pptx, pdf)
Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)
Tutorial 2: Introduction to notebooks and the Pandas library (Slides: pptx, pdf), (Notebook: ipynb, html, html slides, pdf)
Lecture 5: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf)
Lecture 6: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. Clustering Evaluation. (pptx, pdf)
Lecture 7: Mixture Models. The EM Algorithm. (pptx, pdf)
Tutorial 3: Introduction to the NumPy library (Notebook: ipynb, html, html slides, pdf). Introduction to the scikit-learn library and its applications to clustering and data processing (Notebook: ipynb, html, html slides, pdf).
Lecture 8: Introduction to Supervised Learning. Linear Regression. Classification. Decision Trees. Evaluation. (pptx, pdf)
Lecture 9: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. The Supervised Learning pipeline. (pptx, pdf)
Tutorial 4: Introduction to the scikit-learn library and applications for classification and data processing (Notebook: ipynb, html, html slides, pdf).
Tutorial 5: Introduction to the NetworkX library (Notebook: ipynb, html, html slides, pdf).