CSE012/CS059 – Data Mining
Fall 2021
Lecture Slides
The slides for this course draw on slides and material from other courses and books. We thank in advance Tan, Steinbach and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the material from their slides that we have used in this course.
Introduction: Logistics (in Greek) (pptx, pdf)
Lecture 1: Introduction to Data Mining (pptx, pdf)
Lecture 2: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization. (pptx, pdf)
Tutorial 1: Introduction to discrete probabilities. (pptx, pdf)
Lecture 3: Data exploration and statistical analysis (pptx, pdf)
Tutorial 2: Introduction to notebooks. Python reminders.
Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)
Tutorial 3: Introduction to the Pandas library
Tutorial 4: Libraries for statistical analysis and plotting
Lecture 5: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). Model-based collaborative filtering (pptx, pdf)
Tutorial 5: Introduction to the NumPy and SciPy libraries for matrix manipulation (ipynb, html, html slides, pdf)
Lecture 6: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. Clustering Evaluation. (pptx, pdf)
Tutorial 6: Libraries for data preprocessing (ipynb, html, html slides, pdf)
Lecture 7: Mixture Models. The EM Algorithm. (pptx, pdf)
Tutorial 7: Introduction to the scikit-learn (sklearn) library for clustering (ipynb, html, html slides, pdf)
Lecture 8: Introduction to Supervised Learning. Linear Regression. Classification. Decision Trees - Expressiveness. Nearest Neighbor Classification, Support Vector Machines, Logistic Regression, (Naive Bayes Classification) (pptx, pdf)
Lecture 9: Neural Networks. Word Embeddings. Evaluation. The Supervised Learning pipeline. (pptx, pdf)
Tutorial 8: Introduction to the scikit-learn library and applications to classification. The gensim library and word embeddings. (Notebook: ipynb, html, html slides)
Lecture 10: Link Analysis Ranking. Web Ranking. PageRank, Random Walks, HITS. Absorbing Random Walks. (pptx, pdf)
Tutorial 9: Introduction to the NetworkX library (Notebook: ipynb, html, html slides, pdf)