CSE012/CS059 – Data Mining
Fall 2022
Lecture Slides
For the slides of this course we use slides and material from other courses and books. We thank Tan, Steinbach and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the material from their slides that we have used in this course.
Introduction: Logistics (in Greek) (pptx, pdf)
Lecture 1: Introduction to Data Mining (pptx, pdf)
Tutorial 1: Introduction to discrete probabilities. (pptx, pdf)
Lecture 2: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization. (pptx, pdf)
Lecture 3: Data exploration and statistical analysis (pptx, pdf)
Tutorial 2: Introduction to notebooks. Python reminders.
Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)
Tutorial 3: Introduction to the Pandas library (ipynb, html)
Tutorial 4: Libraries for statistical analysis and plotting
Lecture 5: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). Model-based collaborative filtering (pptx, pdf)
Tutorial 5: Introduction to the Numpy and SciPy libraries for matrix manipulation (ipynb, html).
Lecture 6: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. Clustering Evaluation. (pptx, pdf)
Tutorial 6: Libraries for data preprocessing (ipynb, html)
Lecture 7: Mixture Models. The EM Algorithm. (pptx, pdf)
Tutorial 7: Introduction to the SciKit-Learn (sklearn) library for clustering (ipynb, html)
Lecture 8: Introduction to Supervised Learning. Linear Regression. Classification. Decision Trees - Expressiveness. Evaluation. (pptx, pdf)
Lecture 9: Nearest Neighbor Classification, Support Vector Machines, Logistic Regression, (Naive Bayes Classification). Neural Networks. Word Embeddings. The Supervised Learning pipeline. (pptx, pdf)
Tutorial 8: Introduction to the scikit-learn library and applications to classification. The gensim library and word embeddings. (Notebook: ipynb, html).
Lecture 10: Link Analysis Ranking. Web Ranking. PageRank, Random Walks, HITS. Absorbing Random Walks. (pptx, pdf)
Tutorial 9: Introduction to the NetworkX library (Notebook: ipynb, html).