| CSE012/CS059 – Data Mining Fall 2020 |  | 
| Lecture Slides
 For this course we will use slides and material from other courses and books. We thank in advance Tan, Steinbach, and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the material from their slides that we have used in this course.
 Introduction: Logistics (in Greek) (pptx, pdf)
 Lecture 1: Introduction to Data Mining (pptx, pdf)
 Tutorial 1: Introduction to discrete probabilities. (pdf)
 Lectures 2-3: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization. Data exploration and statistical analysis (pptx, pdf)
 Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)
 Lecture 5: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf) 
 Tutorial 2: Introduction to notebooks and the Pandas library 
 Lecture 6: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. Clustering Evaluation. (pptx, pdf)
 Lecture 7: Mixture Models. The EM Algorithm. (pptx, pdf)
 Tutorial 3: Introduction to the NumPy library (Notebook: ipynb, html, html slides, pdf).
 Lecture 8: Introduction to Supervised Learning. Linear Regression. Classification. Decision Trees. Evaluation. (pptx, pdf)
 Tutorial 4: Introduction to the scikit-learn library and its applications to clustering and data processing (Notebook: ipynb, html, html slides, pdf).
 Lecture 9: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. The Supervised Learning pipeline. (pptx, pdf)
 Tutorial 5: Introduction to the scikit-learn library and its applications to classification and data processing (Notebook: ipynb, html, html slides).
 Tutorial 6: Introduction to the NetworkX library (Notebook: ipynb, html, html slides, pdf).
 
 |