CSE012/CS059 – Data Mining

Fall 2020

Lecture Slides


This course uses slides and material from other courses and books. We thank Tan, Steinbach, and Kumar; Anand Rajaraman, Jeff Ullman, and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the slide material that we have used in this course.

Introduction: Logistics (in Greek) (pptx, pdf)

Lecture 1: Introduction to Data Mining (pptx, pdf)

Tutorial 1: Introduction to discrete probabilities. (pdf)

  • Thanks to Aris Anagnostopoulos for the slides.
  • Part I from the book All of Statistics by Larry A. Wasserman

Lectures 2-3: What is data? The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization. Data exploration and statistical analysis (pptx, pdf)

  • Chapters 2 and 3 from the book Introduction to Data Mining by Tan, Steinbach, and Kumar.
  • Chapter 1 from the book Mining Massive Datasets by Anand Rajaraman, Jeff Ullman, and Jure Leskovec.
  • Chapters 7-8 (confidence interval, standard error), 11 (hypothesis testing), and 16 (independence and correlation tests) from the book All of Statistics by Larry A. Wasserman. These chapter numbers refer to the PDF; in the printed book each chapter number is one lower.

Lecture 4: Similarity and Distance. Recommendation Systems (pptx, pdf)

Lecture 5: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf)

Tutorial 2: Introduction to notebooks and the Pandas library

Lecture 6: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm. Clustering Evaluation. (pptx, pdf)

Lecture 7: Mixture Models. The EM Algorithm. (pptx, pdf)

Tutorial 3: Introduction to the NumPy library (Notebook: ipynb, html, html slides, pdf).

Lecture 8: Introduction to Supervised Learning. Linear Regression. Classification. Decision Trees. Evaluation. (pptx, pdf)

Tutorial 4: Introduction to the SciKit-Learn library and its applications to clustering and data processing (Notebook: ipynb, html, html slides, pdf).

Lecture 9: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. The Supervised Learning pipeline. (pptx, pdf)

Tutorial 5: Introduction to the scikit-learn library and its applications to classification and data processing (Notebook: ipynb, html, html slides).

Lecture 10: Link Analysis Ranking. Web Ranking. PageRank, Random Walks, HITS. Absorbing Random Walks. (pptx, pdf)

Tutorial 6: Introduction to the NetworkX library (Notebook: ipynb, html, html slides, pdf).