CSE012/CS059 – Data Mining

Fall 2015


Lecture Slides


The slides for this course use slides and material from other courses and books. We thank Tan, Steinbach and Kumar; Anand Rajaraman, Jeff Ullman and Jure Leskovec; Evimaria Terzi; and Aris Anagnostopoulos for the slide material we have used in this course.

Introduction: Logistics (in Greek) (pptx, pdf)

Lecture 1: Introduction to Data Mining (pptx, pdf)


Lecture 2: Probability Theory. What is data? (pptx, pdf)


Lecture 3: The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization (pptx, pdf)


Tutorial 1: Introduction to the Python programming language. (pptx, pdf, ipynb)


Lecture 4: The data mining pipeline. Preprocessing and postprocessing. Sampling and normalization (pptx, pdf)


Tutorial 2: Introduction to the Pandas library. (pptx, pdf, ipynb)


Lecture 5: Similarity and Distance. Recommendation Systems (pptx, pdf)


Lecture 6: Finding similar pairs. Min-hash signatures. Locality Sensitive Hashing (pptx, pdf)


Lecture 7: Dimensionality Reduction. Singular Value Decomposition (SVD). Principal Component Analysis (PCA). (pptx, pdf)


Lecture 8: Clustering. The k-means algorithm. Hierarchical Clustering. The DBSCAN algorithm (pptx, pdf)


Tutorial 3: Introduction to NumPy, SciPy and SciKit for handling matrix operations. (ipynb)


Lecture 9: Mixture Models. The EM Algorithm. Evaluation (pptx, pdf)


Lecture 10: Classification. Decision Trees. Evaluation. (pptx, pdf)

Tutorial 4: Introduction to Clustering and Text Processing with SciKit. (ipynb)


Lecture 11: Other classification techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. Supervised Learning. (pptx, pdf)

Tutorial 5: Introduction to Classification with SciKit. (ipynb)

Lecture 12: Link Analysis Ranking. Web Ranking. PageRank, Random Walks, HITS. (pptx, pdf)


Tutorial 6: Introduction to Network Analysis with NetworkX. (ipynb)

Lecture 13: Absorbing Random Walks. Coverage Problems. (pptx, pdf)