ΠΛΕ059 – Data Mining

Spring Semester, 2012

Material

Books and Slides

Mining Massive Datasets, by Anand Rajaraman and Jeff Ullman. Available free online. Slides from the course.

Material from the book "Data Mining: Concepts and Techniques", by Jiawei Han and Micheline Kamber.

Material from the book "Introduction to Data Mining", by Tan, Steinbach, Kumar.

Software

Datasets

Lectures

  • Lecture 1: Introduction to Data Mining (ppt, pdf). Material:
    • Chapter 1, Introduction to Data Mining, by Tan, Steinbach, Kumar.
    • Chapter 1, Mining Massive Datasets, by Anand Rajaraman and Jeff Ullman.
  • Lecture 2: Frequent Itemsets and Association Rules (ppt, pdf). Material:
    • Chapter 6, Introduction to Data Mining, by Tan, Steinbach, Kumar.
  • Lecture 3: Frequent Itemsets and Association Rules II (ppt, pdf). The FP-Growth algorithm, in Greek, from the lecture notes of Ms. Pitoura (ppt, pdf). Material:
    • Chapter 6, Introduction to Data Mining, by Tan, Steinbach, Kumar.
    • Chapter 6, Mining Massive Datasets, by Anand Rajaraman and Jeff Ullman.
  • Lecture 4: Similarity and Distance. Sketching, Min-Hashing, Locality Sensitive Hashing (ppt, pdf). Material:
    • Chapter 2, Introduction to Data Mining, by Tan, Steinbach, Kumar (Similarity and Distance).
    • Chapter 3, Mining Massive Datasets, by Anand Rajaraman and Jeff Ullman (Min-Hashing, LSH).
  • Lecture 5: Sketching, Min-Hashing, Locality Sensitive Hashing, Clustering (k-means, hierarchical clustering) (ppt, pdf). Material:
    • Chapter 3, Mining Massive Datasets, by Anand Rajaraman and Jeff Ullman (Min-Hashing, LSH).
    • Chapter 8, Introduction to Data Mining, by Tan, Steinbach, Kumar (Clustering).
  • Lecture 6: Mixture Models and the EM Algorithm, the DBSCAN Algorithm, Clustering Validation (ppt, pdf). Material:
    • Chapter 9, Introduction to Data Mining, by Tan, Steinbach, Kumar (EM Algorithm).
    • Chapter 8, Introduction to Data Mining, by Tan, Steinbach, Kumar (DBSCAN, Clustering Validation).
  • Lecture 7: Minimum Description Length, Introduction to Information Theory, Co-Clustering (ppt, pdf). Material:
    • Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra Modha, Christos Faloutsos, Fully Automatic Cross-Associations, KDD 2004, Seattle, August 2004 (PDF).
  • Lecture 8: Sequence Segmentation and Dynamic Programming, Dimensionality Reduction, Singular Value Decomposition (SVD), Principal Component Analysis (PCA) (ppt, pdf). Material:
    • Chapter 2, Evimaria Terzi, Problems and Algorithms for Sequence Segmentations, Ph.D. Thesis (PDF) (Sequence Segmentation).
    • Appendix B, Introduction to Data Mining, by Tan, Steinbach, Kumar (Dimensionality Reduction).
  • Lecture 9a: Classification: Decision Trees, Evaluation (ppt, pdf). Material:
    • Chapters 4 and 5, Introduction to Data Mining, by Tan, Steinbach, Kumar.
  • Lecture 9b: Classification: Decision Trees, Evaluation (ppt, pdf). Material:
    • Chapters 4 and 5, Introduction to Data Mining, by Tan, Steinbach, Kumar.
  • Lecture 10: Classification: Nearest Neighbor Classifier, SVM, Logistic Regression, Naive Bayes (ppt, pdf). Material:
    • Chapter 5, Introduction to Data Mining, by Tan, Steinbach, Kumar.
  • Lecture 11: Nearest Neighbor Classification, Supervised Learning, Introduction to Graphs and PageRank (ppt, pdf). Material:
    • Chapter 5, Introduction to Data Mining, by Tan, Steinbach, Kumar (Naive Bayes).
  • Lecture 12: Link Analysis, PageRank, HITS, Random Walks, Absorbing Random Walks (ppt, pdf).
  • Lecture 13: PageRank, Random Walks with Absorbing Nodes, Coverage (Set Cover, Maximum Coverage) (ppt, pdf).