CS059 – Data Mining
Fall 2012
Material
Books and Slides
· Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman. Free online book. Slides from the course.
· Material from the book “Data Mining: Concepts and Techniques” by Jiawei Han and Micheline Kamber.
· Material from the book “Introduction to Data Mining” by Tan, Steinbach, and Kumar.
· Material from the book “Introduction to Information Retrieval” by C. Manning, P. Raghavan, and H. Schütze.
· Material from the book “Networks, Crowds, and Markets” by D. Easley and J. Kleinberg.
Software
· WEKA Data Mining Software: a software package that implements a collection of data mining and machine learning algorithms.
· FIMI: Frequent Itemset Mining Implementations: a repository of implementations for frequent itemset mining. All implementations assume the input format of the example datasets: a text file in which each row is a basket of space-separated integers representing the items.
Datasets
· UCI Machine Learning Repository
  o Data for Assignment 4:
    § The Iris dataset (ARFF file). Link to the UCI repository.
    § The Mushroom dataset (ARFF file). Link to the UCI repository.
    § The SpamBase dataset (ARFF file). Link to the UCI repository.
· MovieLens datasets by GroupLens Research.
· Twitter data from the paper “What is Twitter, a Social Network, or a News Media?” by Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. For the first assignment, you need the Restricted User Profiles data file. The fields in the file are explained on the page; you are interested in the eleventh field, which is the profile description.
· English stopwords: a .txt file with a list of English stopwords.
· SpamAssassin.
· Stanford Network Analysis Project (SNAP) datasets.
· Movie-Actor graph. Each line in the file is a tab-separated movie-actor pair, i.e., one edge in the graph.
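A minimal sketch of loading the Movie-Actor graph into memory, assuming each line has the form movie<TAB>actor (movie first, as the description suggests), with no header row. The function names `load_movie_actor_graph` and `co_actors` are hypothetical helpers for illustration, not part of any course-provided code. The graph is stored as two adjacency maps of sets, one per side of the bipartite graph, and `co_actors` shows the one-mode projection (actors linked through a shared movie).

```python
from collections import defaultdict


def load_movie_actor_graph(lines):
    """Build a bipartite graph from tab-separated 'movie<TAB>actor' lines.

    Assumption: one edge per line, movie first and actor second.
    Blank lines are skipped; duplicate pairs are stored only once (sets).
    """
    movie_to_actors = defaultdict(set)
    actor_to_movies = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so the actor field is kept whole.
        movie, actor = line.split("\t", 1)
        movie_to_actors[movie].add(actor)
        actor_to_movies[actor].add(movie)
    return movie_to_actors, actor_to_movies


def co_actors(actor, movie_to_actors, actor_to_movies):
    """Return the actors who share at least one movie with `actor`."""
    neighbours = set()
    # .get avoids inserting unknown actors into the defaultdict.
    for movie in actor_to_movies.get(actor, ()):
        neighbours |= movie_to_actors[movie]
    neighbours.discard(actor)  # an actor is not their own co-actor
    return neighbours


if __name__ == "__main__":
    sample = [
        "Movie A\tAlice\n",
        "Movie A\tBob\n",
        "Movie B\tBob\n",
        "Movie B\tCarol\n",
    ]
    movies, actors = load_movie_actor_graph(sample)
    print(len(movies), len(actors))                  # 2 3
    print(sorted(co_actors("Bob", movies, actors)))  # ['Alice', 'Carol']
```

To read the actual file, pass an open file object, e.g. `with open("movie_actor.tsv", encoding="utf-8") as f: movies, actors = load_movie_actor_graph(f)`; the filename and encoding here are assumptions. Keeping both directions of the adjacency costs twice the memory but makes degree queries on either side (movies per actor, cast size per movie) a single `len` call.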