Administrative
Class Hours: Tuesday 4:00-7:00 pm.
Instructor: Panayiotis Tsaparas (tsap _at_ cs.uoi.gr), Office Β.3
Past Courses: Spring 2012, Fall 2012, Fall 2013, Fall 2013, Fall 2014, Fall 2015, Spring 2017
Grades: The grade for the course will be determined by the assignments and
the project. There will be no final exam in either the January or the
September exam period.
Logistics: The slides with the logistics for the class (pdf)
· Thursday 1/2: Question 4 of Assignment 3: In this question you are asked to
compare a new recommendation algorithm with the algorithms you implemented in
Assignment 2, on the data you created in Assignment 2. For this question you
can improve upon your solution to Assignment 2 and correct some of the
mistakes you made in the data generation. Here are some common errors in this
question:
o Iterative pruning: The question asked for iterative pruning of the users and
businesses until, in the data you created, all users had rated at least 10
businesses, and all businesses were rated by at least 10 users. Each time you
prune a user or a business, the number of ratings received by the businesses
or given by the users changes, so a user or business that met the threshold
before may no longer meet it. You need to do the pruning iteratively, until
no further pruning is possible.
o Data Structures: Some students complained that their program was too slow or
consumed too much memory. For the latter, you should load into memory only
the data from Toronto. The filtering step should happen while you read the
data. You should never load all the lines of the file into memory. For the
former, you should use the appropriate data structures. Using lists is very
slow if you want to check whether a user or a business is in the data. The
most reasonable data structure is a dictionary whose values are themselves
dictionaries. You may need more than one such dictionary, one for users and
one for businesses.
o Sampling: You should sample ratings, not users or businesses.
o Similarity: For the computation of similarity, you should subtract the mean
of the nonzero entries (of a row or a column) from the nonzero entries, and
then take the cosine similarity. Don't take the mean of the full row or
column, as this contains a lot of zeros.
o Neighbors: When you take the K nearest neighbors of a user, you should take
the K nearest neighbors that have also rated the business in question (or,
for a business, the K nearest businesses that have been rated by the user in
question). Otherwise you may include a lot of zero values.
o SVD: Experiment with many values of K for SVD, and also consider larger
values (e.g., K = 100).
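
The pruning, data-structure, similarity, and neighbor points above can be sketched together as follows. This is only an illustration, not the reference solution: it assumes the ratings have already been filtered and are available as (user, business, stars) triples, and the function names, the toy data, and the threshold of 2 used in the usage note are all made up for the example.

```python
from collections import defaultdict
from math import sqrt


def build_indexes(ratings):
    """Index (user, business, stars) triples as two dictionaries of dictionaries."""
    by_user = defaultdict(dict)      # user -> {business: stars}
    by_business = defaultdict(dict)  # business -> {user: stars}
    for user, business, stars in ratings:
        by_user[user][business] = stars
        by_business[business][user] = stars
    return by_user, by_business


def iterative_prune(by_user, by_business, min_ratings=10):
    """Prune in place until every user and business has at least min_ratings ratings."""
    changed = True
    while changed:  # repeat, because each removal changes the other side's counts
        changed = False
        for user in [u for u, r in by_user.items() if len(r) < min_ratings]:
            for business in by_user.pop(user):
                del by_business[business][user]
            changed = True
        for business in [b for b, r in by_business.items() if len(r) < min_ratings]:
            for user in by_business.pop(business):
                del by_user[user][business]
            changed = True
    return by_user, by_business


def centered_cosine(ratings_a, ratings_b):
    """Cosine similarity after subtracting each vector's mean over its rated entries only."""
    if not ratings_a or not ratings_b:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    # Unrated entries stay at zero, so only co-rated keys contribute to the dot product.
    dot = sum((ratings_a[k] - mean_a) * (ratings_b[k] - mean_b)
              for k in ratings_a.keys() & ratings_b.keys())
    norm_a = sqrt(sum((v - mean_a) ** 2 for v in ratings_a.values()))
    norm_b = sqrt(sum((v - mean_b) ** 2 for v in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_raters(user, business, by_user, by_business, k):
    """The k users most similar to `user` among those who also rated `business`."""
    candidates = [u for u in by_business.get(business, {}) if u != user]
    candidates.sort(key=lambda u: centered_cosine(by_user[user], by_user[u]),
                    reverse=True)
    return candidates[:k]
```

With a toy threshold of 2, a user who rated two businesses can still be removed in a later pass once one of those businesses is pruned, which is why a single pass is not enough. Membership tests and deletions on the dictionaries take constant time on average, unlike scanning a list.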
· Sunday 24/12. Third Assignment: The third assignment is available on the
Assignments page of the course.
· Sunday 26/11. Second Assignment: The second assignment is available on the
Assignments page of the course.
· Friday 15/11. First Assignment: The first assignment is available on the
Assignments page of the course.
· Tuesday 26/9. Welcome to Data Mining 2017!
