The first assignment is a programming assignment for
performing measurements on networks. You can download the assignment
sheet from here. You can download
sample matlab file here.
The second assigment is a reaction paper. You can download the
assignment sheet here.
third assigment is a presentation. You will have to
select a paper and present it in 20
minutes. You should submit a copy of your presentation in advance, so
that we can print a copy of it. For the presentation, you can
select any of the papers below, or propose a paper yourself (you should
send me an email to discuss about it). Papers will be handed out in
first come first serve fashion.
- Deepayan Chakrabarti, Yiping Zhan and Christos Faloutsos
R-MAT: A Recursive Model for Graph Mining
SIAM Int. Conf. on Data Mining, April 2004.
- V. Nguyen and C. Martel. Analyzing
and characterizing small-world
graphs. To appear in the 2005 ACM-SIAM symposium on Discrete
- C. Martel and V. Nguyen. Analyzing
Kleinberg's (and other) small-world
models. In the 23rd ACM Symposium on Principles of Distributed
pp. 179-188, 2004.
- M. Naor, U. Weider Know thy
Neighbor’s Neighbor: Better Routing for Skip-Graphs and Small Wolds
- P. Fraigniaud, C. Gavoille, and C. Paul, Eclecticism
Shrinks Even Small Worlds, 23rd ACM Symp. on Principles of
- D.M. Pennock, G.W. Flake, S. Lawrence, E. J. Glover, C. L.
don't take all: Characterizing the competition for links on the web,
Proc. of the National Academy of Sciences, 2002
- Micah Adler, Michael
Compressing Web Graphs in Proceedings of the IEEE Data
Conference (DCC), 2001.
- M. Diligenti, F.M. Coetzee,
S. Lawrence, C.L. Giles, M. Gori Focused
Crawling Using Context Graphs.
26th International Conference on Very Large Databases, VLDB 2000.
- A. Y. Ng, A. X. Zheng, and
M. I. Jordan. Stable
algorithms for link analysis.
24th International Conference on Research and Development in
Information Retrieval (SIGIR 2001).
- Taher H. Haveliwala. Topic-Sensitive
11th International World Wide Web Conference, 2002.
- N. Eiron, K. S. McCurley, J. A. Tomlin, Ranking the
Web Frontier, WWW 2004
- R. Guha, R. Kumar, P. Raghavan, A. Tomkins, Propagation
of Trust and Distrust WWW
- R. B. Almeida, V. A. F. Almeida, A
Community-Aware Search Engine WWW
Xi, B. Zhang, Z.
Chen, Y. Lu, S.
Yan, W.-Y. Ma, E. A. Fox, Link Fusion:
A Unified Link Analysis Framework for Multi-Type Interrelated Data
Objects WWW 2004
Lempel, A. Soffer. PicASHOW:
Pictorial Authority Search by Hyperlinks on the Web.
10th International World Wide Web Conference, May 2001
- Ron Fagin,
Mohammad Mahdian, D. Sivakumar, Searching
the workplace web, WWW 2003
- P. Ferragina, A. Gulli, F. Romani, Ranking
a Stream of
- Thorsten Joachims, Optimizing
earch engines using click through data, KDD 2005
- Ron Fagin,
Ravi Kumar and D. Sivakumar Comparing
lists, Extended abstract in 2003
ACM-SIAM Symposium on Discrete Algorithms (SODA '03),
Balakrishnan, and David Karger. Analysis
of the Evolution of Peer-to-Peer Networks. In Proceedings
of PODC 2002.
- E. Cohen, S.
Strategies in Unstructured Peer-to-Peer Networks.
Goel, H. Zhang, and R. Govindan.
Using the Small-World Model to Improve Freenet Performance.
IEEE Infocom, 2002.
- Daniel Gruhl,
R. Guha, David Liben-Nowell, and
Andrew Tomkins. Information Diffusion through Blogspace (short version,
version). Proceedings of WWW'04.
- C. R. Myers,
"Software systems as complex networks: structure, function, and
evolvability of software collaboration graphs", Phys. Rev. E
68, 046116 (2003).
Liben-Nowell, J. Kleinberg. The Link
Prediction Problem for Social Networks.
Proc. 12th International Conference on Information and Knowledge
Management (CIKM), 2003.
- Christos Faloutsos, Kevin McCurley and Andrew Tomkins, Fast Discovery
of 'Connection Subgraphs' KDD 2004, Seattle, WA, Aug. 2004.
- Ravi Kumar, Uma Mahadevan, Sivakumar: A graph-theoretic approach
to extract storylines from search results. KDD 2004: 216-225
Aspnes, Kevin Chang, and Aleksandr Yampolskiy, Innoculation
strategies for victims of viruses and the sum of squares problem,
Kumar, J. Novak, P. Raghavan, A. Tomkins, On
the bursty evolution of Blog Space, WWW 2003
- J. Hopcroft, O. Khan,
B. Kulis, and B. Selman.
Natural communities in large linked networks.
In Proceedings of the 9th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages 541--546
- N. Przulj, D. G.
Corneil, and I. Jurisica, Efficient
estimation of graphlet frequency distributions in protein-protein
Bioinformatics, Advance Access published on February 1, 2006;
- .N. Przulj, D. G.
I. Jurisica, Modeling
Interactome: Scale-Free or Geometric?,
Bioinformatics, volume 20, number 18, pages 3508-3515, 2004.
If you have questions, or you get stuck with your presentation, feel
free to email me, or Evimaria, to discuss
are some possible topics for experimental projects.
experimental projects would include implementation of an algorithm in
some of the presented papers, or some of the papers in the reading list.
the site http://www.cs.helsinki.fi and generate the underlying graph
(that is, the graph of the web pages that are within this domain, and
the links between them -- no outside pages, or links). Study the
underlying graph by making standard measurements. Code for fetching
pages and extracting links is provided in Perl.
the structure of the wikipedia graph.
the structure of the reality mining dataset (data on the behavior of
mobile phone users, available upon request).
the datasets on web search data, and the results of various Link
Analysis Algorithms, experiment with different rank-aggregation
the datasets on web search data, experiment with various ranking
algorithms that make use of the link structure. You could implement
various Link Analysis Ranking algorithms, as well try heuristics that
have not been applied to the Web.
with the idea of combining clustering and ranking. This will require to
implement a clustering algorithm, and on top of that some ranking
algorithm. For the clustering algorithm you can implement one of the
algorithms described in class, or the one in the paper:
a Distributed Hashing Protocol (e.g. the Freenet, or the Symphony P2P
with site percolation,virus propagation, immunization on the datasets
of the course
also propose your own ideas for projects. Please contact me to discuss
further about your project. Survey type of projects are also possible,
but they should be more than just a summary of a few papers. You should
propose some future direction, and I would strongly recommend that you
provide some indicative experiments. Possible survey topics include,
small world phenomena in networks (includes a lot of theoretical
papers), influence of search engines on ranking, biological networks
(will require some understanding of biology).
for the course will mix exercise sets, reaction papers, presentation
and project. It will be a little heavier than last year and more
close tho the homework in Kleinberg's
course (click here for a
description of the coursework for this course). Below are the
different types of homework.
Any problem sets will be posted here.
the reaction paper you should select at least two papers relevant to
some section of the course, that have not been already discussed, and
write a report of approximately 3 pages, where you should
include the following
- Summary of the main contribution and technical content of
the papers, and how they relate to each other.
- Discussion on how the papers relate to the topics of the
- Discussion about the shortcomings of the papers, and
possible future directions.
The reaction paper shoud not
just a summary of papers. The objective is to do a synthesis of ideas,
find some interesting related work that was not covered, think about a
problem, and about possible interesting questions.
Depending on the material covered in class there may be an
exercise set related to the papers covered. The exercises will be
mostly mathematical, but there may also be some programming exercises.
on the number of people that take the class, instead of a reaction
paper, there may be a presentation. In this case you will be asked to
select a paper and present it in 20 minutes.
the project you are asked to select your favorite network and study it.
You can do the following type of projects:
to produce a report with your findings of about 10-15 pages. You
should start thinking about the project early on, and you can discuss
it with me or Evimaria if you need direction. Large projects could be
split between two people. I would recommend that you do a type 1
project, unless you are strong theoretically, or you feel you can offer
a new perspective to the analysis of a network. A type 3 project should
not be an extended reaction paper, and it will be graded with respect
to the new ideas it generates.
- An experimental analysis of the network with respect to
measurements, algorithms or models. You can use one of the datasets in
the course home page or some dataset available on the Web, or you can
collect your own data. If substantial work is required for collecting
the dataset this will be counted towards the final grade.
- A rigorous theoretical analysis of an algorithm or model
related to the network.
- An in-depth survey that makes a critical analysis of a
topic and offers a new perspective.