Home
Announcements
Homework
Reading
List
References
Datasets
and Code
Interesting
Links

Homework
Homework 1
The first assignment is a programming assignment for
performing measurements on networks. You can download the assignment
sheet from here. You can download
the
sample matlab file here.
Homework 2
The second assigment is a reaction paper. You can download the
assignment sheet here.
Homework
3
The
third assigment is a presentation. You will have to
select a paper and present it in 20
minutes. You should submit a copy of your presentation in advance, so
that we can print a copy of it. For the presentation, you can
select any of the papers below, or propose a paper yourself (you should
send me an email to discuss about it). Papers will be handed out in
first come first serve fashion.
 Deepayan Chakrabarti, Yiping Zhan and Christos Faloutsos
RMAT: A Recursive Model for Graph Mining
SIAM Int. Conf. on Data Mining, April 2004.
 V. Nguyen and C. Martel. Analyzing
and characterizing smallworld
graphs. To appear in the 2005 ACMSIAM symposium on Discrete
Algorithms.
 C. Martel and V. Nguyen. Analyzing
Kleinberg's (and other) smallworld
models. In the 23rd ACM Symposium on Principles of Distributed
Computing,
pp. 179188, 2004.
 M. Naor, U. Weider Know thy
Neighbor’s Neighbor: Better Routing for SkipGraphs and Small Wolds
 P. Fraigniaud, C. Gavoille, and C. Paul, Eclecticism
Shrinks Even Small Worlds, 23rd ACM Symp. on Principles of
Distributed Computing
(PODC 2004).
 D.M. Pennock, G.W. Flake, S. Lawrence, E. J. Glover, C. L.
Giles, Winners
don't take all: Characterizing the competition for links on the web,
Proc. of the National Academy of Sciences, 2002
 Micah Adler, Michael
Mitzenmacher, Towards
Compressing Web Graphs in Proceedings of the IEEE Data
Compression
Conference (DCC), 2001.
 M. Diligenti, F.M. Coetzee,
S. Lawrence, C.L. Giles, M. Gori Focused
Crawling Using Context Graphs.
26th International Conference on Very Large Databases, VLDB 2000.
 A. Y. Ng, A. X. Zheng, and
M. I. Jordan. Stable
algorithms for link analysis.
24th International Conference on Research and Development in
Information Retrieval (SIGIR 2001).
 Taher H. Haveliwala. TopicSensitive
PageRank.
11th International World Wide Web Conference, 2002.
 N. Eiron, K. S. McCurley, J. A. Tomlin, Ranking the
Web Frontier, WWW 2004
 R. Guha, R. Kumar, P. Raghavan, A. Tomkins, Propagation
of Trust and Distrust WWW
2004
 R. B. Almeida, V. A. F. Almeida, A
CommunityAware Search Engine WWW
2004
 W.
Xi, B. Zhang, Z.
Chen, Y. Lu, S.
Yan, W.Y. Ma, E. A. Fox, Link Fusion:
A Unified Link Analysis Framework for MultiType Interrelated Data
Objects WWW 2004
 R.
Lempel, A. Soffer. PicASHOW:
Pictorial Authority Search by Hyperlinks on the Web.
10th International World Wide Web Conference, May 2001
 Ron Fagin,
Ravi Kumar,
Mohammad Mahdian, D. Sivakumar, Searching
the workplace web, WWW 2003
 P. Ferragina, A. Gulli, F. Romani, Ranking
a Stream of
News, WWW2005
 Thorsten Joachims, Optimizing
earch engines using click through data, KDD 2005
 Ron Fagin,
Ravi Kumar and D. Sivakumar Comparing
top k
lists, Extended abstract in 2003
ACMSIAM Symposium on Discrete Algorithms (SODA '03),
pp. 2836
 David
LibenNowell, Hari
Balakrishnan, and David Karger. Analysis
of the Evolution of PeertoPeer Networks. In Proceedings
of PODC 2002.
 E. Cohen, S.
Shenker. Replication
Strategies in Unstructured PeertoPeer Networks.
SIGCOMM 2002.
 A.
Goel, H. Zhang, and R. Govindan.
Using the SmallWorld Model to Improve Freenet Performance.
IEEE Infocom, 2002.
 Daniel Gruhl,
R. Guha, David LibenNowell, and
Andrew Tomkins. Information Diffusion through Blogspace (short version,
long
version). Proceedings of WWW'04.
 C. R. Myers,
"Software systems as complex networks: structure, function, and
evolvability of software collaboration graphs", Phys. Rev. E
68, 046116 (2003).
 D.
LibenNowell, J. Kleinberg. The Link
Prediction Problem for Social Networks.
Proc. 12th International Conference on Information and Knowledge
Management (CIKM), 2003.
 Christos Faloutsos, Kevin McCurley and Andrew Tomkins, Fast Discovery
of 'Connection Subgraphs' KDD 2004, Seattle, WA, Aug. 2004.
 Ravi Kumar, Uma Mahadevan, Sivakumar: A graphtheoretic approach
to extract storylines from search results. KDD 2004: 216225
 James
Aspnes, Kevin Chang, and Aleksandr Yampolskiy, Innoculation
strategies for victims of viruses and the sum of squares problem,
SODA 2005
 R.
Kumar, J. Novak, P. Raghavan, A. Tomkins, On
the bursty evolution of Blog Space, WWW 2003
 J. Hopcroft, O. Khan,
B. Kulis, and B. Selman.
Natural communities in large linked networks.
In Proceedings of the 9th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pages 541546
 N. Przulj, D. G.
Corneil, and I. Jurisica, Efficient
estimation of graphlet frequency distributions in proteinprotein
interaction networks,
Bioinformatics, Advance Access published on February 1, 2006;
doi:10.1093/bioinformatics/btl030
 .N. Przulj, D. G.
Corneil, and
I. Jurisica, Modeling
Interactome: ScaleFree or Geometric?,
Bioinformatics, volume 20, number 18, pages 35083515, 2004.
If you have questions, or you get stuck with your presentation, feel
free to email me, or Evimaria, to discuss
about it.
Project
Here
are some possible topics for experimental projects.
 Crawl
the site http://www.cs.helsinki.fi and generate the underlying graph
(that is, the graph of the web pages that are within this domain, and
the links between them  no outside pages, or links). Study the
underlying graph by making standard measurements. Code for fetching
pages and extracting links is provided in Perl.
 Study
the structure of the wikipedia graph.
 Study
the structure of the reality mining dataset (data on the behavior of
mobile phone users, available upon request).
 Using
the datasets on web search data, and the results of various Link
Analysis Algorithms, experiment with different rankaggregation
algorithms.
 Using
the datasets on web search data, experiment with various ranking
algorithms that make use of the link structure. You could implement
various Link Analysis Ranking algorithms, as well try heuristics that
have not been applied to the Web.
 Experiment
with the idea of combining clustering and ranking. This will require to
implement a clustering algorithm, and on top of that some ranking
algorithm. For the clustering algorithm you can implement one of the
algorithms described in class, or the one in the paper:
 Simulate
a Distributed Hashing Protocol (e.g. the Freenet, or the Symphony P2P
network).
 Experiment
with site percolation,virus propagation, immunization on the datasets
of the course
Other
experimental projects would include implementation of an algorithm in
some of the presented papers, or some of the papers in the reading list.
You can
also propose your own ideas for projects. Please contact me to discuss
further about your project. Survey type of projects are also possible,
but they should be more than just a summary of a few papers. You should
propose some future direction, and I would strongly recommend that you
provide some indicative experiments. Possible survey topics include,
small world phenomena in networks (includes a lot of theoretical
papers), influence of search engines on ranking, biological networks
(will require some understanding of biology).
Description
The
homework
for the course will mix exercise sets, reaction papers, presentation
and project. It will be a little heavier than last year and more
close tho the homework in Kleinberg's
course (click here for a
description of the coursework for this course). Below are the
different types of homework.
Any problem sets will be posted here.
Reaction Paper
For
the reaction paper you should select at least two papers relevant to
some section of the course, that have not been already discussed, and
write a report of approximately 3 pages, where you should
include the following
 Summary of the main contribution and technical content of
the papers, and how they relate to each other.
 Discussion on how the papers relate to the topics of the
course.
 Discussion about the shortcomings of the papers, and
possible future directions.
The reaction paper shoud not
be
just a summary of papers. The objective is to do a synthesis of ideas,
find some interesting related work that was not covered, think about a
problem, and about possible interesting questions.
Exercise Set
Depending on the material covered in class there may be an
exercise set related to the papers covered. The exercises will be
mostly mathematical, but there may also be some programming exercises.
Presentation
Depending
on the number of people that take the class, instead of a reaction
paper, there may be a presentation. In this case you will be asked to
select a paper and present it in 20 minutes.
Project
For
the project you are asked to select your favorite network and study it.
You can do the following type of projects:
 An experimental analysis of the network with respect to
measurements, algorithms or models. You can use one of the datasets in
the course home page or some dataset available on the Web, or you can
collect your own data. If substantial work is required for collecting
the dataset this will be counted towards the final grade.
 A rigorous theoretical analysis of an algorithm or model
related to the network.
 An indepth survey that makes a critical analysis of a
topic and offers a new perspective.
You are
expected
to produce a report with your findings of about 1015 pages. You
should start thinking about the project early on, and you can discuss
it with me or Evimaria if you need direction. Large projects could be
split between two people. I would recommend that you do a type 1
project, unless you are strong theoretically, or you feel you can offer
a new perspective to the analysis of a network. A type 3 project should
not be an extended reaction paper, and it will be graded with respect
to the new ideas it generates.
