Home
Announcements
Homework
Reading
List
References
Datasets
and Code
Interesting
Links

Homework
Homework
1
Homerwork 1 is a problem set. You can download it from here.
The due date is April 14th in class. On Monday, April 4th
there will be
a tutorial about the first assignment.
Homework 2
Homework 2 is
a
presentation. You will have to select a paper and
present it in 20
minutes. The presentations will take place in the
last week of the
course, but you are required to submit your slides
(the presentation
file, or copy of the slides) on May 3rd. For the
presentation, you can
select any of the papers below, or propose a paper
yourself (you should
send me an email to discuss about it). Papers will
be handed out in
first come first serve fashion.
(If you are
interested in doing
a presentation on P2P, epidemic spreads in
networks, or biological
networks let me know. I did not have the time to
add links for these
topics)
 Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani,
Bernardo
A. Huberman. Search
in
PowerLaw Networks.
Phys. Rev. E, 64 46135 (2001).
 Michalis Faloutsos, Petros Faloutsos and Christos
Faloutsos.
On PowerLaw Relationships of the Internet Topology.
ACM SIGCOMM 1999.
 Deepayan Chakrabarti, Yiping Zhan and Christos
Faloutsos
RMAT: A Recursive Model for Graph Mining
SIAM Int. Conf. on Data Mining, April 2004.
 V. Nguyen and C. Martel. Analyzing
and
characterizing smallworld
graphs. To appear in the 2005 ACMSIAM symposium
on Discrete
Algorithms.
 D. J. Watts, P. S. Dodds, and
M. E. J. Newman, Identity
and
search in social networks, Science
296,
13021305 (2002). (Hannu Maksimainen)
 C. Martel and V. Nguyen. Analyzing
Kleinberg's
(and other) smallworld
models. In the 23rd ACM Symposium on Principles of
Distributed
Computing,
pp. 179188, 2004.
 M. Naor, U. Weider Know
thy
Neighbor’s Neighbor: Better Routing for SkipGraphs
and Small Wolds
 P. Fraigniaud, C. Gavoille, and C. Paul, Eclecticism
Shrinks
Even Small Worlds, 23rd ACM Symp. on Principles of
Distributed Computing
(PODC 2004).
 D.M. Pennock, G.W. Flake, S. Lawrence, E. J. Glover,
C. L.
Giles, Winners
don't
take all: Characterizing the competition for links on
the web,
Proc. of the National Academy of Sciences, 2002
 Z. BarYossef, A. Broder, R. Kumar and A. Tomkins.
Sic
Transit Gloria
Telae: Towards an Understanding of theWeb's Decay.
In
Proceedings of the Thirteenth International World Wide
Web Conference,
New York, New York, 2004. (Sini Ruohomaa)
 Micah Adler, Michael
Mitzenmacher, Towards
Compressing
Web Graphs in Proceedings of the IEEE
Data
Compression
Conference (DCC), 2001.
 M. Henzinger, A.
Heydon, M.
Mitzenmacher, and M. Najork. Measuring
Index
Quality using Random Walks on the Web.
8th International World Wide Web Conference, May
1999.
 M. Diligenti, F.M.
Coetzee,
S. Lawrence, C.L. Giles, M. Gori Focused
Crawling
Using Context Graphs.
26th International Conference on Very Large
Databases, VLDB 2000.
 A. Y. Ng, A. X. Zheng,
and
M. I. Jordan. Stable
algorithms
for link analysis.
24th International Conference on Research and
Development in
Information Retrieval (SIGIR 2001).
 Taher H. Haveliwala. TopicSensitive
PageRank.
11th International World Wide Web Conference, 2002.
 J. Cho, S. Roy,
Impact
Of
Search Engines On Page Popularity, WWW 2004 (Papp Zoltan)
 N. Eiron, K. S.
McCurley, J. A. Tomlin, Ranking
the
Web Frontier, WWW 2004
 R. Guha, R. Kumar, P. Raghavan, A. Tomkins, Propagation
of
Trust and Distrust WWW
2004
 R. B. Almeida, V. A. F.
Almeida, A
CommunityAware Search Engine WWW
2004
 W.
Xi, B.
Zhang, Z.
Chen, Y. Lu, S.
Yan, W.Y.
Ma, E. A. Fox, Link
Fusion:
A Unified Link Analysis Framework for MultiType
Interrelated Data
Objects WWW 2004
 K. Berberich, M.
Vazirgiannis, G. Weikum, TRank:
Time
Aware Authority Ranking, WAW 2004 (Markus
Aukeala)
 Ron
Fagin,
Ravi Kumar,
Mohammad Mahdian, D. Sivakumar, Searching
the
workplace web, WWW 2003 (Edit Bodorkos)
 P. Ferragina, A. Gulli, F. Romani, Ranking
a
Stream of
News, To appear in WWW2005
 Ron
Fagin,
Ravi Kumar and D. Sivakumar Comparing
top
k
lists, Extended abstract in 2003
ACMSIAM Symposium on Discrete Algorithms (SODA '03),
pp. 2836
 David
LibenNowell, Hari
Balakrishnan, and David Karger. Analysis
of
the Evolution of PeertoPeer Networks. In
Proceedings
of PODC 2002.
 E.
Cohen, S.
Shenker. Replication
Strategies
in Unstructured PeertoPeer Networks.
SIGCOMM 2002.
 A.
Goel, H. Zhang, and R. Govindan.
Using the SmallWorld Model to Improve Freenet
Performance.
IEEE Infocom, 2002.
 A. Rowstron, P. Druschel. Pastry:
Scalable,
distributed object location and routing
for largescale peertopeer systems.
18th IFIP/ACM International Conference on Distributed
Systems
Platforms (Middleware 2001).
 B. Y. Zhao, J. D. Kubiatowicz, A. D. Joseph, Tapestry:
An
Infrastructure for FaultTolerant WideArea
Location and Routing.
UC Berkeley Computer Science Division, Report No.
UCB/CSD 01/1141,
April 2001.
 Daniel
Gruhl,
R. Guha, David LibenNowell, and
Andrew Tomkins. Information Diffusion through Blogspace
(short
version, long
version). Proceedings of WWW'04.
 C. R. Myers,
"Software systems as complex networks: structure,
function, and
evolvability of software collaboration graphs", Phys.
Rev. E
68, 046116 (2003).
 D.
LibenNowell, J. Kleinberg. The
Link
Prediction Problem for Social Networks.
Proc. 12th International Conference on Information and
Knowledge
Management (CIKM), 2003.
 Christos Faloutsos, Kevin McCurley and Andrew
Tomkins,
Fast Discovery
of 'Connection Subgraphs' KDD 2004, Seattle, WA,
Aug. 2004.
 Jasmine Novak, Prabhakar Raghavan, Andrew
Tomkins. Antialiasing
on the web. WWW
2004: 3039 (Pekka
Maksimainen)
 Ravi Kumar, Uma Mahadevan, Sivakumar: A
graphtheoretic approach
to extract storylines from search results. KDD 2004: 216225
If you have questions, or you get stuck with your
presentation or
project, feel free to email me, and we can arrange
to meet and discuss
about it.
Project
Here
are
some possible topics for experimental projects.
 Crawl
the
site http://www.cs.helsinki.fi and generate the
underlying graph
(that is, the graph of the web pages that are within
this domain, and
the links between them  no outside pages, or links).
Study the
underlying graph by making standard measurements. Code
for fetching
pages and extracting links is provided in Perl. This
project could be
done by more than one person.
 Study
the
structure of the wikipedia graph
 Using
the
datasets on web search data, and the results of
various Link
Analysis Algorithms, experiment with different
rankaggregation
algorithms.
 Using
the
datasets on web search data, experiment with various
ranking
algorithms that make use of the link structure. You
could implement
various Link Analysis Ranking algorithms, as well try
heuristics that
have not been applied to the Web.
 Experiment
with
the idea of combining clustering and ranking. One
specific project
is to implement the ideas in (or
ideas along the
lines of) the paper,
 Simulate
a
Distributed Hashing Protocol (e.g. the Freenet, or the
Symphony P2P
network).
 Experiment
with
site percolation,virus propagation, immunization on
the datasets
of the course
You
can
also propose your own ideas for projects. Please contact
me to discuss
further about your project. Survey type of projects are
also possible,
but they should be more than just a summary of a few
papers. You should
propose some future direction, and I would strongly
recommend that you
provide some indicative experiments.
Description
The
homework
for the course will be tailored again along the
lines of Kleinberg's
course (click here
for a
description of the coursework for this
course). Below are the
different types of homework.
Any problem sets will be posted here.
Reaction Paper
For
the reaction paper you should select at least two papers
relevant to
some section of the course, that have not been already
discussed, and
write a report of approximately 3 pages, where you
should
include the following
 Summary of the main contribution and technical
content of
the papers, and how they relate to each other.
 Discussion on how the papers relate to the topics of
the
course.
 Discussion about the shortcomings of the papers, and
possible future directions.
The reaction paper should not
be
just a summary of papers. The objective is to do a
synthesis of ideas,
find some interesting related work that was not covered,
think about a
problem, and about possible interesting questions.
Exercise Set
Depending on the material covered in class
there may be an
exercise set related to the papers covered. The
exercises will be
mathematical.
Presentation
Depending
on the number of people that take the class, instead of a
reaction
paper, there may be a presentation. In this case you will
be asked to
select a paper and present it in 20 minutes.
Project
For
the project you are asked to select your favorite network
and study it.
You can do the following type of projects:
 An experimental analysis of the network with respect
to
measurements, algorithms or models. You can use one of
the datasets in
the course home page or some dataset available on the
Web, or you can
collect your own data. If substantial work is required
for collecting
the dataset this will be counted towards the final
grade.
 A rigorous theoretical analysis of an algorithm or
model
related to the network.
 An indepth survey that makes a critical analysis of a
topic and offers a new perspective.
You
are
expected
to produce a report with your findings of about 1015
pages. You
should start thinking about the project early on, and you
can discuss
it with me or Evimaria if you need direction. Large
projects could be
split between two people. I would recommend that you do a
type 1
project, unless you are strong theoretically, or you feel
you can offer
a new perspective to the analysis of a network. A type 3
project should
not be an extended reaction paper, and it will be graded
with respect
to the new ideas it generates.
