Models and Algorithms for Complex Networks

Models and Algorithms for Complex Networks
Homework

Home

Announcements

Homework

Reading List

References

Datasets and Code

Interesting Links

Homework

Homework 1

The first assignment is a programming assignment for performing measurements on networks. You can download the assignment sheet from here. You can download the sample matlab file here.

Homework 2

The second assigment is a reaction paper. You can download the assignment sheet here.

Homework 3

The third assigment is a presentation. You will have to select a paper and present it in 20 minutes. You should submit a copy of your presentation in advance, so that we can print a copy of it. For the presentation, you can select any of the papers below, or propose a paper yourself (you should send me an email to discuss about it). Papers will be handed out in first come first serve fashion.

Deepayan Chakrabarti, Yiping Zhan and Christos Faloutsos R-MAT: A Recursive Model for Graph Mining SIAM Int. Conf. on Data Mining, April 2004.
V. Nguyen and C. Martel. Analyzing and characterizing small-world graphs. To appear in the 2005 ACM-SIAM symposium on Discrete Algorithms.
C. Martel and V. Nguyen. Analyzing Kleinberg's (and other) small-world models. In the 23rd ACM Symposium on Principles of Distributed Computing, pp. 179-188, 2004.
M. Naor, U. Weider Know thy Neighbor’s Neighbor: Better Routing for Skip-Graphs and Small Wolds
P. Fraigniaud, C. Gavoille, and C. Paul, Eclecticism Shrinks Even Small Worlds, 23rd ACM Symp. on Principles of Distributed Computing (PODC 2004).
D.M. Pennock, G.W. Flake, S. Lawrence, E. J. Glover, C. L. Giles, Winners don't take all: Characterizing the competition for links on the web, Proc. of the National Academy of Sciences, 2002
Micah Adler, Michael Mitzenmacher, Towards Compressing Web Graphs in Proceedings of the IEEE Data Compression Conference (DCC), 2001.
M. Diligenti, F.M. Coetzee, S. Lawrence, C.L. Giles, M. Gori Focused Crawling Using Context Graphs. 26th International Conference on Very Large Databases, VLDB 2000.
A. Y. Ng, A. X. Zheng, and M. I. Jordan. Stable algorithms for link analysis. 24th International Conference on Research and Development in Information Retrieval (SIGIR 2001).
Taher H. Haveliwala. Topic-Sensitive PageRank. 11th International World Wide Web Conference, 2002.
N. Eiron, K. S. McCurley, J. A. Tomlin, Ranking the Web Frontier, WWW 2004
R. Guha, R. Kumar, P. Raghavan, A. Tomkins, Propagation of Trust and Distrust WWW 2004
R. B. Almeida, V. A. F. Almeida, A Community-Aware Search Engine WWW 2004
W. Xi, B. Zhang, Z. Chen, Y. Lu, S. Yan, W.-Y. Ma, E. A. Fox, Link Fusion: A Unified Link Analysis Framework for Multi-Type Interrelated Data Objects WWW 2004
R. Lempel, A. Soffer. PicASHOW: Pictorial Authority Search by Hyperlinks on the Web. 10th International World Wide Web Conference, May 2001
Ron Fagin, Ravi Kumar, Mohammad Mahdian, D. Sivakumar, Searching the workplace web, WWW 2003
P. Ferragina, A. Gulli, F. Romani, Ranking a Stream of News, WWW2005
Thorsten Joachims, Optimizing earch engines using click through data, KDD 2005
Ron Fagin, Ravi Kumar and D. Sivakumar Comparing top k lists, Extended abstract in 2003 ACM-SIAM Symposium on Discrete Algorithms (SODA '03), pp. 28-36
David Liben-Nowell, Hari Balakrishnan, and David Karger. Analysis of the Evolution of Peer-to-Peer Networks. In Proceedings of PODC 2002.
E. Cohen, S. Shenker. Replication Strategies in Unstructured Peer-to-Peer Networks. SIGCOMM 2002.
A. Goel, H. Zhang, and R. Govindan. Using the Small-World Model to Improve Freenet Performance. IEEE Infocom, 2002.
Daniel Gruhl, R. Guha, David Liben-Nowell, and Andrew Tomkins. Information Diffusion through Blogspace (short version, long version). Proceedings of WWW'04.
C. R. Myers, "Software systems as complex networks: structure, function, and evolvability of software collaboration graphs", Phys. Rev. E 68, 046116 (2003).
D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks. Proc. 12th International Conference on Information and Knowledge Management (CIKM), 2003.
Christos Faloutsos, Kevin McCurley and Andrew Tomkins, Fast Discovery of 'Connection Subgraphs' KDD 2004, Seattle, WA, Aug. 2004.
Ravi Kumar, Uma Mahadevan, Sivakumar: A graph-theoretic approach to extract storylines from search results. KDD 2004: 216-225
James Aspnes, Kevin Chang, and Aleksandr Yampolskiy, Innoculation strategies for victims of viruses and the sum of squares problem, SODA 2005
R. Kumar, J. Novak, P. Raghavan, A. Tomkins, On the bursty evolution of Blog Space, WWW 2003
J. Hopcroft, O. Khan, B. Kulis, and B. Selman. Natural communities in large linked networks. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 541--546
N. Przulj, D. G. Corneil, and I. Jurisica, Efficient estimation of graphlet frequency distributions in protein-protein interaction networks, Bioinformatics, Advance Access published on February 1, 2006; doi:10.1093/bioinformatics/btl030
.N. Przulj, D. G. Corneil, and I. Jurisica, Modeling Interactome: Scale-Free or Geometric?, Bioinformatics, volume 20, number 18, pages 3508-3515, 2004.

If you have questions, or you get stuck with your presentation, feel free to email me, or Evimaria, to discuss about it.

Project

Here are some possible topics for experimental projects.

Crawl the site http://www.cs.helsinki.fi and generate the underlying graph (that is, the graph of the web pages that are within this domain, and the links between them -- no outside pages, or links). Study the underlying graph by making standard measurements. Code for fetching pages and extracting links is provided in Perl.
Study the structure of the wikipedia graph.
Study the structure of the reality mining dataset (data on the behavior of mobile phone users, available upon request).
Using the datasets on web search data, and the results of various Link Analysis Algorithms, experiment with different rank-aggregation algorithms.
Using the datasets on web search data, experiment with various ranking algorithms that make use of the link structure. You could implement various Link Analysis Ranking algorithms, as well try heuristics that have not been applied to the Web.
Experiment with the idea of combining clustering and ranking. This will require to implement a clustering algorithm, and on top of that some ranking algorithm. For the clustering algorithm you can implement one of the algorithms described in class, or the one in the paper:

D. Gibson, R. Kumar, A. Tomkins, Discovering dense subgraphs in massive datasets, VLDB 2005

Simulate a Distributed Hashing Protocol (e.g. the Freenet, or the Symphony P2P network).
Experiment with site percolation,virus propagation, immunization on the datasets of the course

Other experimental projects would include implementation of an algorithm in some of the presented papers, or some of the papers in the reading list.

You can also propose your own ideas for projects. Please contact me to discuss further about your project. Survey type of projects are also possible, but they should be more than just a summary of a few papers. You should propose some future direction, and I would strongly recommend that you provide some indicative experiments. Possible survey topics include, small world phenomena in networks (includes a lot of theoretical papers), influence of search engines on ranking, biological networks (will require some understanding of biology).

Description

The homework for the course will mix exercise sets, reaction papers, presentation and project. It will be a little heavier than last year and more close tho the homework in Kleinberg's course (click here for a description of the coursework for this course). Below are the different types of homework.

Any problem sets will be posted here.

Reaction Paper

For the reaction paper you should select at least two papers relevant to some section of the course, that have not been already discussed, and write a report of approximately 3 pages, where you should include the following

Summary of the main contribution and technical content of the papers, and how they relate to each other.
Discussion on how the papers relate to the topics of the course.
Discussion about the shortcomings of the papers, and possible future directions.

The reaction paper shoud not be just a summary of papers. The objective is to do a synthesis of ideas, find some interesting related work that was not covered, think about a problem, and about possible interesting questions.

Exercise Set

Depending on the material covered in class there may be an exercise set related to the papers covered. The exercises will be mostly mathematical, but there may also be some programming exercises.

Presentation

Depending on the number of people that take the class, instead of a reaction paper, there may be a presentation. In this case you will be asked to select a paper and present it in 20 minutes.

Project

For the project you are asked to select your favorite network and study it. You can do the following type of projects:

An experimental analysis of the network with respect to measurements, algorithms or models. You can use one of the datasets in the course home page or some dataset available on the Web, or you can collect your own data. If substantial work is required for collecting the dataset this will be counted towards the final grade.
A rigorous theoretical analysis of an algorithm or model related to the network.
An in-depth survey that makes a critical analysis of a topic and offers a new perspective.

You are expected to produce a report with your findings of about 10-15 pages. You should start thinking about the project early on, and you can discuss it with me or Evimaria if you need direction. Large projects could be split between two people. I would recommend that you do a type 1 project, unless you are strong theoretically, or you feel you can offer a new perspective to the analysis of a network. A type 3 project should not be an extended reaction paper, and it will be graded with respect to the new ideas it generates.