Online Social Networks and Media
You can download the list of project topics here. Let us know of the topic you pick, and email or set up a meeing to discuss the details. If you have your own project proposal please contact us with your suggestion.
The timeline for the projects is as follows:
The assignment is due on Wednesday 19/12 in class. You should do the assignment individually. The goal of the assignment is to understand the HITS algorithm and the propagation in networks. You should do the following exercises from the textbook:
· Exercise 6 from Chapter 14.
· Exercise 4 from Chapter 19.
· Exercise 2 from Chapter 21.
The assignment is due on Wednesday 28/11 in class. You should do the assignment individually. It has two parts
a. The goal of the first part is to understand better the betweeness computation. You should do Questions 10.2.1 -- 10.2.3 from Chapter 10 of textbook Mining Massive Datasets. You should hand in your answers on paper at the beginning of class. There will be no presentation, but we may sample randomly a few people to present their solutions.
b. The goal of the second part is to understand the proof that decentralized greedy search is efficient for small world networks when the probability of long-range links is selected appropriately. You will consider the case that the underlying structure is a grid, and each node on the grid has a long range link. The endopoint of the link is selected with probability inversely proportional to the square of the distance from starting point of the link. Prove that in this case the expected number of steps of a path found between two nodes s,t by the greedy algorithm is O(log2n).
The assignment is due on Wednesday 14/11 in class. You can do the assignment in groups of up to 2 people. The goal of the assignment is to implement and experiment with different network models.
Choose either a preferential attachment or a small world model to implement. You should do your own implementation, and not use some existing one online.
For the preferential attachment model you can choose between the Barabasi-Albert model and the Copying model. The two models are described in the survey of Mark Newman
· M. E. J. Newman, The structure and function of complex networks, SIAM Reviews, 45(2): 167-256, 2003
You can also follow references from the review to the original papers where the two models are described.
For the small-world model, you can choose one of the two models proposed by Duncan Watts. The first one is described in the paper:
· D.J. Watts. Networks, Dynamics and Small-World Phenomenon, American Journal of Sociology, Vol. 105, Number 2, 493-527, 1999
The second one (proposed in collaboration with S. H. Strogatz) is described in the paper
· Watts, D. J. and S. H. Strogatz. Nature 393:440-42, 1998
It is also described in the review by Mark Newman.
Implement the model that you choose and experiment with different values for the parameters of the model. Generate graphs with at least 5K nodes. For each graph, compute the clustering coefficient, the degree distribution, and produce a visualization of the network. You can use Gephi for these tasks. For the scale free models compute also the exponent of the distribution.
You should prepare a short presentation (4-5 slides) for the above. The presentation should include a 1-page description of the model and the choices that you made.
The assignment is due on Friday 26/10 in class. You can do the assignment in groups of up to 3 people. The goal of the assignment is to apply the techniques we described in class for observing and computing power-laws.
Choose a graph dataset (you can find a list of such datasets at the Resources page). The graph should have at least 1000 nodes.
1. Compute the degree distribution and produce the following plots:
i. The degree distribution in log-log scale
ii. The degree distribution in log-log scale using exponential binning
iii. The cumulative degree distribution in log-log scale
iv. The Zipf plot of the degree distribution
2. Compute the exponent of the power-law distribution and in the plots i-iii plot also the expected value if the values were drawn from the distribution.
You can read more about the different methods of plotting the distribution and estimating the power-law exponent at:
You should prepare a short presentation (3-4 slides) for the above. The presentation should include a description of the dataset.