
Online Social Networks and Media
Homework

 


Project Report Guidelines

 

You can find some guidelines for the project report here. Make sure that you start the report early!

 

Paper Presentation Guidelines

 

The presentations will be evaluated based on the quality of the presentation and the comprehension of the material covered. The following are some guidelines, tips, and advice for preparing your presentation.

        You have 20 minutes for the presentation if you are a one-student group and 25 minutes if you are a two-student group. We will enforce the time limit and cut you off if you have not finished on time. An additional 10 minutes will be allocated for questions. We may randomly pick someone from the audience to ask a question, so everyone should pay attention.

        You should prepare around 20-25 slides, since on average a slide takes about a minute to present.

        Break your presentation into thematic units. The following flow is very common:

1.      Motivate why the problem is important and give a high-level idea;

2.      Clearly define the problem;

3.      Present the main idea and the fundamental algorithms;

4.      Present the results (experimental or theoretical or both);

5.      Conclusions.

        The talk should be self-contained. Do not assume that the audience has read the paper, or some previous work that you consider known. Define all the concepts you need and all the notation that you use. Refer only to related work that you know.

        Since the time for the talk is short, you will need to focus on the important parts of the paper and avoid going through all the details. The goal is to give a summary of the paper and have a clear message. Just because you read the whole paper does not mean that you should present everything. At the same time, you should not skip important information. Choosing the right parts to present is important, since it shows that you understood the paper well.

        Prepare the slides carefully. Do not add too much text, and include only the necessary math symbols. Do not use full sentences, but rather keywords and short phrases. Make sure the slides are readable and not too loaded. Never, ever project parts of the paper PDF.

        Practice! Good talks are the result of a lot of practice even if they seem spontaneous and fun to the audience. Practice the talk several times, and time yourself to make sure you are within the time bounds.

 

Some fun advice on how to give a bad talk (and more) here.

 

 

Projects

 

The list of projects is available here. Projects are assigned on a first-come, first-served basis. The projects will be done in teams of at most two students. Send an email to both instructors with the names of the team members and your top-3 preferred projects. As soon as we receive your email, we will assign you the highest-ranked project that has not already been assigned.

 

Deliverables and Timeline:

 

Wednesday, 13/4/2022

All teams should have sent an email with their preferred project topics and have been assigned a topic.

Friday, 15/4/2022

 

A one-page project proposal outlining what you plan to do. This should include the topic (and papers) of your presentation.

Wednesday, 4/5/2022

A 15-minute presentation of the project proposal

Wednesday, 25/5/2022

Paper presentation

Wednesday, 15/6/2022

 

Submit a GitHub page with:

Source code of the project

Datasets used

Project report

 

Assignment 2

 

Due April 6 in class.

 

For this assignment you will create a Jupyter Notebook where you will experiment with PageRank and community discovery algorithms. The notebook will contain the code you have written, the output of your code, and any commentary you have on your results.

 

You can either write your own code or use implementations provided by SNAP, NetworkX, or other sources. Using existing libraries is recommended. State in your report which implementations you used.


You will consider the following graphs, which have known ground-truth communities. You can download them from here.

(1)   The Karate Club graph. You can load this dataset using a built-in NetworkX command. You can download the communities from here.

(2)   The Books for US politics dataset.

(3)   The Political Blogs dataset.

Use the largest weakly connected component (WCC) of each of these graphs.
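A minimal sketch of the component-extraction step, assuming NetworkX is installed. The helper name `largest_wcc` is hypothetical, and the GML file name in the commented-out line is an assumption about how the downloaded data is stored. For an undirected graph (such as Karate Club) the weakly connected components are simply the connected components.

```python
import networkx as nx


def largest_wcc(G):
    """Return a copy of the largest weakly connected component of G.

    For an undirected graph, weakly connected components are just
    the connected components.
    """
    if G.is_directed():
        components = nx.weakly_connected_components(G)
    else:
        components = nx.connected_components(G)
    largest = max(components, key=len)
    # subgraph() returns a read-only view; copy() makes it an independent graph
    return G.subgraph(largest).copy()


karate = largest_wcc(nx.karate_club_graph())
print(karate.number_of_nodes(), karate.number_of_edges())

# Hypothetical usage for a downloaded GML file (file name assumed):
# polbooks = largest_wcc(nx.read_gml("polbooks.gml"))
```

The Karate Club graph is already connected, so the helper returns all of it; it matters for the larger downloaded datasets.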

 

Using these datasets you will answer the following questions:

 

A.    You will use Personalized PageRank (PPR) to measure homophily. For a node x that belongs to community c, we define the homophily score of x as the total PPR mass that x allocates to the nodes in community c (excluding itself). Compute the average homophily score per community and overall, and briefly comment on the results.

B.    Apply two of the community detection algorithms we have seen in class to the graphs (using the same number of communities as the ground truth when possible). Evaluate the algorithms by creating a confusion matrix against the ground-truth communities. Briefly comment on the results.

C.    For the Karate Club dataset, the ground-truth communities correspond to a split of the members of a karate club after a disagreement between the two teachers of the club. These two teachers are nodes 1 and 34 in the graph. Use Personalized PageRank to assign each node to one of the two teachers. Evaluate the assignment using the ground-truth communities.
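One possible way to approach parts A-C on the Karate Club graph, as a sketch assuming NetworkX (with the NumPy/SciPy dependencies that `nx.pagerank` needs). Several choices here are assumptions rather than requirements of the assignment: restart probability 0.15 (`alpha=0.85`), edge weights ignored (`weight=None`), the NetworkX `club` node attribute used as ground truth instead of the downloadable file, Girvan-Newman as one of the two algorithms for B, and, for C, each node assigned to the teacher whose PPR vector gives it more mass. NetworkX numbers the nodes 0-33, so teachers 1 and 34 are nodes 0 and 33.

```python
import networkx as nx

G = nx.karate_club_graph()  # nodes 0..33; teachers 1 and 34 are nodes 0 and 33 here
truth = nx.get_node_attributes(G, "club")  # "Mr. Hi" or "Officer"
communities = {}
for node, club in truth.items():
    communities.setdefault(club, set()).add(node)


def ppr(G, source, alpha=0.85):
    """Personalized PageRank with all restart mass on `source` (edge weights ignored)."""
    return nx.pagerank(G, alpha=alpha, personalization={source: 1}, weight=None)


ppr_vectors = {x: ppr(G, x) for x in G}

# A. Homophily score: PPR mass x allocates to the other members of its community
homophily = {
    x: sum(ppr_vectors[x][y] for y in communities[truth[x]] if y != x)
    for x in G
}
per_community = {
    c: sum(homophily[x] for x in members) / len(members)
    for c, members in communities.items()
}
overall = sum(homophily.values()) / len(homophily)
print("A:", per_community, round(overall, 3))

# B. First split produced by Girvan-Newman (two communities) vs ground truth
split = next(nx.community.girvan_newman(G))
predicted = {node: i for i, comm in enumerate(split) for node in comm}
# Rows: ground-truth club; columns: index of the predicted community
confusion = {c: [0] * len(split) for c in communities}
for node in G:
    confusion[truth[node]][predicted[node]] += 1
print("B:", confusion)

# C. Assign each node to the teacher whose PPR vector gives it more mass
teachers = {0: truth[0], 33: truth[33]}
assigned = {v: max(teachers, key=lambda t: ppr_vectors[t][v]) for v in G}
accuracy = sum(teachers[assigned[v]] == truth[v] for v in G) / G.number_of_nodes()
print("C: accuracy", round(accuracy, 3))
```

Comparing raw PPR scores from the two teachers is only one option; normalizing each node's score (for example by its degree or by its global PageRank) is a reasonable variant to compare against.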

 

The assignments should be done individually. Export your notebook to HTML, and submit both the notebook and the HTML file.

 

Assignment 1

 

Due March 16 in class.

 

For this assignment you will create a Jupyter Notebook where you will experiment with network measurements and network generation models. The notebook will contain the code you have written, the output of your code, and any commentary you have on your results.

 

You can either write your own code or use implementations provided by SNAP, NetworkX, or other sources. Using existing libraries is recommended. State in your report which implementations you used.


You will consider the following graphs:

(1) The Wiki-Vote and ego-Facebook graphs from the SNAP dataset repository.

(2) An (undirected) Erdos-Renyi random graph.

(3) An (undirected) graph generated using preferential attachment.

(4) A graph generated using the forest fire model.
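NetworkX has no built-in forest fire generator; SNAP provides one (`GenForestFire` in Snap.py). For readers who prefer to stay in NetworkX, the following is a simplified, self-written sketch of the directed forest fire model. The function names and the default forward burning probability `p` and backward ratio `r` are illustrative assumptions, not values tuned to Wiki-Vote.

```python
import random
from collections import deque

import networkx as nx


def _burn_count(rng, prob):
    """Geometric number of links to burn, with mean prob / (1 - prob)."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count


def forest_fire_graph(n, p=0.37, r=0.32, seed=None):
    """Simplified directed forest fire model (n >= 1).

    Each new node v picks a uniformly random ambassador among existing
    nodes and "burns" outward: from every burned node w it follows a
    geometric number of unvisited out-links (mean p/(1-p)) and in-links
    (mean rp/(1-rp)). v links to every node it burns.
    """
    rng = random.Random(seed)
    G = nx.DiGraph()
    G.add_node(0)
    for v in range(1, n):
        ambassador = rng.randrange(v)  # existing nodes are 0..v-1
        G.add_node(v)
        visited = {v, ambassador}
        queue = deque([ambassador])
        while queue:
            w = queue.popleft()
            G.add_edge(v, w)
            out_nbrs = [u for u in G.successors(w) if u not in visited]
            burned_out = rng.sample(out_nbrs, min(_burn_count(rng, p), len(out_nbrs)))
            visited.update(burned_out)
            in_nbrs = [u for u in G.predecessors(w) if u not in visited]
            burned_in = rng.sample(in_nbrs, min(_burn_count(rng, r * p), len(in_nbrs)))
            visited.update(burned_in)
            queue.extend(burned_out + burned_in)
    return G


G = forest_fire_graph(1000, seed=42)
print(G.number_of_nodes(), G.number_of_edges())
```

The edge count grows quickly with `p`, so matching a target number of edges means adjusting `p` (and `r`) by trial; call `to_undirected()` on the result if an undirected graph is needed.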

 

The number of nodes and (when possible) the (expected) number of edges of each synthetically generated graph should match those of the Wiki-Vote graph.
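A sketch of how the two undirected models can be sized against Wiki-Vote, assuming NetworkX; the helper name is hypothetical. For G(n, p), choosing p = m / (n(n-1)/2) makes the expected edge count exactly m. NetworkX's Barabási-Albert generator adds k edges per new node, for (n - k)k edges in total, so k is set to round(m / n). Taking m from the undirected version of Wiki-Vote, and the file name `wiki-Vote.txt`, are assumptions.

```python
import networkx as nx


def matched_random_graphs(n, m, seed=None):
    """Undirected Erdos-Renyi and preferential-attachment graphs with n nodes and about m edges."""
    # G(n, p) with p chosen so that the expected number of edges is m
    p = m / (n * (n - 1) / 2)
    er = nx.fast_gnp_random_graph(n, p, seed=seed)  # efficient for sparse graphs
    # Barabasi-Albert: each new node attaches k edges, (n - k) * k edges in total
    k = max(1, round(m / n))
    ba = nx.barabasi_albert_graph(n, k, seed=seed)
    return er, ba


# Hypothetical usage with the SNAP file (file name and undirected conversion are assumptions):
# wiki = nx.read_edgelist("wiki-Vote.txt", create_using=nx.DiGraph, nodetype=int).to_undirected()
# er, ba = matched_random_graphs(wiki.number_of_nodes(), wiki.number_of_edges(), seed=42)
er, ba = matched_random_graphs(1000, 5000, seed=1)
print(er.number_of_edges(), ba.number_of_edges())
```

If an exact rather than expected edge count is preferred, `nx.gnm_random_graph(n, m)` draws a graph with exactly m edges.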

 

For these graphs:

a. Plot the degree distribution of each graph. Produce three plots (simple distribution, cumulative distribution, Zipf plot). All plots should be in log-log scale. (Use the grid option to put all plots for a dataset on the same line.)

b. Report the effective diameter for all graphs.

c. Report the clustering coefficient for all graphs.
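A sketch of the three measurements, assuming NetworkX and Matplotlib; all helper names are hypothetical. The effective diameter is estimated here as the 90th percentile of shortest-path lengths over pairs reachable from a random sample of source nodes, with directed graphs treated as undirected. This approximates the usual definition, and SNAP's own `GetBfsEffDiam` may give slightly different numbers. For directed graphs `G.degree()` is the total (in + out) degree; `G.in_degree()` or `G.out_degree()` can be plotted separately instead.

```python
import random
from collections import Counter

import matplotlib.pyplot as plt
import networkx as nx


def degree_pmf(G):
    """Degrees k >= 1 and the fraction of nodes with degree exactly k."""
    counts = Counter(d for _, d in G.degree())
    n = G.number_of_nodes()
    ks = sorted(k for k in counts if k > 0)  # degree 0 cannot be drawn on a log axis
    return ks, [counts[k] / n for k in ks]


def degree_ccdf(G):
    """Degrees k >= 1 and the fraction of nodes with degree >= k."""
    ks, pk = degree_pmf(G)
    tail, running = [], 0.0
    for p in reversed(pk):
        running += p
        tail.append(running)
    return ks, tail[::-1]


def degree_zipf(G):
    """Rank (1 = highest degree) against degree."""
    degrees = sorted((d for _, d in G.degree() if d > 0), reverse=True)
    return list(range(1, len(degrees) + 1)), degrees


def plot_degree_distributions(G, name):
    """One row of three log-log panels for a single graph."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    panels = [
        (degree_pmf(G), "degree distribution", "k", "P(D = k)"),
        (degree_ccdf(G), "cumulative distribution", "k", "P(D >= k)"),
        (degree_zipf(G), "Zipf plot", "rank", "degree"),
    ]
    for ax, ((x, y), title, xlabel, ylabel) in zip(axes, panels):
        ax.loglog(x, y, marker=".", linestyle="none")
        ax.set_title(f"{name}: {title}")
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        ax.grid(True)
    fig.tight_layout()
    return fig


def effective_diameter(G, q=0.9, num_sources=500, seed=0):
    """q-th percentile of shortest-path lengths, estimated by BFS from sampled sources.

    Directed graphs are treated as undirected; unreachable pairs are ignored.
    """
    H = G.to_undirected() if G.is_directed() else G
    sources = list(H)
    if len(sources) > num_sources:
        sources = random.Random(seed).sample(sources, num_sources)
    dists = []
    for s in sources:
        lengths = nx.single_source_shortest_path_length(H, s)
        dists.extend(d for t, d in lengths.items() if t != s)
    dists.sort()
    # Linear interpolation between the two closest ranks
    pos = q * (len(dists) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(dists) - 1)
    return dists[lo] + (pos - lo) * (dists[hi] - dists[lo])


G = nx.barabasi_albert_graph(2000, 5, seed=1)  # stand-in; use each assignment graph instead
fig = plot_degree_distributions(G, "BA")
print("effective diameter:", round(effective_diameter(G), 2))
print("average clustering:", round(nx.average_clustering(G), 4))
```

Sampling sources keeps the BFS cost manageable on Wiki-Vote-sized graphs; raising `num_sources` trades running time for a more precise estimate.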

 

Briefly comment on the results.

 

The assignments should be done individually. Export your notebook to HTML, and submit both the notebook and the HTML file.