| 
 | Online Social Networks and Media | 
| Home | Project Proposal Guidelines You can find some guidelines for the project report here. Make sure that you start
  the report early! Presentation Schedule The updated schedule for the presentations is as follows: Wednesday 18/1, starting at 13:00: ·        
  Μαρία
  Ζέρβα, Γιάννης
  Κουβάτης:  o   Discovering top-k
  teams of experts with/without a leader in social networks. Mehdi Kargar, Aijun An, CIKM 2011 ·        
  Χρήστος
  Σπαθάρης: o   Antisocial Behavior in
  Online Discussion Communities. Justin Cheng, Cristian Danescu-Niculescu-Mizil,
  Jure Leskovec, ICWSM 2015 ·        
  Μαρία
  Παππά:  o   Tracking the Evolution
  of Communities in Dynamic Social Networks, Derek Greene, Dónal
  Doyle, Padraig Cunningham, ASONAM 2010 ·        
  Αδαμόπουλος
  Γιώργος,
  Βαλεκάρδας
  Δημήτρης: o   A hashtag
  recommendation system for twitter data streams. Eriko Otsuka, Scott A.
  Wallace, David Chiu, Computational Social Networks, 2016 ·        
  Κωνσταντίνος
  Δημολίκας,
  Chaysri Piyabhum: o   Quantifying
  Controversy in Social Media. Kiran Garimella, Gianmarco De Francisci Morales,
  Aristides Gionis, Michael Mathioudakis,
  WSDM 2016 ·        
  Παπαμιχαήλ
  Άγγελος: o   Near linear time
  algorithm to detect community structures in large-scale networks.  Usha Nandini Raghavan, Reka Albert, Soundar Kumara, Physical Review E 76, 2007 ·        
  Βιργινία
  Τσίντζου:  o   Density-based place
  clustering in geo-social networks. Shi, J., Mamoulis,
  N., Wu, D., & Cheung, D. W., 
  SIGMOD 2014 Paper Presentation Guidelines The presentations will be
  evaluated based on the quality of the presentation, and the comprehension of
  the material covered. The following are some guideline, tips and advice for
  preparing your presentation. ·        
  You have 20 minutes for the presentation (1 student group) and 25 minutes
  (2 students group). We will enforce the time limit and cut you off if you
  have not completed on time. 10
  more minutes will be allocated for questions. We may randomly pick someone
  from the audience to ask a question, so everyone should pay attention. ·        
  You should prepare around 20-25 slides, given that a slide
  takes around a minute to talk about on average.  ·        
  Break you presentation into thematic units.
  The following flow is very common:  
 ·        
  The talk should be self-contained. Do not assume that the audience has read the
  paper, or some previous work that you consider known. Define all the concepts
  you need and all the notation that you use. Refer only to related
  work that you know. ·        
  Since the time for the talk is short,
  you will need to focus on the important parts of the paper and avoid going
  through all the details. The goal is to give a summary of the paper and have
  a clear message. Just because you read all the paper it does not mean that
  you should present everything. At the same time, you should not skip
  important information. Focusing on the right part to present is important
  since it shows that you understood the paper well. ·        
  Prepare the slides carefully. Do not add
  too much text, and only the math symbols necessary. Do not use full
  sentences, but rather keywords and short phrases. Make sure the slides are
  readable and not too loaded. Never ever project parts of the paper pdf. ·        
  Practice! Good talks are the result of a
  lot of practice even if they seem spontaneous and fun to the audience.
  Practice the talk several times, and time yourself to make sure you are
  within the time bounds. Some fun advice on how
  to give a bad talk (and more) here. Projects The list of projects is available here. The assignment is
  First-Come-First-Serve. The timeline for the projects is as follows: 
 Assignment 2 Due on Wednesday 7/12 in class.  For this
  assignment, you can either write your own code or use implementations
  provided by SNAP, NetworkX, or other sources.
  Specify this in your report. Question 2 can be done in teams of two, while
  the remaining questions should be done individually.   Question 1  Consider an
  alternative definition of density for a set of nodes S in a graph G to be the
  minimum degree of any node in S in the induced subgraph G[S], that is,  
 where G[S] denotes the subgraph induced
  by the set of nodes S in the graph G, and degree(v,G[S])
  denotes the degree of node  Question 2 For this
  Question you will use again the DBLP10 dataset you used for Assignment 1. Use
  the link analysis ranking techniques we described in class to propose two
  algorithms for ranking the conferences in the data. Your algorithms should
  make use of the network between the authors. You can use existing
  implementations of the link analysis ranking algorithms when available. Question 3 In class,
  we studied the problem of selecting seed nodes to maximize the expected
  spread of influence in a network for the independent cascade model. Consider
  the case that we are given a set of seed nodes S that initiate the spread of
  harmful information. We want to select a set of (non-seed) nodes to
  “immunize” so as to stop the spread of the harmful information. Immunized
  nodes do not propagate the information, thus blocking the spread. Given a
  network G, the set S, and a budget K, we want to find a set of nodes B of
  size K to immunize such that the expected number of “protected nodes” that
  never receive the information, P(B), is maximized.  a.   
  Show
  that the function P(B) is not a submodular function.
  Note that for this proof it suffices to construct a counter-example where the
  submodularity property does not hold. (Hint: Assume
  deterministic propagation of the information). b.   
  Propose
  a heuristic algorithm for selecting the set B. Teams for the Assignment 1 Team 1 (Forest Fire): Μαρία Ζέρβα, Ιωάννης Κουβάτης, Χρήστος Σπαθάρης  Team 2 (Kronecker graph):
  Άγγελος Παπαμιχαήλ, Δημήτρης Βαλεκάρδας, Βιργινία Τσίντζου, Γιώργος Αδαμόπουλος Team 3 (Preferential attachment and
  copying model): Μαρία Παππά, Κωνσταντίνος Δημολίκας, Chaysri Piyabhum  Assignment 1 Due on Wednesday 16/11 in class.  The goal of
  this assignment is for you to familiarize yourself with network measurements
  and network generation models, and experiment with community detection/graph
  clustering algorithms. This is a group assignment. There will be 3 teams, and
  each team will work with a different graph model. For this
  assignment, you can either write your own code or use implementations
  provided by SNAP, NetworkX, or other sources.
  Specify this in your report.   You will
  use the DBLP10 dataset which you can download from here. Dataset description. The DBLP10
  dataset includes publications from computers science conference between 2006
  and 2015. Nodes correspond to authors. There is an edge between two authors
  if they have written an article together.  Co-authorship: Data in the form (id1, id2) meaning
  that author with id1 co-authored an article with author with id2. Authors: Data in the form (id, n) indicating
  that the author (node) with identifier id has name n. Label: Data in the form (id, c) indicating
  that author with identifier id wrote a paper at conference c (a node may have
  multiple labels). Question (1) [models and
  measurements] Consider
  the following graphs:  (1) The
  DBLP10 co-authorship graph,  (2)
  An Erdos-Renyi random graph with the same
  number of nodes as DBLP10 graph, and  (3) A graph
  generated using one of the following
  alternatives: (a) a Kronecker graph, (b)
  preferential attachment and copying models, or (c) forest fire. Again the
  number of nodes of the generated graph should be similar to the number of
  nodes of DBLP. For these
  graphs: (a)  
  Plot
  the degree distribution. Produce 5 plots (simple distribution, bins of equal
  size, bins of exponential size, cumulative, zipf).  All plots should be in log-log scale. (b)  
  Report
  the effective diameter. (c)   
  Report
  the clustering co-efficient. Questions (2) [homophily] Use the
  labels of the authors in the co-authorship to study the homophily
  of the DBLP10 co-authorship graph. 
  Devise your own methodology for measuring homophily
  and describe it in your report. Questions (3) [clustering] (a) Apply clustering
  on the co-authorship graph. Report the number of clusters, information about
  their size and the name of the authors in the largest cluster. Also, evaluate
  the quality of clusters using the metrics described in class. You can use any
  of the clustering algorithms we presented in class. If necessary, experiment
  with different number of clusters. (b) Use the
  labels of the authors to evaluate the homogeneity of the clusters. Devise
  your own metric for this, and describe it in the report. |