Handling Graphs with NetworkX¶
Python offers the library NetworkX for manipulating graphs. You can learn more here:
https://networkx.github.io/documentation/stable/tutorial.html
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
Creating a graph¶
G = nx.Graph()
Add nodes to the graph
G.add_node(1)
G.add_nodes_from([2,3])
G.add_node('Alice')
G.add_node('Bob')
print(G.nodes())
[1, 2, 3, 'Alice', 'Bob']
Draw the graph
nx.draw_networkx(G)
Add edges to the graph
G.add_edge(1,2)
G.add_edges_from([(1,3),('Alice','Bob')])
e = (1,'Alice')
G.add_edge(*e)
nx.draw_networkx(G)
Adding an edge with a new node will create the node Charlie in the graph
G.add_edge('Alice','Charlie')
print(G.edges())
print(G.nodes())
nx.draw_networkx(G)
[(1, 2), (1, 3), (1, 'Alice'), ('Alice', 'Bob'), ('Alice', 'Charlie')] [1, 2, 3, 'Alice', 'Bob', 'Charlie']
Creating a graph from edges
This is the most common way to create a graph
G2 = nx.Graph()
G2.add_edges_from([(1,2),(1,3),('Alice','Bob'),(1,'Alice')])
print(G2.nodes())
print(G2.edges())
nx.draw_networkx(G2)
[1, 2, 3, 'Alice', 'Bob'] [(1, 2), (1, 3), (1, 'Alice'), ('Alice', 'Bob')]
Reading a graph from a file with list of edges
https://networkx.org/documentation/stable/reference/readwrite/index.html
#Read a graph from a list of edges
G3 = nx.read_edgelist('graph_edges.txt')
print(G3.nodes())
print(G3.edges())
nx.draw_networkx(G3)
['1', '2', '3', 'Alice', 'Bob', 'Charlie'] [('1', '2'), ('1', '3'), ('1', 'Alice'), ('2', '3'), ('Alice', 'Bob'), ('Bob', 'Charlie')]
Removing edges and nodes
G2.remove_edge(1,3)
G2.remove_node(3)
G2.remove_node(1)
print(G2.nodes())
print(G2.edges())
nx.draw_networkx(G2)
[2, 'Alice', 'Bob'] [('Alice', 'Bob')]
The graph object, nodes and edges¶
A graph can be thought of as a dictionary with nodes as the keys
Each node is a dictionary with the neighbors as the keys, and the edge properties as values
type(G)
networkx.classes.graph.Graph
def __init__(incoming_graph_data=None, **attr)
Base class for undirected graphs. A Graph stores nodes and edges with optional data, or attributes. Graphs hold undirected edges. Self loops are allowed but multiple (parallel) edges are not. Nodes can be arbitrary (hashable) Python objects with optional key/value attributes, except that `None` is not allowed as a node. Edges are represented as links between nodes with optional key/value attributes. Parameters ---------- incoming_graph_data : input graph (optional, default: None) Data to initialize graph. If None (default) an empty graph is created. The data can be any format that is supported by the to_networkx_graph() function, currently including edge list, dict of dicts, dict of lists, NetworkX graph, 2D NumPy array, SciPy sparse matrix, or PyGraphviz graph. attr : keyword arguments, optional (default= no attributes) Attributes to add to graph as key=value pairs. See Also -------- DiGraph MultiGraph MultiDiGraph Examples -------- Create an empty graph structure (a "null graph") with no nodes and no edges. >>> G = nx.Graph() G can be grown in several ways. **Nodes:** Add one node at a time: >>> G.add_node(1) Add the nodes from any container (a list, dict, set or even the lines from a file or the nodes from another graph). >>> G.add_nodes_from([2, 3]) >>> G.add_nodes_from(range(100, 110)) >>> H = nx.path_graph(10) >>> G.add_nodes_from(H) In addition to strings and integers any hashable Python object (except None) can represent a node, e.g. a customized node object, or even another Graph. >>> G.add_node(H) **Edges:** G can also be grown by adding edges. Add one edge, >>> G.add_edge(1, 2) a list of edges, >>> G.add_edges_from([(1, 2), (1, 3)]) or a collection of edges, >>> G.add_edges_from(H.edges) If some edges connect nodes not yet in the graph, the nodes are added automatically. There are no errors when adding nodes or edges that already exist. **Attributes:** Each graph, node, and edge can hold key/value attribute pairs in an associated attribute dictionary (the keys must be hashable). By default these are empty, but can be added or changed using add_edge, add_node or direct manipulation of the attribute dictionaries named graph, node and edge respectively. >>> G = nx.Graph(day="Friday") >>> G.graph {'day': 'Friday'} Add node attributes using add_node(), add_nodes_from() or G.nodes >>> G.add_node(1, time="5pm") >>> G.add_nodes_from([3], time="2pm") >>> G.nodes[1] {'time': '5pm'} >>> G.nodes[1]["room"] = 714 # node must exist already to use G.nodes >>> del G.nodes[1]["room"] # remove attribute >>> list(G.nodes(data=True)) [(1, {'time': '5pm'}), (3, {'time': '2pm'})] Add edge attributes using add_edge(), add_edges_from(), subscript notation, or G.edges. >>> G.add_edge(1, 2, weight=4.7) >>> G.add_edges_from([(3, 4), (4, 5)], color="red") >>> G.add_edges_from([(1, 2, {"color": "blue"}), (2, 3, {"weight": 8})]) >>> G[1][2]["weight"] = 4.7 >>> G.edges[1, 2]["weight"] = 4 Warning: we protect the graph data structure by making `G.edges` a read-only dict-like structure. However, you can assign to attributes in e.g. `G.edges[1, 2]`. Thus, use 2 sets of brackets to add/change data attributes: `G.edges[1, 2]['weight'] = 4` (For multigraphs: `MG.edges[u, v, key][name] = value`). **Shortcuts:** Many common graph features allow python syntax to speed reporting. >>> 1 in G # check if node in graph True >>> [n for n in G if n < 3] # iterate through nodes [1, 2] >>> len(G) # number of nodes in graph 5 Often the best way to traverse all edges of a graph is via the neighbors. The neighbors are reported as an adjacency-dict `G.adj` or `G.adjacency()` >>> for n, nbrsdict in G.adjacency(): ... for nbr, eattr in nbrsdict.items(): ... if "weight" in eattr: ... # Do something useful with the edges ... pass But the edges() method is often more convenient: >>> for u, v, weight in G.edges.data("weight"): ... if weight is not None: ... # Do something useful with the edges ... pass **Reporting:** Simple graph information is obtained using object-attributes and methods. Reporting typically provides views instead of containers to reduce memory usage. The views update as the graph is updated similarly to dict-views. The objects `nodes`, `edges` and `adj` provide access to data attributes via lookup (e.g. `nodes[n]`, `edges[u, v]`, `adj[u][v]`) and iteration (e.g. `nodes.items()`, `nodes.data('color')`, `nodes.data('color', default='blue')` and similarly for `edges`) Views exist for `nodes`, `edges`, `neighbors()`/`adj` and `degree`. For details on these and other miscellaneous methods, see below. **Subclasses (Advanced):** The Graph class uses a dict-of-dict-of-dict data structure. The outer dict (node_dict) holds adjacency information keyed by node. The next dict (adjlist_dict) represents the adjacency information and holds edge data keyed by neighbor. The inner dict (edge_attr_dict) represents the edge data and holds edge attribute values keyed by attribute names. Each of these three dicts can be replaced in a subclass by a user defined dict-like object. In general, the dict-like features should be maintained but extra features can be added. To replace one of the dicts create a new graph class by changing the class(!) variable holding the factory for that dict-like structure. node_dict_factory : function, (default: dict) Factory function to be used to create the dict containing node attributes, keyed by node id. It should require no arguments and return a dict-like object node_attr_dict_factory: function, (default: dict) Factory function to be used to create the node attribute dict which holds attribute values keyed by attribute name. It should require no arguments and return a dict-like object adjlist_outer_dict_factory : function, (default: dict) Factory function to be used to create the outer-most dict in the data structure that holds adjacency info keyed by node. It should require no arguments and return a dict-like object. adjlist_inner_dict_factory : function, (default: dict) Factory function to be used to create the adjacency list dict which holds edge data keyed by neighbor. It should require no arguments and return a dict-like object edge_attr_dict_factory : function, (default: dict) Factory function to be used to create the edge attribute dict which holds attribute values keyed by attribute name. It should require no arguments and return a dict-like object. graph_attr_dict_factory : function, (default: dict) Factory function to be used to create the graph attribute dict which holds attribute values keyed by attribute name. It should require no arguments and return a dict-like object. Typically, if your extension doesn't impact the data structure all methods will inherit without issue except: `to_directed/to_undirected`. By default these methods create a DiGraph/Graph class and you probably want them to create your extension of a DiGraph/Graph. To facilitate this we define two class variables that you can set in your subclass. to_directed_class : callable, (default: DiGraph or MultiDiGraph) Class to create a new graph structure in the `to_directed` method. If `None`, a NetworkX class (DiGraph or MultiDiGraph) is used. to_undirected_class : callable, (default: Graph or MultiGraph) Class to create a new graph structure in the `to_undirected` method. If `None`, a NetworkX class (Graph or MultiGraph) is used. **Subclassing Example** Create a low memory graph class that effectively disallows edge attributes by using a single attribute dict for all edges. This reduces the memory used, but you lose edge attributes. >>> class ThinGraph(nx.Graph): ... all_edge_dict = {"weight": 1} ... ... def single_edge_dict(self): ... return self.all_edge_dict ... ... edge_attr_dict_factory = single_edge_dict >>> G = ThinGraph() >>> G.add_edge(2, 1) >>> G[2][1] {'weight': 1} >>> G.add_edge(2, 2) >>> G[2][1] is G[2][2] True
nx.draw_networkx(G)
for n in G: print(n,':',G[n])
1 : {2: {}, 3: {}, 'Alice': {}} 2 : {1: {}} 3 : {1: {}} Alice : {'Bob': {}, 1: {}, 'Charlie': {}} Bob : {'Alice': {}} Charlie : {'Alice': {}}
G[1]
AtlasView({2: {}, 3: {}, 'Alice': {}})
print(G.nodes)
print(G.edges)
[1, 2, 3, 'Alice', 'Bob', 'Charlie'] [(1, 2), (1, 3), (1, 'Alice'), ('Alice', 'Bob'), ('Alice', 'Charlie')]
#For every node we can obtain the properties of the node (in this example this is empty)
print(G.nodes['Alice'])
#For every edge we can obtain the properties of the edge (in this example this is empty)
print(G.edges[(1,2)])
{} {}
Graph attributes¶
You can assign attributes and values to the nodes of the graph
G3.nodes['Alice']['gender'] = 'female'
G3.nodes['Bob']['gender'] = 'male'
G3.nodes['Charlie']['gender'] = 'male'
G3.nodes['1']['value'] = 1
G3.nodes['2']['value'] = -1
G3.nodes['3']['value'] = 0
for n in G3.nodes():
print(n,':',G3.nodes[n])
1 : {'value': 1} 2 : {'value': -1} 3 : {'value': 0} Alice : {'gender': 'female'} Bob : {'gender': 'male'} Charlie : {'gender': 'male'}
for n in G3.nodes():
print(n,':',G3[n])
1 : {'2': {}, '3': {}, 'Alice': {}} 2 : {'1': {}, '3': {}} 3 : {'1': {}, '2': {}} Alice : {'Bob': {}, '1': {}} Bob : {'Alice': {}, 'Charlie': {}} Charlie : {'Bob': {}}
G3.nodes['Alice']['value'] = 1
G3.nodes['Bob']['value'] = -1
G3.nodes['Charlie']['value'] = 1
for n in G3.nodes():
print(n+ ":" + str(G3.nodes[n]['value']))
for n in G3.nodes():
print(n,':',G3.nodes[n])
1:1 2:-1 3:0 Alice:1 Bob:-1 Charlie:1 1 : {'value': 1} 2 : {'value': -1} 3 : {'value': 0} Alice : {'gender': 'female', 'value': 1} Bob : {'gender': 'male', 'value': -1} Charlie : {'gender': 'male', 'value': 1}
You can also add attributes and values to the edges in the graph
G3['Alice']['Bob']['label'] = 'strong'
print(G3['Bob']['Alice'])
print(G3['Alice'])
print(G3['Bob'])
{'label': 'strong'} {'Bob': {'label': 'strong'}, '1': {}} {'Alice': {'label': 'strong'}, 'Charlie': {}}
for e in G3.edges:
print(e,':',G3.edges[e])
('1', '2') : {} ('1', '3') : {} ('1', 'Alice') : {} ('2', '3') : {} ('Alice', 'Bob') : {'label': 'strong'} ('Bob', 'Charlie') : {}
print(G3.edges[('Alice','Bob')])
print(G3.edges[('Alice','Bob')]['label'])
{'label': 'strong'} strong
# The method get_edge_attributes returns a dictionary with keys the edges
#and values the values of the attribute we requested (for the edges that have it)
nx.get_edge_attributes(G3,'label')
{('Alice', 'Bob'): 'strong'}
pos=nx.spring_layout(G3)
nx.draw_networkx(G3, pos)
my_labels = nx.get_edge_attributes(G3,'label')
print(my_labels)
nx.draw_networkx_edge_labels(G3, pos, edge_labels=my_labels)
#plt.show()
{('Alice', 'Bob'): 'strong'}
{('Alice', 'Bob'): Text(-0.37570925580541625, -0.37061980394875893, 'strong')}
Weighted graphs¶
A special attribute of a an edge is the "weight". When adding weighted edges, you enter triples consisting of the two edge endpoints and the weight of the edge. This weight is stored in an attribute "weight" by default.
G4 = nx.Graph()
G4.add_weighted_edges_from([(1,2,0.5),(2,3,0.1),(3,4,0.7)])
for (a,b) in G4.edges():
print (a,b,':',G4[a][b])
for (a,b,w) in G4.edges(data =True): #data=True returns weight as well
print (str(a)+" "+ str(b) + " " + str(w['weight']))
for n in G4:
print(n,':',G4[n])
for e in G4.edges:
print(e,':',G4.edges[e])
1 2 : {'weight': 0.5} 2 3 : {'weight': 0.1} 3 4 : {'weight': 0.7} 1 2 0.5 2 3 0.1 3 4 0.7 1 : {2: {'weight': 0.5}} 2 : {1: {'weight': 0.5}, 3: {'weight': 0.1}} 3 : {2: {'weight': 0.1}, 4: {'weight': 0.7}} 4 : {3: {'weight': 0.7}} (1, 2) : {'weight': 0.5} (2, 3) : {'weight': 0.1} (3, 4) : {'weight': 0.7}
weights = nx.get_edge_attributes(G4,'weight')
print(weights)
{(1, 2): 0.5, (2, 3): 0.1, (3, 4): 0.7}
pos=nx.spring_layout(G4)
nx.draw_networkx(G4, pos)
nx.draw_networkx_edge_labels(G4, pos, edge_labels=weights)
{(1, 2): Text(0.7524855355227522, -0.022460304876532503, '0.5'), (2, 3): Text(-0.014391343375042887, 0.00039467951111531174, '0.1'), (3, 4): Text(-0.7524894595304484, 0.022460421295667984, '0.7')}
print(weights)
{(1, 2): 0.5, (2, 3): 0.1, (3, 4): 0.7}
Directed Graphs¶
DG1=nx.DiGraph()
DG1.add_edges_from([(1,2), (2,1), (3,1), (1,4), (2,4), (4,3)])
pos=nx.spring_layout(DG1)
nx.draw_networkx(DG1,pos)
DG2=nx.DiGraph()
DG2.add_weighted_edges_from([(1,2,0.5), (3,1,0.75), (1,4,0.1)])
print(DG2.edges())
for n in DG2:
print(n, DG2[n])
pos=nx.spring_layout(DG2)
nx.draw_networkx(DG2,pos)
weights = nx.get_edge_attributes(DG2,'weight')
nx.draw_networkx_edge_labels(DG2, pos, edge_labels=weights)
[(1, 2), (1, 4), (3, 1)] 1 {2: {'weight': 0.5}, 4: {'weight': 0.1}} 2 {} 3 {1: {'weight': 0.75}} 4 {}
{(1, 2): Text(0.5344112176558196, 0.07453179365090234, '0.5'), (1, 4): Text(-0.40025893727504, -0.2823709501419644, '0.1'), (3, 1): Text(0.06531809932901345, 0.15992679438194546, '0.75')}
You can transform a directed graph into an undirected on using to_undirected()
UG1 = DG1.to_undirected()
print(UG1.edges())
for n in UG1:
print(n, UG1[n])
nx.draw_networkx(UG1)
[(1, 2), (1, 4), (1, 3), (2, 4), (3, 4)] 1 {2: {}, 4: {}, 3: {}} 2 {1: {}, 4: {}} 3 {1: {}, 4: {}} 4 {1: {}, 2: {}, 3: {}}
UG2 = DG2.to_undirected()
print(UG2.edges())
for n in UG2:
print(n, UG2[n])
nx.draw_networkx(UG2)
[(1, 2), (1, 4), (1, 3)] 1 {2: {'weight': 0.5}, 4: {'weight': 0.1}, 3: {'weight': 0.75}} 2 {1: {'weight': 0.5}} 3 {1: {'weight': 0.75}} 4 {1: {'weight': 0.1}}
Graph Operations¶
Some common graph functions and algorithms are already implemented in networkx library.
Neighbors, degrees and adjancency matrix
nx.draw_networkx(G)
A = nx.adjacency_matrix(G)
print(A)
#the adjacency matrix is stored as a sparse matrix
print(type(A))
(0, 1) 1 (0, 2) 1 (0, 3) 1 (1, 0) 1 (2, 0) 1 (3, 0) 1 (3, 4) 1 (3, 5) 1 (4, 3) 1 (5, 3) 1 <class 'scipy.sparse._csr.csr_array'>
print(G.neighbors(1)) # returns the neighbors of a node
print(list(G.neighbors(1)))
for x in G.neighbors(1):
print(x)
print('degree of 1:',G.degree(1)) # returns the degree of a node
<dict_keyiterator object at 0x78bddc3e2610> [2, 3, 'Alice'] 2 3 Alice degree of 1: 3
print(G4[3])
print(G4.degree(3, weight='weight'))
print(G4.degree(3))
{2: {'weight': 0.1}, 4: {'weight': 0.7}} 0.7999999999999999 2
Neighbors and degrees for directed or weighted graphs
pos=nx.spring_layout(DG2)
nx.draw_networkx(DG2,pos)
weights = nx.get_edge_attributes(DG2,'weight')
nx.draw_networkx_edge_labels(DG2, pos, edge_labels=weights)
{(1, 2): Text(0.5272487293673744, -0.03409400608090196, '0.5'), (1, 4): Text(-0.3972168205417217, -0.12617068508745777, '0.1'), (3, 1): Text(0.0755496192025753, 0.2051282745726466, '0.75')}
print(list(DG2.successors(1)))
print(list(DG2.neighbors(1)))
print(list(DG2.predecessors(1)))
print(DG2.out_degree(1))
print(DG2.in_degree(1))
print(DG2.out_degree(1,weight='weight'))
[2, 4] [2, 4] [3] 2 1 0.6
Connected components¶
G5 = nx.Graph()
G5.add_edges_from([(1,2),(1,3),(1,4),(2,3),('Alice','Bob'),('Bob','Charlie'),('Alice','Charlie')])
nx.draw_networkx(G5)
C = nx.connected_components(G5)
print(type(C))
for c in C:
print(c)
<class 'generator'> {1, 2, 3, 4} {'Alice', 'Charlie', 'Bob'}
Get the connected component subgraphs
connected_subgraphs = [nx.subgraph(G5,c) for c in nx.connected_components(G5)]
for GC in connected_subgraphs:
print('Connected Component')
print(GC.nodes())
print(GC.edges())
print(len(GC))
Connected Component [1, 2, 3, 4] [(1, 2), (1, 3), (1, 4), (2, 3)] 4 Connected Component ['Alice', 'Charlie', 'Bob'] [('Alice', 'Bob'), ('Alice', 'Charlie'), ('Charlie', 'Bob')] 3
Get the largest connected component
# Get the nodes
largest_cc = max(nx.connected_components(G5), key=len)
print(largest_cc)
{1, 2, 3, 4}
#Get the subgraph
largest_cc = max(nx.connected_components(G5), key=len)
print(largest_cc)
CC_max = nx.subgraph(G5,largest_cc)
nx.draw_networkx(CC_max)
{1, 2, 3, 4}
Strongly Connected Components¶
DG = nx.DiGraph()
DG.add_edges_from([(1,2),(2,3),(3,4),(4,2),(4,3),(4,5)])
nx.draw_networkx(DG)
largest_SCC = max(nx.strongly_connected_components(DG), key=len)
#print(largest_SCC)
SCC_max = nx.subgraph(DG,largest_SCC)
nx.draw_networkx(SCC_max)
Shortest paths¶
G6 = nx.Graph()
G6.add_edges_from([(1,2),(1,3),(1,4),(2,3),(1,'Alice'),('Alice','Bob'),('Alice','Charlie'),('Bob','Charlie')])
nx.draw_networkx(G6)
sp = nx.shortest_path(G6,3,'Bob')
print(sp)
print(len(sp)-1)
print(nx.shortest_path_length(G6,3,'Bob'))
[3, 1, 'Alice', 'Bob'] 3 3
SP1 = nx.single_source_shortest_path(G6,1)
print(SP1)
#print(nx.single_source_shortest_path_length(G3,'1'))
{1: [1], 2: [1, 2], 3: [1, 3], 4: [1, 4], 'Alice': [1, 'Alice'], 'Bob': [1, 'Alice', 'Bob'], 'Charlie': [1, 'Alice', 'Charlie']}
SP = nx.all_pairs_shortest_path(G6)
print(type(SP))
SP = dict(SP)
print(dict(SP))
<class 'generator'> {1: {1: [1], 2: [1, 2], 3: [1, 3], 4: [1, 4], 'Alice': [1, 'Alice'], 'Bob': [1, 'Alice', 'Bob'], 'Charlie': [1, 'Alice', 'Charlie']}, 2: {2: [2], 1: [2, 1], 3: [2, 3], 4: [2, 1, 4], 'Alice': [2, 1, 'Alice'], 'Bob': [2, 1, 'Alice', 'Bob'], 'Charlie': [2, 1, 'Alice', 'Charlie']}, 3: {3: [3], 1: [3, 1], 2: [3, 2], 4: [3, 1, 4], 'Alice': [3, 1, 'Alice'], 'Bob': [3, 1, 'Alice', 'Bob'], 'Charlie': [3, 1, 'Alice', 'Charlie']}, 4: {4: [4], 1: [4, 1], 2: [4, 1, 2], 3: [4, 1, 3], 'Alice': [4, 1, 'Alice'], 'Bob': [4, 1, 'Alice', 'Bob'], 'Charlie': [4, 1, 'Alice', 'Charlie']}, 'Alice': {'Alice': ['Alice'], 1: ['Alice', 1], 'Bob': ['Alice', 'Bob'], 'Charlie': ['Alice', 'Charlie'], 2: ['Alice', 1, 2], 3: ['Alice', 1, 3], 4: ['Alice', 1, 4]}, 'Bob': {'Bob': ['Bob'], 'Alice': ['Bob', 'Alice'], 'Charlie': ['Bob', 'Charlie'], 1: ['Bob', 'Alice', 1], 2: ['Bob', 'Alice', 1, 2], 3: ['Bob', 'Alice', 1, 3], 4: ['Bob', 'Alice', 1, 4]}, 'Charlie': {'Charlie': ['Charlie'], 'Alice': ['Charlie', 'Alice'], 'Bob': ['Charlie', 'Bob'], 1: ['Charlie', 'Alice', 1], 2: ['Charlie', 'Alice', 1, 2], 3: ['Charlie', 'Alice', 1, 3], 4: ['Charlie', 'Alice', 1, 4]}}
print(SP1['Bob'])
print(SP[1]['Bob'])
[1, 'Alice', 'Bob'] [1, 'Alice', 'Bob']
DG2 = nx.DiGraph()
DG2.add_edges_from([(1,2),(1,3),(3,2),(2,5),(4,1),(4,2),(4,3),(5,1),(5,4)])
nx.draw_networkx(DG2)
Pagerank¶
The result of the Pagerank algorithm is a dictionary with the node ids as keys, and the pagerank values as values.
pr = nx.pagerank(DG2)
print(pr)
print('\n')
pr = nx.pagerank(G6)
print(pr)
{1: 0.18064505060873784, 2: 0.27131643087724033, 3: 0.14665711544131713, 5: 0.26061906832422166, 4: 0.140762334748483} {1: 0.2420308196543696, 2: 0.12671382905463274, 3: 0.12671382905463274, 4: 0.07286059991932929, 'Alice': 0.17987731223897474, 'Bob': 0.12590180503903042, 'Charlie': 0.12590180503903042}
HITS¶
[h,a] = nx.hits(DG2)
print(h)
print(a)
print(a[2])
{1: 0.3028419093958839, 2: 2.6560584919688065e-17, 3: 0.16745199268671332, 5: 0.1254412261267391, 4: 0.40426487179066356} {1: 0.23681287910395027, 2: 0.3909843250829289, 3: 0.3161224561036186, 5: 6.201641558282901e-17, 4: 0.05608033970950216} 0.3909843250829289
Pesronalized Pagerank¶
We will see the example we presented in class
G7 = nx.read_edgelist('graph-example.txt')
nx.draw_networkx(G7)
To define the personalized Pagerank, we will use the "personalization" parameter, were we define the jump vector as a dictionary, with node ids as the keys, and the jump probability to each node as the values
print(nx.pagerank(G7))
print(nx.pagerank(G7, personalization = {'1':1}))
print(nx.pagerank(G7, personalization = {'6':1}))
{'1': 0.12755099866537586, '2': 0.18232887836414127, '3': 0.2394858218322733, '4': 0.18433868256410274, '5': 0.1326787034099837, '6': 0.13361691516412283} {'1': 0.2583391334031776, '2': 0.20094608623757965, '3': 0.24190126489812175, '4': 0.14028792201015408, '5': 0.0833530743914921, '6': 0.07517251905947459} {'1': 0.07517273333798744, '2': 0.12532699696556127, '3': 0.1866533325726192, '4': 0.18958031704255388, '5': 0.15407113246217508, '6': 0.26919548761910295}
We will now change the jump probability. To change the jump probability we will set the parameter alpha.
Attention: The parameter alpha is not the jump probability as we have described in class, but the probability of following an out-link. The jump probability is 1-alpha.
print(nx.pagerank(G7, alpha = 0.5))
print(nx.pagerank(G7, alpha = 0.5, personalization = {'1':1}))
print(nx.pagerank(G7, alpha = 0.5, personalization = {'6':1}))
{'1': 0.1390335398088414, '2': 0.1741686298166486, '3': 0.2133752383994728, '4': 0.17643097727782997, '5': 0.14740281022602297, '6': 0.14958880447118414} {'1': 0.5510064790190851, '2': 0.16964493982349704, '3': 0.18185949066208873, '4': 0.054964764048023335, '5': 0.026690643658381363, '6': 0.015833682788924434} {'1': 0.015833407896886878, '2': 0.039357701756408944, '3': 0.07419162031220411, '4': 0.15675161523961573, '5': 0.150192049257395, '6': 0.5636736055374894}
Betweeness¶
BC = nx.edge_betweenness_centrality(G6)
print(BC)
{(1, 2): 0.23809523809523808, (1, 3): 0.23809523809523808, (1, 4): 0.2857142857142857, (1, 'Alice'): 0.5714285714285714, (2, 3): 0.047619047619047616, ('Alice', 'Bob'): 0.23809523809523808, ('Alice', 'Charlie'): 0.23809523809523808, ('Bob', 'Charlie'): 0.047619047619047616}
nx.draw_networkx(G6)
nx.draw(G6, with_labels = True)
nx.draw_circular(G6)
nx.draw_spectral(G6, with_labels=True)
nx.draw_spring(G6)
nx.draw_spring(DG2)
An example¶
karate=nx.read_gml("karate.gml",label='id')
nx.draw_networkx(karate)
nx.draw_spring(karate,with_labels=True)
Change the size of the nodes depending on their pagerank value
pr = nx.pagerank(karate)
nx.draw_networkx(karate,node_size=[10000*pr[x] for x in pr])
A PageRank implementation¶
We will now do our own implementation of Pagerank. Pagerank values for node $i$ are computed by iterating the following formula:
$$p_i = 0.85\sum_{j\rightarrow i}\frac{p_j}{d_{out}(j)} +0.15\frac 1n$$
We will associate each node with two values: the old pagerank in the previous step and the new one that is currently computed. We initialize the old pagerank to $1/n$
for x in karate.nodes:
karate.nodes[x]['old_pr'] = 1/len(karate.nodes)
karate.nodes[x]['pr'] = 0
The algorithm goes over the edges in the graph, and for each edge (x,y) transfers a fraction of the Pagerank of x to y (and vice versa since the graph is undirected).
For convergece check we want the maximum difference between old and new Pagerank values to be less than eps.
eps = 0.0000001
while (True):
for (x,y) in karate.edges:
karate.nodes[y]['pr'] += karate.nodes[x]['old_pr']/karate.degree(x)
karate.nodes[x]['pr'] += karate.nodes[y]['old_pr']/karate.degree(y)
diff = 0
for x in karate.nodes:
karate.nodes[x]['pr'] = karate.nodes[x]['pr']*0.85 + 0.15/len(karate.nodes)
diff = max(diff, abs(karate.nodes[x]['pr'] - karate.nodes[x]['old_pr']))
if diff < eps: break
for x in karate.nodes:
karate.nodes[x]['old_pr'] = karate.nodes[x]['pr']
karate.nodes[x]['pr'] = 0
print({x:karate.nodes[x]['pr'] for x in karate.nodes})
{1: 0.09699746639416912, 2: 0.05287697716512755, 3: 0.05707850530246944, 4: 0.03585989886187514, 5: 0.02197802085797975, 6: 0.02911125727906845, 7: 0.02911125727906845, 8: 0.024490521601363044, 9: 0.029766036239543145, 10: 0.014309384124315726, 11: 0.02197802085797975, 12: 0.009564757473502347, 13: 0.014644911694305714, 14: 0.029536468425771824, 15: 0.014535968166646367, 16: 0.014535968166646367, 17: 0.016784065326985863, 18: 0.014558694786535084, 19: 0.014535968166646367, 20: 0.019604641610943857, 21: 0.014535968166646367, 22: 0.014558694786535084, 23: 0.014535968166646367, 24: 0.03152244833588808, 25: 0.021075999887935973, 26: 0.02100616170212189, 27: 0.01504401086799329, 28: 0.0256397267700412, 29: 0.019573438474829792, 30: 0.02628848291686834, 31: 0.024590131585182876, 32: 0.037158037426708354, 33: 0.07169310822672904, 34: 0.10091903290492975}
We got essentially the same values as for the Pagerank vector.
pr
{1: 0.09700181758983706, 2: 0.052878391037427, 3: 0.05707842304763673, 4: 0.035860643223064786, 5: 0.021979406974834498, 6: 0.02911334166344221, 7: 0.029113341663442205, 8: 0.02449075803950918, 9: 0.029765339186167028, 10: 0.014308950284462798, 11: 0.021979406974834494, 12: 0.009564916863537146, 13: 0.014645186487916188, 14: 0.02953631497720298, 15: 0.014535161524273824, 16: 0.014535161524273824, 17: 0.016785378110253487, 18: 0.01455885977424349, 19: 0.014535161524273824, 20: 0.01960441671193729, 21: 0.014535161524273824, 22: 0.01455885977424349, 23: 0.014535161524273824, 24: 0.03152091531163227, 25: 0.021075455001162945, 26: 0.02100562817474579, 27: 0.015043395360629753, 28: 0.025638803528350497, 29: 0.019572960509438537, 30: 0.026287262837112076, 31: 0.024589336534292478, 32: 0.037156635922679405, 33: 0.07169213006588289, 34: 0.1009179167487121}
Pagerank implementation using matrix operations
import numpy as np
A = nx.adjacency_matrix(karate)
d = np.array([karate.degree(n) for n in karate.nodes])
D = np.diag(1/d)
P = D@A
N = len(karate.nodes)
u = np.ones(N)/N
p = p_old = u
while(True):
p = (0.85*p_old.T@P+0.15*u.T).T
if np.linalg.norm(p-p_old) < eps: break
p_old = p
p
array([0.09699741, 0.05287696, 0.05707851, 0.03585989, 0.021978 , 0.02911123, 0.02911123, 0.02449052, 0.02976604, 0.01430939, 0.021978 , 0.00956476, 0.01464491, 0.02953647, 0.01453598, 0.01453598, 0.01678405, 0.01455869, 0.01453598, 0.01960464, 0.01453598, 0.01455869, 0.01453598, 0.03152247, 0.02107601, 0.02100617, 0.01504402, 0.02563974, 0.01957344, 0.0262885 , 0.02459014, 0.03715806, 0.07169312, 0.10091905])