Textrank: complementing pagerank for sentence extraction using networkx
Asked Answered
G

2

5

I am trying to implement textrank algorithm for sentence extraction as described here. For that in need to complement pagerank algorithm with weighted edges and get it to run on undirected graphs. Networkx pagerank algorithm implementation allows me to easely integrate weighted edges and is said to convert directed graphs to undirected: see here. However, when I tested it still seems to use directed graph. What am I missing here? Help greatly appriciated.

Example:

import networkx as nx
D=nx.DiGraph()
D.add_weighted_edges_from([('A','B',0.5),('A','C',1)])
print nx.pagerank(D)

Outpunt: {'A': 0.25974025929223499, 'C': 0.40692640737443164, 'B': 0.33333333333333331}

Grady answered 12/2, 2012 at 8:48 Comment(0)
S
9

I think you misinterpreted the note on the networkx documentation. Though, I must admit it might be worded better.

The PageRank algorithm was designed for directed graphs but this algorithm does not check if the input graph is directed and will execute on undirected graphs by converting each oriented edge in the directed graph to two edges.

What this tells is that, PageRank algorithm is designed for directed graphs, but it can be used for undirected graphs. To do so, it converts the undirected network to a directed network by replacing each edge with two directed edges (in and out).

Therefore, if you give it a directed network, it will calculate the PageRank according to the directed structure. So either start with an undirected network:

import networkx as nx

# Undirected Network
D = nx.Graph()
D.add_weighted_edges_from([('A', 'B', 0.5),('A', 'C', 1)])

# Default max number of iterations failed to converge for me
print nx.pagerank(D, max_iter=200)

# Outputs:
{'A': 0.48648648872844047, 'C': 0.32567567418103965, 'B': 0.18783783709051982}

or if you already have a directed network, convert it to an undirected one:

import networkx as nx

# Directed Network
D = nx.DiGraph()
D.add_weighted_edges_from([('A', 'B', 0.5), ('A', 'C', 1)])

# Convert to undirected
G = D.to_undirected()

# Default max number of iterations failed to converge for me
print nx.pagerank(G, max_iter=200)

# Outputs:
{'A': 0.48648648872844047, 'C': 0.32567567418103965, 'B': 0.18783783709051982}
Shawntashawwal answered 12/2, 2012 at 9:35 Comment(0)
C
0

A nice implementation of the TextRank algorithm in python can be found here. If you want to use this script you have to run nltk.download() beforehand, to install the necessary data files as described here.

Clementclementas answered 2/10, 2012 at 10:18 Comment(1)
That implementation is not for the sentence extraction, but rather the keyword extraction. You can see that from the comments below the code.Talich

© 2022 - 2024 — McMap. All rights reserved.