Community detection with InfoMap algorithm producing one massive module
Asked Answered
L

2

9

I am using the InfoMap algorithm in the igraph package to perform community detection on a directed and non-weighted graph (34943 vertices, 206366 edges). In the graph, vertices represent websites and edges represent the existence of a hyperlink between websites.

A problem I have encountered after running the algorithm is that the majority of vertices have a membership in a single massive community (32920 or 94%). The rest of the vertices are dispersed into hundreds of other tiny communities.

I have tried different settings with the nb.trials parameter (i.e. 50, 100, and now running 500). However, this doesn't seem to change the result much.

I am feeling rather exasperated because the run-time on the algorithm is quite high, so I have to wait each time for the results (with no luck yet!!).

Many thanks.

Lefton answered 4/12, 2013 at 1:9 Comment(12)
Have you tried visualising the graph to see what sort of community structures you are expecting? Maybe try a heatmap of the whole network, or circle plots of the resulting community structures -- this will help you determine if the community structure being found is correct, or whether the detection algorithm is just doing a bad job (and giving you an idea of where to go from there).Kohl
Hi @Manetheran, thanks for the suggestion. I have not used heatmaps or circle plots before. Could you please point me to the correct package or functions? Thank you.Lefton
I would say, try other methods, and see whether the results of those make sense. Just to be sure that it is not a bug in the implementation.Indihar
@GaborCsardi Thanks, I'll try other methods. I'm running edge.betweenness.community now but taking a long time. However, all other community detection approaches seem to only support undirected graphs. I am not sure what it means analytically if I perform community detection on the underlying graph.Lefton
@timothyjgraham: edge.betweenness.community is hopeless for your graph. Try algorithms that scale better.Indihar
@Lefton One of the reasons why most community detection methods support undirected graph only is because it is unclear what communities mean in a directed graph. The InfoMap algorithm says that a community is something that is hard to escape from when you perform a random walk on the graph, so it makes sense for directed graphs. The modularity-based algorithms say that a community is a subgraph within which there are more edges than what you would expect from a random graph with the same degree distribution - but note that this ignores edge directions entirely.Archaeological
Another thing that you may try is to take the single massive community subgraph, extract it from your network using induced.subgraph and then run InfoMap on the subgraph again.Archaeological
@GaborCsardi Hi Gabor, you wrote above that edge.betweenness.community does not scale well. I am writing a paper and need to reference this fact. Is there any literature I can cite? Thank you in advance.Lefton
@timothyjgraham: maybe the original publication of the algorithm, but this is just a guess.Indihar
@GaborCsardi Great guess! It is correct. The relevant citation is Page 7826 in Girvan, M., & Newman, M. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12), 7821-7826.Lefton
I'm a little late to the game here, but also try multilevel.community. It scales to pretty large graphs (I'm currently using it on one with 2.5 million edges). It also gives you a hierarchy of communities, so you aren't limited to the single result it chooses with the highest modularity.Mejias
Thanks, Zach. This sounds like a very good solution to the problems posed in community detection on large, directed, weighted graphs.Lefton
L
8

Thanks for all the excellent comments. In the end, I got it working by downloading and running the source code for Infomap, which is available at: http://www.mapequation.org/code.html.

Due to licence issues, the latest code has not been integrated with igraph.

This solved the problem of too many nodes being 'lumped' into a single massive community.

Specifically, I used the following options from the command line: -N 10 --directed --two-level --map

Kudos to Martin Rosvall from the Infomap project for providing me with detailed help to resolve this problem.

For the interested reader, here is more information about this issue:

When a network collapses into one major cluster, it is most often because of a very dense and random link structure ... In the code for directed networks implemented in iGraph, teleportation is encoded. If many nodes have no outlinks, the effect of teleportation can be significant because it randomly connect nodes. We have made new code available here: http://www.mapequation.org/code.html that can cluster network without encoding the random teleportation necessary to make the dynamics ergodic. For details, see this paper: http://pre.aps.org/abstract/PRE/v85/i5/e056107

Lefton answered 5/12, 2013 at 3:21 Comment(4)
could you help me with infomap code , I get error 1 ?Monometallic
Hi @academic.user, could you please provide more information about the error code. Can you please also provide information about what kind of graph you are analysing (e.g. number of vertices/edges? directed? weighted? density? etc). Also, what were you doing or attempting to do when the error occurred.Lefton
there was some problem with the file itself , and I downlaod it from Bitbucket. and it work, thanks again for your replyMonometallic
sorry @Lefton again , I was Wondering if you create a Multiplex input data to work with infomap, please see this Question : #28819195Monometallic
K
5

I was going to put this in a comment, but it ended up being too long and hard to read in that format, so this is a tangentially related answer.

One thing you should do is assess whether the algorithm is doing a good job at finding community structure. You can try to visualise your network to establish:

  1. Is the algorithm returning community structures that make sense? Maybe there is one massive community?
  2. If not does the visualisation provide insight as to why?

This will help inform your next steps. Maybe the structure of the network requires a different algorithm?

One thing I find useful for large networks is plotting your edges as a heatmap. This is simple to do if you have your edges stored in an adjacency matrix.

For this, you can use the image function, passing in your matrix of edges as the argument z. Hopefully this will allow you to see by eye the community structure.

However you also want to assess the correctness of your algorithm, so you want to sort the nodes (rows and columns of your adjacency matrix) by the community they've been assigned to.

Another thing to note is that if your edges are directed it may be more difficult to assess by eye as edges can appear on either side of the diagonal of the heatmap. One thing you can do is instead plot the underlying graph -- that is the adjacency matrix assuming your edges are undirected.

If your algorithm is doing a good job, you would expect to see square blocks along the diagonal, one for each detected community.

Kohl answered 4/12, 2013 at 4:34 Comment(2)
Hi there, thanks for your suggestions. I tried the heatmap approach by plotting the edge adjacency matrix - the 'heat' is all concentrated in a rectangular shape in the top 10% or so of the plot (i.e. a rectangular block running along the y-axis, extending down about 10% from the top of the y-axis). The rest of the plot is largely white space (with a few tiny dots of heat, no bigger than a few pixels.Lefton
The algorithm does not seem to return community structures that make sense. Around 95% of the nodes are in a single community, and the other 5% mostly occupy a community each (i.e. one node per community). So each node is either in the massive community, or in a community by itself! I am using the InfoMap algorithm because it supports directed graphs, where other community detection algorithms do not. I tried Spinglass and it returned communities that make sense (7 communities in total) - however it ignores directedness of the graph.Lefton

© 2022 - 2024 — McMap. All rights reserved.