How to create a mind map from a paragraph of a book

Today I tried to write a program that takes a paragraph of text and creates a graph showing the relations between different words. Everything went well, except that I don't know how to find the connections in a better way, where "better" means something closer to a mind map. The input below is simple, but I want a program that can take a paragraph from Wikipedia and produce a really nice mind map. For the following input, the graph produced from my program's DOT output was:

roses are red line_end
sky is blue line_end
life is beautiful line_end
everything is going fine line_end file_end

(Graphviz output for the first input)

But for an input like the following, it creates a very big graph that is more obscure than the text itself.

Probability is a measure of the likeliness that an event will occur line_end
Probability is used to quantify an attitude of mind towards some proposition of whose truth we are not certain line_end
file_end

(Graphviz output for the second input: very obscure)

So my question is: what algorithm would work well in this situation, and what should I study to write this kind of program? Below is my C++ program. (I also used Ruby to preprocess the paragraph into its current form with "line_end" and "file_end" markers, but that is not where I am having problems.)

  #include <iostream>
  #include <vector>
  #include <set>
  #include <map>
  #include <string>
  #include <utility>

  using namespace std;

  map<string, vector<string> > mind_map;         // word -> words it is connected to
  set<string> ignore_these_words;                // stop words that never become nodes
  set<pair<string,string> > already_discovered;  // edges seen so far, stored in both directions

  string black_list[] = {"and","is","are","was","for","the","a","an","or","under","up","over","beside","below",
            "across","to","from","by","have","had","has","been","be","it","me","you"};
  vector<string> current_sentence;


  int main()
  {
    for(size_t i = 0; i < sizeof(black_list)/sizeof(black_list[0]); i++)
            ignore_these_words.insert(black_list[i]);

    string input_word;
    /* read until "file_end" or the end of input, so a missing marker cannot loop forever */
    while(cin >> input_word)
    {
      if( ignore_these_words.count(input_word) )
          continue;

      /* If the end of a sentence has been reached, insert an edge for every pair of words in it.
         For example, for the sentence "roses are red and beautiful", after "are" and "and"
         are dropped as stop words, it tries to insert the following edges:
         (roses,red)
         (roses,beautiful)
         (red,beautiful)
      */
      if(input_word == "line_end")
      {
          for(size_t i = 0; i < current_sentence.size(); i++)
              for(size_t j = i+1; j < current_sentence.size(); j++)
              {
                  pair<string,string> edge(current_sentence[i], current_sentence[j]);
                  /* if we have not discovered this connection earlier */
                  if( already_discovered.count(edge) == 0 )
                  {
                      mind_map[edge.first].push_back(edge.second);
                      already_discovered.insert(edge);
                      already_discovered.insert(make_pair(edge.second, edge.first));
                  }
              }
          current_sentence.clear();
          continue;
      }

      /* if the end of the file has been reached, stop reading */
      if(input_word == "file_end")
          break;

      current_sentence.push_back(input_word);
    }

    /* output the graph in dot format */
    cout << "graph {" << endl;
    for(map<string, vector<string> >::const_iterator it = mind_map.begin(); it != mind_map.end(); ++it)
        for(size_t i = 0; i < it->second.size(); i++)
            cout << "\"" << it->first << "\" -- \"" << it->second[i] << "\"" << endl;
    cout << "}" << endl;
    return 0;
  }

Thanks in advance :). And if anyone has code like this, please share it; I want to make my study more productive.

While it might be somewhat crude to treat language like the Internet, I believe PageRank (used by Google's search engine) has important similarities to what you are trying to do: create a map that shows the relative importance of words.

Google's PageRank gives each website a relative "importance". When Website A links to Website B, B receives "importance" in proportion to A's importance. For example, when a no-name website links to Wikipedia, Wikipedia gets a small importance boost, but when Wikipedia links to another website, that website gets a much larger boost because of Wikipedia's own importance. There are many more nuances to PageRank, but this gives a taste.

Similarly, assigning a "direction" to linking words would be like one website linking to another: "A is B" is A "linking" to B. One could say "roses are red" is like "roses" giving importance to "red". Since a lot of things "are red", the word "red" would gain a large amount of "importance", just as common descriptive words such as "red" are semantically important to language. Hopefully this gives you an idea of a possible direction.
