Today I tried to write a program that takes a paragraph of text and creates a graph showing the relations between different words. Everything went well, except that I don't know how to find the connections in a better way, where "better" means something closer to a mind map. The input below is simple, but I want a program that can take a paragraph from Wikipedia and produce a clean mind map. For the following input, the graph rendered from my program's DOT-format output came out fine:
roses are red line_end
sky is blue line_end
life is beautiful line_end
everything is going fine line_end file_end
But for an input like the following one, it creates a very large graph that is harder to read than the text itself:
Probability is a measure of the likeliness that an event will occur line_end
Probability is used to quantify an attitude of mind towards some proposition of whose truth we are not certain line_end
file_end
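A quick count shows why the graph grows so fast. The program connects every pair of kept words in a sentence, so a sentence with n words left after the blacklist contributes up to

$$E(n) = \binom{n}{2} = \frac{n(n-1)}{2}$$

edges. The longest sentence in the simple input keeps 3 words (everything, going, fine), so the whole simple input yields 1 + 1 + 1 + 3 = 6 pairs. The two Probability sentences keep 8 and 15 words (counting "of" twice in the second), giving E(8) + E(15) = 28 + 105 = 133 candidate pairs before duplicates are removed.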
So my question is: what algorithm would work well in this situation, and what should I study to build this kind of program? Below is my C++ program. (I also used Ruby to preprocess the paragraph into its current form with the "line_end" and "file_end" markers, but that part is not where I am having problems.)
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
using namespace std;
map<string, vector<string> > mind_map;          // word -> words it is connected to
set<string> ignore_these_words;                  // stop words that are skipped
set<pair<string, string> > already_discovered;   // edges already added, stored in both directions
const string black_list[] = {"and","is","are","was","for","the","a","an","or","under","up","over","beside","below",
"across","to","from","by","have","had","has","been","be","it","me","you"};
vector<string> current_sentence;
int main()
{
    for (size_t i = 0; i < sizeof(black_list) / sizeof(black_list[0]); i++)
        ignore_these_words.insert(black_list[i]);
    string input_word;
    // Stop at "file_end" or at the end of the input; a bare while(1) loop
    // never terminates if "file_end" is missing.
    while (cin >> input_word && input_word != "file_end")
    {
        if (ignore_these_words.find(input_word) != ignore_these_words.end())
            continue;
        /* if the sentence end has been reached, then insert all pairs of combinations of words in the graph
           for example if the sentence is "roses are red and beautiful", then it will try to insert the following pairs of edges
           after ignoring "are" and "and" from the ignore list
           (roses,red)
           (roses,beautiful)
           (red,beautiful)
        */
        if (input_word == "line_end")
        {
            for (size_t i = 0; i < current_sentence.size(); i++)
                for (size_t j = i + 1; j < current_sentence.size(); j++)
                {
                    // Plain make_pair: make_pair<string,string>(...) with explicit template
                    // arguments does not compile for lvalue arguments in C++11 and later.
                    pair<string, string> edge = make_pair(current_sentence[i], current_sentence[j]);
                    /* skip self-loops (a word repeated in one sentence) and connections discovered earlier */
                    if (edge.first != edge.second && already_discovered.find(edge) == already_discovered.end())
                    {
                        mind_map[edge.first].push_back(edge.second);
                        already_discovered.insert(edge);
                        already_discovered.insert(make_pair(edge.second, edge.first));
                    }
                }
            current_sentence.clear();
            continue;
        }
        current_sentence.push_back(input_word);
    }
    /* the file end has been reached, so output the graph in dot format */
    cout << "graph {" << endl;
    for (map<string, vector<string> >::const_iterator it = mind_map.begin(); it != mind_map.end(); ++it)
        for (size_t i = 0; i < it->second.size(); i++)
            cout << "\"" << it->first << "\" -- \"" << it->second[i] << "\"" << endl;
    cout << "}" << endl;
    return 0;
}
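A sketch of one possible direction, not a confirmed answer to the question: instead of linking every pair of words in a sentence, link only words that sit within a small window of each other, count how often each link recurs across sentences, drop rare links, and keep a tree rooted at the most strongly connected word, since a mind map is usually a tree around one central topic. The function names `count_cooccurrences` and `mind_map_tree`, the `window` and `min_count` parameters, and the rule for choosing the root are all hypothetical choices made for illustration. The code expects sentences whose stop words were already removed, as the program above does.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Undirected edge, stored with the alphabetically smaller word first so that
// (a, b) and (b, a) are counted as the same edge.
typedef std::pair<std::string, std::string> Edge;

// Counts how often two different words appear within `window` positions of
// each other in the same sentence. `window` is a hypothetical tuning knob:
// 1 links only neighbouring kept words.
std::map<Edge, int> count_cooccurrences(
    const std::vector<std::vector<std::string> >& sentences, std::size_t window)
{
    std::map<Edge, int> weight;
    for (const std::vector<std::string>& s : sentences)
        for (std::size_t i = 0; i < s.size(); ++i)
            for (std::size_t j = i + 1; j < s.size() && j - i <= window; ++j)
            {
                if (s[i] == s[j]) continue;  // no self-loops such as (of, of)
                Edge e = s[i] < s[j] ? Edge(s[i], s[j]) : Edge(s[j], s[i]);
                ++weight[e];
            }
    return weight;
}

// Mind-map-style tree (hypothetical heuristic): ignores edges seen fewer than
// `min_count` times, picks as root the word with the largest total edge weight
// (ties go to the alphabetically first word), and keeps the breadth-first
// spanning tree from that root. Words not reachable from the root are left
// out. Returns (parent, child) pairs in BFS order.
std::vector<Edge> mind_map_tree(const std::map<Edge, int>& weight, int min_count)
{
    std::map<std::string, std::vector<std::string> > adj;
    std::map<std::string, int> strength;
    for (const auto& kv : weight)
    {
        if (kv.second < min_count) continue;
        adj[kv.first.first].push_back(kv.first.second);
        adj[kv.first.second].push_back(kv.first.first);
        strength[kv.first.first] += kv.second;
        strength[kv.first.second] += kv.second;
    }
    std::vector<Edge> tree;
    if (strength.empty()) return tree;

    std::string root;
    int best = -1;
    for (const auto& kv : strength)
        if (kv.second > best) { best = kv.second; root = kv.first; }

    std::set<std::string> seen;
    seen.insert(root);
    std::queue<std::string> pending;
    pending.push(root);
    while (!pending.empty())
    {
        std::string parent = pending.front();
        pending.pop();
        for (const std::string& child : adj[parent])
            if (seen.insert(child).second)  // first time `child` is reached
            {
                tree.push_back(Edge(parent, child));
                pending.push(child);
            }
    }
    return tree;
}
```

Each returned pair can be printed as `"parent" -- "child"` between `graph {` and `}`, the same way the program above prints its edges, with the sentences collected from `current_sentence` at every `line_end`. Raising `min_count` or lowering `window` trades coverage for a smaller, more readable picture; words that fall outside the root's connected part are simply not shown.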
Thanks in advance :). If someone already has code like this, please share it; it would make my study more productive.