How to create a mind map from a paragraph of a book

Today I tried to write a program that takes a paragraph of text and creates a graph showing the relations between different words. Everything went well, except that I don't know a better way to find the connections; by "better" I mean closer to a mind map. The input below is simple, but I want the program to be able to take a paragraph from Wikipedia and produce a nice mind map. The graph I got from the DOT-format output of my program for the following input was:

roses are red line_end
sky is blue line_end
life is beautiful line_end
everything is going fine line_end file_end

(Graphviz output image)

But for an input like the one below, it just creates a very big graph that is more obscure than the text itself.

Probability is a measure of the likeliness that an event will occur line_end
Probability is used to quantify an attitude of mind towards some proposition of    whose truth we are not certain line_end
file_end

(second Graphviz output image, very obscure)

So my question is: what algorithm would work well in this situation? What should I study to build such a program? Below is my C++ program. (I also did some text processing in Ruby to get the paragraph into its current form with "line_end" and "file_end" markers, but that is not where my problem lies.)

  #include<iostream>
  #include<algorithm>
  #include<vector>
  #include<set>
  #include<map>
  #include<string>
  #include<utility>
  /* avoid explicit template arguments on make_pair so this also builds as C++11 */
  #define MP(X,Y)  (make_pair((X),(Y)))

  using namespace std;
  map<string, vector<string> > mind_map;
  set<string> ignore_these_words;
  set<pair<string,string> > already_discovered;

  string black_list[] = {"and","is","are","was","for","the","a","an","or","under","up","over","beside","below",
            "across","to","from","by","have","had","has","been","be","it","me","you"};
  vector<string> current_sentence;


  int main()
  {
    for(int i =0; i<(sizeof(black_list)/sizeof(black_list[0])) ; i++)
            ignore_these_words.insert(black_list[i] );


    while(1)
    {
    string input_word;
    cin >> input_word;

    if( ignore_these_words.find(input_word) != ignore_these_words.end() )
        continue;

    /* if  the sentence end has been reached, then insert all pairs of combinations  of words in the graph
       for example if the sentence is "roses are red and beautiful", then it will try to insert the following pairs of edges
       after ignoring "are" and "and" from the ignore list
       (roses,red)
       (roses,beautiful)
       (red,beautiful)
    */


    if(input_word == "line_end")
    {
        for(int i =0; i< current_sentence.size() ; i++)
            for(int j = i+1; j < current_sentence.size(); j++)
                /* if we have not discovered this connection earlier */
                if( already_discovered.find( MP(current_sentence[i],current_sentence[j]) ) == already_discovered.end() )
                    {
                        mind_map[current_sentence[i]].push_back( current_sentence[j]);
                        already_discovered.insert(MP(current_sentence[i],current_sentence[j]) );
                        already_discovered.insert(MP(current_sentence[j],current_sentence[i] ) );
                    }
        current_sentence.clear();
        continue;
    }


    /* if the file end has been reached, then output the graph in dot format */
     if( input_word == "file_end")
    {
        cout << "graph {"<<endl;
        for( map<string,vector<string> >::iterator it = mind_map.begin(); it != mind_map.end(); ++it)
            for( int i =0; i< (*it).second.size(); i++)
                cout<<"\""<<(*it).first<<"\""<<" -- "<<"\""<<(*it).second[i]<<"\""<<endl;
        cout<< "}"<<endl;
        break;
    }


    current_sentence.push_back(input_word);
    }
    return 0;
  }

Thanks in advance :). And if someone has code like this, please share it; it would make my study of this topic more productive.

Ullyot asked 30/4/2014 at 17:40. Comments (5):
What is the problem with the code you posted? Does it not work? Does it throw an error? Does it produce incorrect results? Is it too slow? Is it missing some functionality? – Boehmite
@Cyber Please look at the "so my question is" part just above the code. The code is working fine; I want to know about a better way to do it. – Ullyot
Just some loose ideas: your nodes and edges need to carry more data. The nodes should probably count how often each word occurs, and the edges how often each word pair occurs in the same sentence, so that you can delete "weak" nodes and edges from the graph after processing (see the sketch after these comments). You need a bigger stop-word list, or even one that learns during processing. Lastly, stemming (i.e. reducing words to their stems by cutting off suffixes) will give you fewer, but hopefully more expressive, nodes. And you will probably need a sufficiently big sample to get a decent mind map. – Coffee
I think this boils down to "I want a program that can correctly discern the semantic meaning in fragments of the English language", which is a huge topic with all kinds of active research going on, the vast majority of which I have no clue about... – Hastate
@Hastate Yeah, you can say that. – Ullyot
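
Here is a minimal sketch of that counting-and-pruning idea, reusing the question's line_end / file_end input format. The name MIN_EDGE_COUNT and its value are arbitrary illustrative choices: only word pairs that co-occur in a sentence at least that often are written out, with the count attached as a Graphviz edge label.

  #include <iostream>
  #include <map>
  #include <set>
  #include <string>
  #include <utility>
  #include <vector>

  using namespace std;

  int main()
  {
      const char* black_list[] = {"and","is","are","was","for","the","a","an","or",
                                  "to","from","by","have","had","has","been","be","it"};
      set<string> ignore_these_words(black_list,
          black_list + sizeof(black_list) / sizeof(black_list[0]));

      /* how often an (unordered) word pair appears in the same sentence */
      map<pair<string,string>, int> edge_count;
      vector<string> current_sentence;

      const int MIN_EDGE_COUNT = 2;   /* arbitrary pruning threshold, for illustration only */

      string input_word;
      while (cin >> input_word)
      {
          if (ignore_these_words.count(input_word))
              continue;

          if (input_word == "line_end")
          {
              /* count every unordered pair of words in the finished sentence */
              for (size_t i = 0; i < current_sentence.size(); i++)
                  for (size_t j = i + 1; j < current_sentence.size(); j++)
                  {
                      pair<string,string> e =
                          current_sentence[i] < current_sentence[j]
                              ? make_pair(current_sentence[i], current_sentence[j])
                              : make_pair(current_sentence[j], current_sentence[i]);
                      edge_count[e]++;
                  }
              current_sentence.clear();
              continue;
          }

          if (input_word == "file_end")
              break;

          current_sentence.push_back(input_word);
      }

      /* emit only "strong" edges, with the co-occurrence count as the edge label */
      cout << "graph {" << endl;
      for (map<pair<string,string>, int>::iterator it = edge_count.begin();
           it != edge_count.end(); ++it)
          if (it->second >= MIN_EDGE_COUNT)
              cout << "\"" << it->first.first << "\" -- \"" << it->first.second
                   << "\" [label=" << it->second << "]" << endl;
      cout << "}" << endl;
      return 0;
  }

Node frequencies could be tracked the same way to drop words that occur only once, and a larger stop-word list plus stemming would shrink the graph further, as the comment suggests.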

While it might be somewhat crude to treat language like the Internet, I believe PageRank (used by Google's search engine) has some important similarities to what you are trying to do (create a map that shows the relative importance of its nodes).

Google's PageRank is based on giving each website a relative "importance". When Website A links to Website B, B receives "importance" in proportion to A's own importance. For example, when a no-name website links to Wikipedia, Wikipedia gets a small importance boost, but if Wikipedia links to another website, that website gets a much bigger boost because of Wikipedia's own large importance. There are many more nuances to PageRank, but this gives a taste.

Similarly, assigning a "direction" to linking words would be like one website linking to another: "A is B" is A "linking" to B. One could say "roses are red" is like "roses" giving importance to "red". Since a lot of things "are red", the word "red" would gain a large amount of "importance", just as common descriptive words such as "red" are semantically important to language. Hopefully this gives you an idea of a possible direction.
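
A rough sketch of how this could look in the same C++ setting as the question: each "A is B" sentence becomes a directed edge A -> B, and a few power-iteration steps spread "importance" along those edges. The tiny hard-coded word graph and the damping factor of 0.85 (the value commonly quoted for PageRank) are just illustrative choices.

  #include <iostream>
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  using namespace std;

  int main()
  {
      /* Tiny made-up word graph: "A is B" becomes a directed edge A -> B,
         so importance flows from the subject word to the describing word. */
      map<string, vector<string> > links;
      links["roses"].push_back("red");
      links["blood"].push_back("red");
      links["sky"].push_back("blue");
      links["life"].push_back("beautiful");

      /* collect every word that appears anywhere in the graph */
      set<string> nodes;
      for (map<string, vector<string> >::iterator it = links.begin(); it != links.end(); ++it)
      {
          nodes.insert(it->first);
          for (size_t i = 0; i < it->second.size(); i++)
              nodes.insert(it->second[i]);
      }

      /* start every node with an equal share of rank */
      map<string, double> rank;
      for (set<string>::iterator it = nodes.begin(); it != nodes.end(); ++it)
          rank[*it] = 1.0 / nodes.size();

      const double d = 0.85;      /* damping factor commonly used for PageRank */
      const int iterations = 20;  /* a few power-iteration steps suffice for a graph this small */

      for (int step = 0; step < iterations; step++)
      {
          /* every node keeps a small base amount of rank... */
          map<string, double> next;
          for (set<string>::iterator it = nodes.begin(); it != nodes.end(); ++it)
              next[*it] = (1.0 - d) / nodes.size();

          /* ...and each node shares its current rank equally among the words it links to */
          for (map<string, vector<string> >::iterator it = links.begin(); it != links.end(); ++it)
          {
              double share = rank[it->first] / it->second.size();
              for (size_t i = 0; i < it->second.size(); i++)
                  next[it->second[i]] += d * share;
          }
          rank = next;
      }

      /* "red" ends up with the highest rank because two different words point to it */
      for (map<string, double>::iterator it = rank.begin(); it != rank.end(); ++it)
          cout << it->first << " " << it->second << endl;
      return 0;
  }

The highest-ranked words would then be good candidates for the central nodes of the mind map, with weaker edges pruned around them.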

Alecalecia answered 6/6/2014 at 18:35.
