I am trying to implement a unranked boolean retrieval. For this, I need to construct a tree and perform a DFS to retrieve documents. I have the leaf nodes but I am having difficulty to construct the tree.
Eg: query = OR ( AND (maria sharapova) tennis)
Result:
OR | | AND tennis | | maria sharapova
I traverse the tree using DFS and calculate the boolean equivalent of certain document ids to identify the required document from the corpus. Can someone help me with the design of this using python? I have parsed the query and retrieved the leaf nodes for now.
EDIT: I am new here, so apologies for lacking clarity. I am basically trying to build a very naive search engine. So, the user enters any boolean query like: OR ( AND (maria sharapova) tennis). I have a corpus of wikipedia documents that gets displayed to the user depending on the query you type.
Till now, I have parsed the query to retrieve individual operators (like OR, AND, etc). And, the individual search terms(maria, tennis, etc). The code for parsing is just a function that would basically group all the operators and query terms as typed. i.e (maria sharapova), (tennis), OR, AND. I parsed this function this way so as to create a tree bottom-up. Now, using the inverted lists for the corresponding keywords like tennis, maria, sharapova, etc I perform the boolean operation with the inverted list to get a certain "documentid". This documentid is then passed to an API which would then retrieve the correct wikipedia page.
Just to explain the topic in more detail, please refer to this document for more information about my problem in hand: http://www.ccs.neu.edu/home/jaa/CSG339.06F/Lectures/boolean.pdf