EDIT: I am retaining the first part of my answer, because if this WASN'T a school assignment, this would be in my opinion still a better way to go about the task. I replace the second part of the answer with update matching OP's question.
What you appear to want to do is to create a query string parser, which would read the query string and translate it into a series of AND/OR/NOT combos to return the correct keys.
There are 2 approaches to this.
- According to what you wrote that you need, by far the simplest solution would be to load the data into any SQL database (SQLite, for example, which does not require a full-blown running SQL server), load dictionary keys as a separate field (the rest of your data may all be in a single another field, if you don't care about normal forms &c), and translate incoming queries to SQL, approximately like this:
SQL table has at least this:
CREATE TABLE my_data(
dictkey text,
data text);
python_query="foo OR bar AND NOT gazonk"
sql_keywords=["AND","NOT","OR"]
sql_query=[]
for word in python_query.split(" "):
if word in sql_keywords:
sql_query+=[ word ]
else:
sql_query+=["dictkey='%s'" % word]
real_sql_query=" ".join(sql_query)
This needs some escaping and control checking for SQL injections and special chars, but in general it would just translate your query into SQL, which, when run against the SQL datbase would return the keys (and possibly data) for further processing.
- Now for the pure Python version.
What you need to do is to analyze the string you get and apply the logic to your existing Python data.
Analyzing the string to reduce it to specific components (and their interactions) is parsing. If you actually wanted to build your own fully fledged parser, there would be Python modules for that, however, for a school assignment, I expect you are tasked to build your own.
From your description, the query can be expressed in quasi BNF form as:
(<[NOT] word> <AND|OR>)...
Since you say that priority of is not relevant all, you can do it the easy way and parse word by word.
Then you have to match the keywords to the filenames, which, as mentioned in another answer, is easiest to do with sets.
So, it could go approximately like this:
import re
query="foo OR bar AND NOT gazonk"
result_set=set()
operation=None
for word in re.split(" +(AND|OR) +",query):
#word will be in ['foo', 'OR', 'bar', 'AND', 'NOT gazonk']
inverted=False # for "NOT word" operations
if word in ['AND','OR']:
operation=word
continue
if word.find('NOT ') == 0:
if operation is 'OR':
# generally "OR NOT" operation does not make sense, but if it does in your case, you
# should update this if() accordingly
continue
inverted=True
# the word is inverted!
realword=word[4:]
else:
realword=word
if operation is not None:
# now we need to match the key and the filenames it contains:
current_set=set(inverted_index[realword].keys())
if operation is 'AND':
if inverted is True:
result_set -= current_set
else:
result_set &= current_set
elif operation is 'OR':
result_set |= current_set
operation=None
print result_set
Note that this is not a complete solution (for example it does not include dealing with the first term of the query, and it requires the boolean operators to be in uppercase), and is not tested. However, it should serve the primary purpose of showing you how to go about it. Doing more would be writing your course work for you, which would be bad for you. Because you are expected to learn how to do it so you can understand it. Feel free to ask for clarifications.