Creating Lexicon and Scanner in Python
L

7

5

I'm new to the world of coding and I haven't received a very warm welcome. I've been trying to learn Python via the online tutorial http://learnpythonthehardway.org/book/. I've been able to struggle my way through the book up until exercises 48 and 49. That's where he turns students loose and says "You figure it out." But I simply can't. I understand that I need to create a lexicon of possible words and that I need to scan the user input to see if it matches anything in the lexicon, but that's about it! From what I can tell, I need to create a list called lexicon:

lexicon = [
    ('directions', 'north'),
    ('directions', 'south'),
    ('directions', 'east'),
    ('directions', 'west'),
    ('verbs', 'go'),
    ('verbs', 'stop'),
    ('verbs', 'look'),
    ('verbs', 'give'),
    ('stops', 'the'),
    ('stops', 'in'),
    ('stops', 'of'),
    ('stops', 'from'),
    ('stops', 'at')
]

Is that right? I don't know what to do next. I know that each item in the list is called a tuple, but that doesn't really mean anything to me. How do I take raw input and assign it to the tuple? In exercise 49 he imports the lexicon and, in the Python shell, prints lexicon.scan("input"), which returns a list of tuples. For example:

>>> from ex48 import lexicon
>>> print lexicon.scan("go north")
[('verb', 'go'), ('direction', 'north')]

Is 'scan()' a predefined function, or did he create the function within the lexicon module? I know that 'split()' creates a list of all the words in the input, but then how does it assign 'go' to the tuple ('verb', 'go')?

Am I just way off? I know I'm asking a lot but I searched around everywhere for hours and I can't figure this one out on my own. Please help! I will love you forever!

Lucrative answered 15/3, 2013 at 4:42 Comment(1)
Just a piece of advice: don't expect a warm welcome from the world of coding. It's a rookie mistake. If you really like it, you won't care how warmly you were welcomed.Garbage
B
3

I wouldn't use a list to make the lexicon. You're mapping words to their types, so make a dictionary.

Here's the biggest hint that I can give without writing the entire thing:

lexicon = {
    'north': 'directions',
    'south': 'directions',
    'east': 'directions',
    'west': 'directions',
    'go': 'verbs',
    'stop': 'verbs',
    'look': 'verbs',
    'give': 'verbs',
    'the': 'stops',
    'in': 'stops',
    'of': 'stops',
    'from': 'stops',
    'at': 'stops'
}

def scan(sentence):
    words = sentence.lower().split()
    pairs = []

    # Iterate over `words`,
    # pull each word and its corresponding type
    # out of the `lexicon` dictionary and append the tuple
    # to the `pairs` list

    return pairs
Brawl answered 15/3, 2013 at 4:49 Comment(11)
How do I pull them out of the dictionary? I tried lexicon.items(), but that only gives 3 out of the 13 pairs, and it doesn't give only the ones from the input. I have no idea what I am doing. I'm sorry.Lucrative
@Zaqory: That's covered in exercise 39.Brawl
lexicon.items() returns the tuples from the entire dictionary, but 'items' takes exactly 0 arguments, so I can't just write lexicon.items('go') to get [('go', 'verbs')]. How do I get only the tuples I ask for?Lucrative
@Zaqory: You already have the first part of the tuple. The second part of the tuple is just the corresponding value in the dictionary.Brawl
But I can't get it to give me just the single tuple that I ask for.Lucrative
@Zaqory: Make the tuple yourself: (word, word_type)Brawl
so for word, word_type in words: pairs.append(word, word_type)?Lucrative
This is what I have that I feel like should work:
lexicon = {
    'north': 'directions',
    'south': 'directions',
    'east': 'directions',
    'west': 'directions',
    'go': 'verbs',
    'stop': 'verbs',
    'look': 'verbs',
    'give': 'verbs',
    'the': 'stops',
    'in': 'stops',
    'of': 'stops',
    'from': 'stops',
    'at': 'stops'
}
a = raw_input("> ")
s = a.lower().split()
pairs = []
for tupes, things in s:
    word = things
    word_type = lexicon[word]
    tupes = (word, word_type)
    pairs.append(tupes)
Lucrative
@Zaqory: Almost. for tupes, things in s doesn't really make sense, as s is just a list of words. You also need to take that raw_input out of there (the sentence is passed in as an argument) and return the list of tuples; then it should work.Brawl
what should I do for the loop then?Lucrative
@Zaqory: If you're iterating over the words in the sentence, shouldn't it just be for word in words: ...? (Looping over sentence itself would give you one character at a time.)Brawl
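
Putting the comment thread together, a completed scan might look like the sketch below. It assumes the dictionary-based lexicon from the answer (shortened here). The 'error' label for unknown words is an assumption borrowed from other answers in this thread, not something this answer specified.

```python
# Shortened copy of the answer's dictionary: word -> word type.
lexicon = {
    'north': 'directions',
    'south': 'directions',
    'go': 'verbs',
    'stop': 'verbs',
    'the': 'stops',
    'in': 'stops',
}

def scan(sentence):
    words = sentence.lower().split()
    pairs = []

    for word in words:
        # dict.get returns the fallback instead of raising KeyError when
        # the word is not in the lexicon ('error' is an assumed label).
        word_type = lexicon.get(word, 'error')
        # Build the tuple yourself, as suggested in the comments.
        pairs.append((word, word_type))

    return pairs

print(scan("Go north"))
```

With this shortened lexicon, scan("Go north") gives [('go', 'verbs'), ('north', 'directions')]. Note that the book's example puts the type first, so you may want to swap the order inside the tuple.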
H
3

Based on the ex48 instructions, you could create a list for each kind of word. Here's a sample for the first test case. The returned value is a list of tuples, so you can append a tuple to that list for each word given.

direction = ['north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back']

class Lexicon:
    def scan(self, sentence):
        self.sentence = sentence
        self.words = sentence.split()
        stuff = []
        for word in self.words:
            if word in direction:
                stuff.append(('direction', word))
        return stuff

lexicon = Lexicon()

The book's author notes that numbers and unrecognized words (errors) are handled differently.
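
The class above only recognizes directions. One possible way to extend it to numbers and unknown words is sketched below. The 'number' and 'error' labels are the ones other answers in this thread use for the exercise; the helper name convert_number is an assumption for this sketch.

```python
direction = ['north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back']

def convert_number(text):
    # Hypothetical helper: return an int, or None if text is not a number.
    try:
        return int(text)
    except ValueError:
        return None

class Lexicon(object):
    def scan(self, sentence):
        stuff = []
        for word in sentence.split():
            number = convert_number(word)
            if word in direction:
                stuff.append(('direction', word))
            elif number is not None:
                # Numbers are stored as ints, not as strings.
                stuff.append(('number', number))
            else:
                # Anything unrecognized is kept, but tagged as an error.
                stuff.append(('error', word))
        return stuff

lexicon = Lexicon()
print(lexicon.scan("north 1234 ASDF"))
```

This prints [('direction', 'north'), ('number', 1234), ('error', 'ASDF')]. The other word types (verbs, stop words, nouns) could be added as further elif branches in the same way.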

Homosporous answered 8/5, 2014 at 2:35 Comment(0)
L
2

Finally I did it!

# A dict keyed by word, so that lexicon[word] below can look up the type.
lexicon = {
    'north': 'directions',
    'south': 'directions',
    'east': 'directions',
    'west': 'directions',
    'go': 'verbs',
    'stop': 'verbs',
    'look': 'verbs',
    'give': 'verbs',
    'the': 'stops',
    'in': 'stops',
    'of': 'stops',
    'from': 'stops',
    'at': 'stops'
    }

def scan(sentence):

    words = sentence.lower().split()
    pairs = []

    for word in words:
        word_type = lexicon[word]
        tupes = (word, word_type) 
        pairs.append(tupes)

    return pairs
Lucrative answered 20/3, 2013 at 2:4 Comment(0)
C
2

This is a really cool exercise. I had to research for days and finally got it working. The other answers here don't show how to actually use a list with tuples inside like the e-book suggests, so this one does. The owner's answer doesn't quite work with such a list: lexicon[word] expects an integer index, not a str.

lexicon = [('direction', 'north', 'south', 'east', 'west'),
           ('verb', 'go', 'kill', 'eat'),
           ('noun', 'princess', 'bear')]

def scan(sentence):
    # Take the sentence as an argument so that scan("go north") works.
    words = sentence.split()
    pairs = []

    for word in words:
        # Slice off index 0 of each tuple: it holds the type name, not a word.
        if word in lexicon[0][1:]:
            pairs.append(('direction', word))
        elif word in lexicon[1][1:]:
            pairs.append(('verb', word))
        elif word in lexicon[2][1:]:
            pairs.append(('noun', word))
        else:
            pairs.append(('error', word))

    # Return the list instead of printing it, so callers can use the result.
    return pairs
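
If you would rather keep a flat list of (type, word) tuples like the one in the question, one option is to turn it into a dictionary once and look words up from there. This is only a sketch: the shortened list, the singular type labels and the 'error' fallback are assumptions for illustration.

```python
# Shortened list of (type, word) tuples, in the style of the question.
lexicon = [
    ('direction', 'north'),
    ('direction', 'south'),
    ('verb', 'go'),
    ('verb', 'kill'),
    ('noun', 'bear'),
]

# Flip each (type, word) pair into a word -> type lookup table.
word_types = dict((word, kind) for kind, word in lexicon)

def scan(sentence):
    pairs = []
    for word in sentence.split():
        # Unknown words fall back to the assumed 'error' label.
        pairs.append((word_types.get(word, 'error'), word))
    return pairs

print(scan("go north"))
```

This prints [('verb', 'go'), ('direction', 'north')], and adding a new word only means adding one more tuple to the list.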

Cheers!

Cowpox answered 3/1, 2016 at 22:56 Comment(0)
A
1

Clearly, lexicon is another Python file in the ex48 folder,

like: ex48
      ----lexicon.py

so you are importing lexicon.py from the ex48 folder.

scan is not a built-in; it is a function defined inside lexicon.py.
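
To make the layout concrete, the sketch below builds a throwaway ex48 package in a temporary folder and imports it the same way the book does. The placeholder body of scan and the empty __init__.py are assumptions for illustration; your real lexicon.py would hold your own code.

```python
import os
import sys
import tempfile

# Build a throwaway copy of the assumed layout:
#   ex48/__init__.py
#   ex48/lexicon.py
project_dir = tempfile.mkdtemp()
package_dir = os.path.join(project_dir, 'ex48')
os.mkdir(package_dir)

# An empty __init__.py marks the folder as a package.
open(os.path.join(package_dir, '__init__.py'), 'w').close()

# lexicon.py defines scan() itself; it is not a Python built-in.
with open(os.path.join(package_dir, 'lexicon.py'), 'w') as module_file:
    module_file.write(
        "def scan(sentence):\n"
        "    # Placeholder body: tag every word as 'error'.\n"
        "    return [('error', word) for word in sentence.split()]\n"
    )

# Make the project folder importable, then import exactly as the book does.
sys.path.insert(0, project_dir)
from ex48 import lexicon

print(lexicon.scan("go north"))
```

Here lexicon is the module object for lexicon.py, and lexicon.scan is simply the function written in that file.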

Amphisbaena answered 27/10, 2017 at 16:30 Comment(0)
I
1

Like most people here, I am new to the world of coding, and I thought I'd attach my solution below as it might help other students.

I have already seen a few more efficient approaches that I could implement. However, the code handles every use case of the exercise, and since I wrote it on my own with my beginner's mind, it does not take complicated shortcuts and should be very easy for other beginners to understand.

I therefore thought it might be beneficial for someone else who is learning. Let me know what you think. Cheers!

class Lexicon(object):

    def __init__(self):
        self.sentence = []
        self.dictionary = {
            'north' : ('direction','north'),
            'south' : ('direction','south'),
            'east' : ('direction','east'),
            'west' : ('direction','west'),
            'down' : ('direction','down'),
            'up' : ('direction','up'),
            'left' : ('direction','left'),
            'right' : ('direction','right'),
            'back' : ('direction','back'),
            'go' : ('verb','go'),
            'stop' : ('verb','stop'),
            'kill' : ('verb','kill'),
            'eat' : ('verb', 'eat'),
            'the' : ('stop','the'),
            'in' : ('stop','in'),
            'of' : ('stop','of'),
            'from' : ('stop','from'),
            'at' : ('stop','at'),
            'it' : ('stop','it'),
            'door' : ('noun','door'),
            'bear' : ('noun','bear'),
            'princess' : ('noun','princess'),
            'cabinet' : ('noun','cabinet'),
        }

    def scan(self, input):
        loaded_input = input.split()
        # Start a fresh list so results returned by earlier calls are not modified.
        self.sentence = []

        for item in loaded_input:
            try:
                # Numbers are stored as ints.
                number = ('number', int(item))
                self.sentence.append(number)
            except ValueError:
                # Not a number: look the word up, falling back to 'error'.
                word = self.dictionary.get(item.lower(), ('error', item))
                self.sentence.append(word)

        return self.sentence

lexicon = Lexicon()
Iatrochemistry answered 2/6, 2020 at 19:56 Comment(1)
Thank you for acknowledging the other solutions, and highlighting what differentiates your approach. That's much appreciated when answering an old question with a lot of existing answers. There's definitely value to providing answers that don't rely as much on more sophisticated syntactical shortcuts that trip up people new to the language.Rosales
S
-1

This is my version of the lexicon scanner for ex48. I am also a beginner in programming, and Python is my first language, so the program may not be efficient for its purpose. Still, the results have been good after a lot of testing. Please feel free to improve the code.

WARNING

If you haven't tried to do the exercise on your own, I encourage you to try without looking at any examples.

WARNING

One thing I love about programming is that every time I encounter a problem, I spend a lot of time trying different methods to solve it. I spent over a few weeks trying to create the structure, and as a beginner it was really rewarding, because I learned a lot more than I would have by copying from others.

Below are my lexicon and search in one file.

direction = [('direction', 'north'),
            ('direction', 'south'),
            ('direction', 'east'),
            ('direction', 'west'),
            ('direction', 'up'),
            ('direction', 'down'),
            ('direction', 'left'),
            ('direction', 'right'),
            ('direction', 'back')
]

verbs = [('verb', 'go'),
        ('verb', 'stop'),
        ('verb', 'kill'),
        ('verb', 'eat')
]

stop_words = [('stop', 'the'),
            ('stop', 'in'),
            ('stop', 'of'),
            ('stop', 'from'),
            ('stop', 'at'),
            ('stop', 'it')
]

nouns = [('noun', 'door'),
        ('noun', 'bear'),
        ('noun', 'princess'),
        ('noun', 'cabinet')
]   

library = tuple(nouns + stop_words + verbs + direction)

# Below is the search method with explanation.

def convert_number(x):
    try:
        return int(x)
    except ValueError:
        return None


def scan(input):
    # Lowercase the input so uppercase words can be matched. (Study Drills no. 3)
    lowercase = input.lower()
    # element is what I want to search.
    element = lowercase.split()
    # orielement is the original input, which may include uppercase, for the 'error' type.
    orielement = input.split()
    # library is the tuple of word types from above. You can replace it with your data source.
    data = library
    # i is the position in element.
    i = 0
    # z is the position in output, i.e. the data that matches what I search.
    z = 0
    # Create a place to store my output.
    output = []
    # temp is just an on/off switch. Turn off the switch when I get any match for that particular input.
    temp = True
    # Loop until every input word has been searched, advancing by 1 each time.
    while not (i == len(element)):
        try:
            # j is the position of the word in the library, e.g. 'door', 'bear', 'go', etc., excluding the word type.
            j = 0
            while not (j == len(data)):
                # data[j][1] is a single word in the library.
                matching = data[j][1]
                # When the word matches, save the match into the output.
                if (matching == element[i]):
                    output.append(data[j])
                    #print output[z]
                    j += 1
                    z += 1
                    # Switch off the search for the else: below and go to the next input word. Otherwise it would be considered 'error'.
                    temp = False
                # else is everything that is not in the library.
                else:
                    while (data[j][1] == data[-1][1]) and (temp == True):
                        # Refer to convert_number to test if the input is a number; here I use orielement, which includes uppercase.
                        convert = convert_number(orielement[i])
                        # a is used to save a number only.
                        a = tuple(['number', convert])
                        # b is used to save everything else.
                        b = tuple(['error', orielement[i]])
                        # convert is the number and a[1] accesses the number inside; if convert_number returned None, a is not appended.
                        if convert == a[1] and not (convert == None):
                            output.append(a)
                            temp = False
                        else:
                            output.append(b)
                            # Keep the switch off to escape the while loop!
                            temp = False
                    # Search the next entry in data.
                    j += 1
            # Next word of input.
            i += 1
            temp = True
        except ValueError:
            return output
    else:
        pass
    return output
Serrano answered 24/12, 2018 at 2:45 Comment(0)
