Return output of dictionary in alphabetical order
The following code prints each word in the txt file followed by the number of times that word occurs (e.g. a, 26). The problem is that it doesn't print the words in alphabetical order. Any help would be much appreciated.

import re
def print_word_counts(filename):
    s=open(filename).read()
    words=re.findall('[a-zA-Z]+', s)
    e=[x.lower() for x in (words)]
    e.sort()
    from collections import Counter
    dic=Counter(e)
    for key,value in dic.items():
        print (key,value)
print_word_counts('engltreaty.txt')
Tocopherol answered 17/5, 2013 at 1:44 Comment(0)

You just need to sort the items. The builtin sorted should work wonderfully:

for key,value in sorted(dic.items()):
    ...

If you drop the e.sort() line, this should run in approximately the same amount of time. The original doesn't work because dictionaries are based on hash tables, which store items in an order determined by their hash values (with some more complicated handling when hash collisions occur). Since the hashing function is never specified anywhere, you can't count on a dictionary keeping any order that you try to give it; the order is implementation and version dependent. (Since Python 3.7, dicts are guaranteed to preserve insertion order, but sorted() is still the explicit way to get alphabetical output.) For other simple cases, the collections module has an OrderedDict subclass which does keep insertion order. However, that won't really help you here.
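To make the sorted(dic.items()) idea concrete, here is a minimal sketch that works on an in-memory string instead of a file. The function name sorted_word_counts and the sample sentence are made up for illustration; they are not part of the original code.

```python
import re
from collections import Counter

def sorted_word_counts(text):
    """Return (word, count) pairs sorted alphabetically by word.

    Hypothetical helper for illustration; it takes a string rather
    than a filename so it can be tried without a file on disk.
    """
    words = re.findall('[a-zA-Z]+', text.lower())
    counts = Counter(words)
    # sorted() builds a new list of pairs; the Counter itself is unchanged
    return sorted(counts.items())

# Made-up sample text, not taken from engltreaty.txt
sample = "The cat and the hat and the bat"
for word, count in sorted_word_counts(sample):
    print(word, count)
```

Because the sorting happens at output time, there is no need to sort the word list before counting.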

Geminius answered 17/5, 2013 at 1:45 Comment(3)
Just this: sorted(dic.items()) worked for me, thanks. - Unbalance
What if I don't just want the keys, but the whole key-value pairs sorted alphabetically? - Chomp
@AkinHwan -- I'm not sure I understand the question... items() returns an iterable of 2-tuples (key-value pairs). The 2-tuples sort lexicographically: on the key first, and in the event of a tie, on the value. Of course, since this is a dict, the keys are unique, so the value never actually gets compared. I'm not sure what you mean by "the whole key-value pairs sorted alphabetically". Maybe try sorted(dic.items(), key=lambda x: x[0] + str(x[1]))? - Geminius
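A small sketch of how the 2-tuples compare, in case it helps. The pairs list is invented sample data (only a, 26 echoes the question's example), and the variable names are just for illustration.

```python
# Invented (word, count) pairs standing in for dic.items()
pairs = [('treaty', 3), ('a', 26), ('the', 40), ('and', 12)]

# Tuples compare element by element, so the word (index 0) decides the order
by_word = sorted(pairs)

# An explicit key gives the same result here and states the intent directly
by_word_explicit = sorted(pairs, key=lambda kv: kv[0])

# Sorting by count instead, with the word as a tie-breaker
by_count = sorted(pairs, key=lambda kv: (kv[1], kv[0]))

print(by_word)
print(by_count)
```

Since the keys of a dict are unique, the plain sorted(pairs) and the explicit key=lambda kv: kv[0] version always agree for dictionary items.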

Note that Counter is a subclass of dict, so sorting before you add to the Counter:

e.sort()
dic=Counter(e)

isn't guaranteed to produce sorted output (before Python 3.7, dicts don't preserve insertion order).

import re
from collections import Counter

def print_word_counts(filename):
    c = Counter()
    with open(filename) as f: # with block closes file at the end of the block
        for line in f: # go line by line, don't load it all into mem at once
            c.update(w.lower() for w in re.findall('[a-zA-Z]+', line))

    for k, v in sorted(c.items()): # sorts
        print(k, v)

print_word_counts('engltreaty.txt')
Truehearted answered 17/5, 2013 at 1:50 Comment(0)