How to sort a dictionary having keys as a string of numbers in Python
Asked Answered
R

6

11

I have a dictionary:

a = {'100':12,'6':5,'88':3,'test':34, '67':7,'1':64 }

I want to sort this dictionary with respect to key so it looks like:

a = {'1':64,'6':5,'67':7,'88':3, '100':12,'test':34 }
Reconnoiter answered 30/3, 2010 at 19:17 Comment(0)
B
15

Like everyone else has pointed out, dictionaries have their own ordering and you can't just sort them like you would a list.

One thing I would like to add is that, if you just want to go through the elements of a dictionary in sorted order, that's just:

for k in sorted(a):
    print k, a[k] # or whatever.

If you'd rather have a list comprehension (per Alex):

sortedlist = [(k, a[k]) for k in sorted(a)]

I would like to point out that Alex's use of key=int won't work with your example because one of your keys is 'test'. If you really want he numbers sorted before the non-numerics, you'll have to pass in a cmp function:

def _compare_keys(x, y):
    try:
        x = int(x)
    except ValueError:
        xint = False
    else:
        xint = True
    try:
        y = int(y)
    except ValueError:
        if xint:
            return -1
        return cmp(x.lower(), y.lower())
        # or cmp(x, y) if you want case sensitivity.
    else:
        if xint:
            return cmp(x, y)
        return 1

for k in sorted(a, cmp=_compare_keys):
    print k, a[k] # or whatever.

Or maybe you know enough about your keys to write a function to convert them into a string (or other object) that sorts right:

# Won't work for integers with more than this many digits, or negative integers.
MAX_DIGITS = 10
def _keyify(x):
    try:
        xi = int(x)
    except ValueError:
        return 'S{0}'.format(x)
    else:
        return 'I{0:0{1}}'.format(xi, MAX_DIGITS)

for k in sorted(a, key=_keyify):
    print k, a[k] # or whatever.

This would be much faster than using a cmp function.

Bowser answered 30/3, 2010 at 19:58 Comment(4)
I don't really like this as much as the other solutions. _keyify is unnecessarily rigid and I don't think quite as clean as some of the other solutions and your _compare_keys uses cmp, which is usually a bad sign. (It's also good form to keep try blocks for try/except as small as possible. I would put return 'I{0:0{1}}'.format(xi, MAX_DIGITS) in an else block.)Solander
No, the solutions aren't optimized, they're just there as examples or starting points. (It's hard to suggest a better solution without knowing what the actual key domain is.) I like your solution except for the part where it breaks in Python 3. I originally wrote _compare_keys with two two-line try blocks, but it was twice as long. Using the try blocks to control execution flow eliminated the need for a yint boolean.Bowser
I'm assuming that you mean that using the cmp parameter, as opposed to the function, is a "bad sign", and yes it is, but it's brought on by the incompletely defined key domain. That's why I followed up with the _keyify solution. Even better would be to create a new class for keys that can take a string input and sort itself properly, and use it as the dictionary's key instead, but I don't know if ben has that much control over the input dictionary. And, yeah, I forgot about else blocks; too much C++ recently. Fixed.Bowser
Thinking about it, normally I'd use re to take care of figuring out if something was an int or not, and not try: x = int(x) ..., but I felt that would be distracting from the main point of "use key= or cmp=", and at the risk of sounding like a broken sound card, I couldn't suggest a proper regular expression without knowing the key domain.Bowser
Y
7

You cannot sort a dict in Python as the dict type is inherently unordered. What you can do is sort the items before you used them using the sorted() built in function. You will also need a helper function to distinguish between your numerical and string keys:

def get_key(key):
    try:
        return int(key)
    except ValueError:
        return key
a = {'100':12,'6':5,'88':3,'test':34, '67':7,'1':64 }
print sorted(a.items(), key=lambda t: get_key(t[0]))

However in Python 3.1 (and 2.7) the collections module contains the collections.OrderedDicttype that can be used to achieve the effect you want like below:

def get_key(key):
    try:
        return int(key)
    except ValueError:
        return key
a = {'100':12,'6':5,'88':3,'test':34, '67':7,'1':64 }
b = collections.OrderedDict(sorted(a.items(), key=lambda t: get_key(t[0])))
print(b)
Yetac answered 30/3, 2010 at 19:27 Comment(4)
This will not get the order OP wanted.Solander
Hi Mawushe, First let me thank you for your suggestion. But still the given code won't provide a desired solution.Reconnoiter
You are both right gents I have update the answer to both sort by the key and take into account the string nature of the keys.Yetac
OrderedDict is also available in Python 2.7…Dunedin
I
5

9 years ago I posted a recipe that starts

Dictionaries can't be sorted -- a mapping has no ordering!

and shows how to get sorted lists out of a dict's keys and values.

With today's Python, and your expressed-plus-implied specs, I'd suggest:

import sys

def asint(s):
    try: return int(s), ''
    except ValueError: return sys.maxint, s

sortedlist = [(k, a[k]) for k in sorted(a, key=asint)]

the key=asint is what tells sorted to treat those string keys as integers for sorting purposes, so that e.g. '2' sorts between '1' and '12', rather than after them both -- that's what you appear to require, as well as having all non-all-digits keys sort after all all-digits ones. If you need to also deal with all-digits key strings that express integers larger than sys.maxint, it's a bit trickier, but still doable:

class Infinity(object):
    def __cmp__(self, other): return 0 if self is other else 1
infinite = Infinity()
def asint(s):
    try: return int(s), ''
    except ValueError: return infinite, s

In general, you can get better answers faster if you specify your exact requirements more precisely from the start;-).

Inadmissible answered 30/3, 2010 at 19:30 Comment(3)
This will fail for the input OP specified because int will raise ValueError when it encounters 'test'.Solander
@ben, you're welcome, but SO's motto is "thanks are nice, but what really matters is the accept" -- once you've seen enough answers, accept the answer that helped you most, by clicking on the checkmark-shaped icon under the number at the answer's top left. This also gets you started on earning your own rep on SO, as well as rewarding with rep the answerer who has most helped you.Inadmissible
Note that float('inf') is already a number greater than all others.Solander
S
4

Dictionaries are unordered. You cannot sort one like you show because the resultant a is a dict, and dicts do not have order.

If you want, say, a list a list of the keys in sorted order, you can use code like

>>> def my_key(dict_key):
...     try:
...         return int(dict_key)
...     except ValueError:
...         return dict_key
...
>>> sorted(a, key=my_key)
['1', '6', '67', '88', '100', 'test']

This relies on the stupid Python behavior that instances of str are always greater than instances of int. (The behaviour is fixed in Python 3.) In an optimal design, the keys of your dict would be things you could compare sanely and you wouldn't mix in strings representing numbers with strings representing words.

If you want to keep the keys in always-sorted order, you can use the bisect module or implement a mapping that relies on a tree data structure. The bisect module does not accept a key argument like the sorting stuff because this would be potentially inefficient; you would use the decorate–use–undecorate pattern if you chose to use bisect, keeping a sorted list that depends on the result of the key function.

Solander answered 30/3, 2010 at 19:23 Comment(2)
Note that maintaining a sorted list using bisect takes O(n**2) time. The bisect will find the right place to insert in O(log n) time per item, but the insert still takes O(n) time per item due to the array-like nature of Python lists.Chaperone
@Daniel Strutzbach, That depends on what you mean by "maintaining"; depending on usage you often can avoid the quadratic behavior of building a sorted list using insort. Using a list and bisect certainly does not give the ideal performance for insertion operations. If these operations are important, it is better to go with the "or implement a mapping that relies on a tree data structure". Since you've already done this for us, I upvoted your post which references your nice sorteddict implementation.Solander
C
1

If you install my blist package, it includes a sorteddict type. Then you could simply:

from blist import sorteddict

def my_key(dict_key):
       try:
              return int(dict_key)
       except ValueError:
              return dict_key

a = {'100':12,'6':5,'88':3,'test':34, '67':7,'1':64 }
print sorteddict(my_key, **a).keys()

Output:

['1', '6', '67', '88', '100', 'test']
Chaperone answered 30/3, 2010 at 19:39 Comment(1)
7-space indents aside ;), this is a good solution if you need a mapping that is always sorted.Solander
S
0

what you could do is also the following:

sorted_keys = sorted(list(a.keys()), key = lambda x: (len(x),x))
sorted_dict = {k:a[k] for k in sorted_keys}

The most important part is key = lambda x: (len(x),x) which I took from here

Storied answered 20/5, 2021 at 15:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.