Python pretty print dictionary of lists, abbreviate long lists
Asked Answered
O

5

14

I have a dictionary of lists and the lists are quite long. How can I print it in a way that only a few elements of the list show up? Obviously, I can write a custom function for that but is there any built-in way or library that can achieve this? For example when printing large data frames, pandas prints it nicely in a short way.

This example better illustrates what I mean:

obj = {'key_1': ['EG8XYD9FVN',
  'S2WARDCVAO',
  'J00YCU55DP',
  'R07BUIF2F7',
  'VGPS1JD0UM',
  'WL3TWSDP8E',
  'LD8QY7DMJ3',
  'J36U3Z9KOQ',
  'KU2FUGYB2U',
  'JF3RQ315BY'],
 'key_2': ['162LO154PM',
  '3ROAV881V2',
  'I4T79LP18J',
  'WBD36EM6QL',
  'DEIODVQU46',
  'KWSJA5WDKQ',
  'WX9SVRFO0G',
  '6UN63WU64G',
  '3Z89U7XM60',
  '167CYON6YN']}

Desired output: something like this:

{'key_1':
    ['EG8XYD9FVN', 'S2WARDCVAO', '...'],
 'key_2':
    ['162LO154PM', '3ROAV881V2', '...']
}
Orjonikidze answered 22/7, 2016 at 18:38 Comment(5)
You may also want to take a look at the repr module (reprlib in Py3), particularly the maxlist setting: rep = Repr(); rep.maxlist = 3; print rep.repr(obj) Now combining that with pprint is the real challenge.Toland
Awesome! This is the best answer and exactly what I needed. Could you add as answer? @LukasGrafOrjonikidze
It seems Michael Hoff is already working on a reprlib answer, so I just upvoted that :)Toland
That doesn't abbreviate! Instantiating the Repr class and setting maxlist is the key!Orjonikidze
It does, it's just that the default of maxlist is 6, so it's less obvious. Also note that __builtins__.repr() is not the same as repr.repr() (yeah, the naming really isn't helping in Python 2.x).Toland
T
7

If it weren't for the pretty printing, the reprlib module would be the way to go: Safe, elegant and customizable handling of deeply nested and recursive / self-referencing data structures is what it has been made for.

However, it turns out combining the reprlib and pprint modules isn't trivial, at least I couldn't come up with a clean way without breaking (some) of the pretty printing aspects.

So instead, here's a solution that just subclasses PrettyPrinter to crop / abbreviate lists as necessary:

from pprint import PrettyPrinter


obj = {
    'key_1': [
        'EG8XYD9FVN', 'S2WARDCVAO', 'J00YCU55DP', 'R07BUIF2F7', 'VGPS1JD0UM',
        'WL3TWSDP8E', 'LD8QY7DMJ3', 'J36U3Z9KOQ', 'KU2FUGYB2U', 'JF3RQ315BY',
    ],
    'key_2': [
        '162LO154PM', '3ROAV881V2', 'I4T79LP18J', 'WBD36EM6QL', 'DEIODVQU46',
        'KWSJA5WDKQ', 'WX9SVRFO0G', '6UN63WU64G', '3Z89U7XM60', '167CYON6YN',
    ],
    # Test case to make sure we didn't break handling of recursive structures
    'key_3': [
        '162LO154PM', '3ROAV881V2', [1, 2, ['a', 'b', 'c'], 3, 4, 5, 6, 7],
        'KWSJA5WDKQ', 'WX9SVRFO0G', '6UN63WU64G', '3Z89U7XM60', '167CYON6YN',
    ]
}


class CroppingPrettyPrinter(PrettyPrinter):

    def __init__(self, *args, **kwargs):
        self.maxlist = kwargs.pop('maxlist', 6)
        return PrettyPrinter.__init__(self, *args, **kwargs)

    def _format(self, obj, stream, indent, allowance, context, level):
        if isinstance(obj, list):
            # If object is a list, crop a copy of it according to self.maxlist
            # and append an ellipsis
            if len(obj) > self.maxlist:
                cropped_obj = obj[:self.maxlist] + ['...']
                return PrettyPrinter._format(
                    self, cropped_obj, stream, indent,
                    allowance, context, level)

        # Let the original implementation handle anything else
        # Note: No use of super() because PrettyPrinter is an old-style class
        return PrettyPrinter._format(
            self, obj, stream, indent, allowance, context, level)


p = CroppingPrettyPrinter(maxlist=3)
p.pprint(obj)

Output with maxlist=3:

{'key_1': ['EG8XYD9FVN', 'S2WARDCVAO', 'J00YCU55DP', '...'],
 'key_2': ['162LO154PM',
           '3ROAV881V2',
           [1, 2, ['a', 'b', 'c'], '...'],
           '...']}

Output with maxlist=5 (triggers splitting the lists on separate lines):

{'key_1': ['EG8XYD9FVN',
           'S2WARDCVAO',
           'J00YCU55DP',
           'R07BUIF2F7',
           'VGPS1JD0UM',
           '...'],
 'key_2': ['162LO154PM',
           '3ROAV881V2',
           'I4T79LP18J',
           'WBD36EM6QL',
           'DEIODVQU46',
           '...'],
 'key_3': ['162LO154PM',
           '3ROAV881V2',
           [1, 2, ['a', 'b', 'c'], 3, 4, '...'],
           'KWSJA5WDKQ',
           'WX9SVRFO0G',
           '...']}

Notes:

  • This will create copies of lists. Depending on the size of the data structures, this can be very expensive in terms of memory use.
  • This only deals with the special case of lists. Equivalent behavior would have to be implemented for dicts, tuples, sets, frozensets, ... for this class to be of general use.
Toland answered 22/7, 2016 at 20:8 Comment(1)
Funny, I had a similar prototype. But I disliked the steps you had to undertake to implement the truncation of dictionaries. +1 for the effort!Paeon
P
12

You could use IPython.lib.pretty.

from IPython.lib.pretty import pprint

> pprint(obj, max_seq_length=5)
{'key_1': ['EG8XYD9FVN',
  'S2WARDCVAO',
  'J00YCU55DP',
  'R07BUIF2F7',
  'VGPS1JD0UM',
  ...],
 'key_2': ['162LO154PM',
  '3ROAV881V2',
  'I4T79LP18J',
  'WBD36EM6QL',
  'DEIODVQU46',
  ...]}

> pprint(dict(map(lambda i: (i, range(i + 5)), range(100))), max_seq_length=10)
{0: [0, 1, 2, 3, 4],
 1: [0, 1, 2, 3, 4, 5],
 2: [0, 1, 2, 3, 4, 5, 6],
 3: [0, 1, 2, 3, 4, 5, 6, 7],
 4: [0, 1, 2, 3, 4, 5, 6, 7, 8],
 5: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 6: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...],
 7: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...],
 8: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...],
 9: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...],
 ...}

For older versions of IPython, you might exploit RepresentationPrinter:

from IPython.lib.pretty import RepresentationPrinter
import sys

def compact_pprint(obj, max_seq_length=10):
    printer = RepresentationPrinter(sys.stdout)
    printer.max_seq_length = max_seq_length
    printer.pretty(obj)
    printer.flush()
Paeon answered 22/7, 2016 at 20:14 Comment(1)
Probably because there are some jerks on the internetCorcyra
B
7

You could use the pprint module:

pprint.pprint(obj)

Would output:

{'key_1': ['EG8XYD9FVN',
           'S2WARDCVAO',
           'J00YCU55DP',
           'R07BUIF2F7',
           'VGPS1JD0UM',
           'WL3TWSDP8E',
           'LD8QY7DMJ3',
           'J36U3Z9KOQ',
           'KU2FUGYB2U',
           'JF3RQ315BY'],
 'key_2': ['162LO154PM',
           '3ROAV881V2',
           'I4T79LP18J',
           'WBD36EM6QL',
           'DEIODVQU46',
           'KWSJA5WDKQ',
           'WX9SVRFO0G',
           '6UN63WU64G',
           '3Z89U7XM60',
           '167CYON6YN']}

And,

pprint.pprint(obj,depth=1)

Would output:

{'key_1': [...], 'key_2': [...]}

And,

pprint.pprint(obj,compact=True)

would output:

{'key_1': ['EG8XYD9FVN', 'S2WARDCVAO', 'J00YCU55DP', 'R07BUIF2F7',
           'VGPS1JD0UM', 'WL3TWSDP8E', 'LD8QY7DMJ3', 'J36U3Z9KOQ',
           'KU2FUGYB2U', 'JF3RQ315BY'],
 'key_2': ['162LO154PM', '3ROAV881V2', 'I4T79LP18J', 'WBD36EM6QL',
           'DEIODVQU46', 'KWSJA5WDKQ', 'WX9SVRFO0G', '6UN63WU64G',
           '3Z89U7XM60', '167CYON6YN']}
Beggs answered 22/7, 2016 at 18:47 Comment(1)
pprint.pprint(obj,depth=1) is the closest thing to what I need. I guess there is not a built in way to print only the first element of the list and then put ... for the rest. like {'key_': ['EG8XYD9FVN', ...]Orjonikidze
T
7

If it weren't for the pretty printing, the reprlib module would be the way to go: Safe, elegant and customizable handling of deeply nested and recursive / self-referencing data structures is what it has been made for.

However, it turns out combining the reprlib and pprint modules isn't trivial, at least I couldn't come up with a clean way without breaking (some) of the pretty printing aspects.

So instead, here's a solution that just subclasses PrettyPrinter to crop / abbreviate lists as necessary:

from pprint import PrettyPrinter


obj = {
    'key_1': [
        'EG8XYD9FVN', 'S2WARDCVAO', 'J00YCU55DP', 'R07BUIF2F7', 'VGPS1JD0UM',
        'WL3TWSDP8E', 'LD8QY7DMJ3', 'J36U3Z9KOQ', 'KU2FUGYB2U', 'JF3RQ315BY',
    ],
    'key_2': [
        '162LO154PM', '3ROAV881V2', 'I4T79LP18J', 'WBD36EM6QL', 'DEIODVQU46',
        'KWSJA5WDKQ', 'WX9SVRFO0G', '6UN63WU64G', '3Z89U7XM60', '167CYON6YN',
    ],
    # Test case to make sure we didn't break handling of recursive structures
    'key_3': [
        '162LO154PM', '3ROAV881V2', [1, 2, ['a', 'b', 'c'], 3, 4, 5, 6, 7],
        'KWSJA5WDKQ', 'WX9SVRFO0G', '6UN63WU64G', '3Z89U7XM60', '167CYON6YN',
    ]
}


class CroppingPrettyPrinter(PrettyPrinter):

    def __init__(self, *args, **kwargs):
        self.maxlist = kwargs.pop('maxlist', 6)
        return PrettyPrinter.__init__(self, *args, **kwargs)

    def _format(self, obj, stream, indent, allowance, context, level):
        if isinstance(obj, list):
            # If object is a list, crop a copy of it according to self.maxlist
            # and append an ellipsis
            if len(obj) > self.maxlist:
                cropped_obj = obj[:self.maxlist] + ['...']
                return PrettyPrinter._format(
                    self, cropped_obj, stream, indent,
                    allowance, context, level)

        # Let the original implementation handle anything else
        # Note: No use of super() because PrettyPrinter is an old-style class
        return PrettyPrinter._format(
            self, obj, stream, indent, allowance, context, level)


p = CroppingPrettyPrinter(maxlist=3)
p.pprint(obj)

Output with maxlist=3:

{'key_1': ['EG8XYD9FVN', 'S2WARDCVAO', 'J00YCU55DP', '...'],
 'key_2': ['162LO154PM',
           '3ROAV881V2',
           [1, 2, ['a', 'b', 'c'], '...'],
           '...']}

Output with maxlist=5 (triggers splitting the lists on separate lines):

{'key_1': ['EG8XYD9FVN',
           'S2WARDCVAO',
           'J00YCU55DP',
           'R07BUIF2F7',
           'VGPS1JD0UM',
           '...'],
 'key_2': ['162LO154PM',
           '3ROAV881V2',
           'I4T79LP18J',
           'WBD36EM6QL',
           'DEIODVQU46',
           '...'],
 'key_3': ['162LO154PM',
           '3ROAV881V2',
           [1, 2, ['a', 'b', 'c'], 3, 4, '...'],
           'KWSJA5WDKQ',
           'WX9SVRFO0G',
           '...']}

Notes:

  • This will create copies of lists. Depending on the size of the data structures, this can be very expensive in terms of memory use.
  • This only deals with the special case of lists. Equivalent behavior would have to be implemented for dicts, tuples, sets, frozensets, ... for this class to be of general use.
Toland answered 22/7, 2016 at 20:8 Comment(1)
Funny, I had a similar prototype. But I disliked the steps you had to undertake to implement the truncation of dictionaries. +1 for the effort!Paeon
P
3

Use reprlib. The formatting is not that pretty, but it actually abbreviates.

> import repr
> repr.repr(map(lambda _: range(100000), range(10)))
'[[0, 1, 2, 3, 4, 5, ...], [0, 1, 2, 3, 4, 5, ...], [0, 1, 2, 3, 4, 5, ...], [0, 1, 2, 3, 4, 5, ...], [0, 1, 2, 3, 4, 5, ...], [0, 1, 2, 3, 4, 5, ...], ...]'
> repr.repr(dict(map(lambda i: (i, range(100000)), range(10))))
'{0: [0, 1, 2, 3, 4, 5, ...], 1: [0, 1, 2, 3, 4, 5, ...], 2: [0, 1, 2, 3, 4, 5, ...], 3: [0, 1, 2, 3, 4, 5, ...], ...}'
Paeon answered 22/7, 2016 at 18:54 Comment(1)
Yep. The "abbreviation" part, or dealing with recursive / self referencing structures is exactly what it was made for. Unfortunately it's not obvious how to combine it with pretty printing (newlines, indentation, ...).Toland
U
2

This recursive function I wrote does something you're asking for.. You can choose the indentation you want too

def pretty(d, indent=0):
    for key in sorted(d.keys()):
        print '\t' * indent + str(key)
        if isinstance(d[key], dict):
            pretty(d[key], indent+1)
        else:
            print '\t' * (indent+1) + str(d[key])

The output of your dictionary is:

key_1
    ['EG8XYD9FVN', 'S2WARDCVAO', 'J00YCU55DP', 'R07BUIF2F7', 'VGPS1JD0UM', 'WL3TWSDP8E', 'LD8QY7DMJ3', 'J36U3Z9KOQ', 'KU2FUGYB2U', 'JF3RQ315BY']
key_2
    ['162LO154PM', '3ROAV881V2', 'I4T79LP18J', 'WBD36EM6QL', 'DEIODVQU46', 'KWSJA5WDKQ', 'WX9SVRFO0G', '6UN63WU64G', '3Z89U7XM60', '167CYON6YN']
Ursi answered 22/7, 2016 at 18:46 Comment(2)
If you change the last line to print '\t' * (indent+1) + str(d[key][:2]) + '...' it would be similar to what I mean. Don't want the entire listOrjonikidze
Oh gotcha, I thought you wrote ... because you didn't wanna write the whole thing out again, my misunderstanding!Ursi

© 2022 - 2024 — McMap. All rights reserved.