Short (and useful) python snippets [closed]

I

22

48

In spirit of the existing "what's your most useful C/C++ snippet" - thread:

Do you guys have short, monofunctional Python snippets that you use (often) and would like to share with the Stack Overflow community? Please keep the entries small (under 25 lines maybe?) and give only one example per post.

I'll start off with a short snippet I use from time to time to count SLOC (source lines of code) in Python projects:

# prints recursive count of lines of python source code from current directory
# includes an ignore_list. also prints total sloc

import os
cur_path = os.getcwd()
ignore_set = set(["__init__.py", "count_sourcelines.py"])

loclist = []

for pydir, _, pyfiles in os.walk(cur_path):
    for pyfile in pyfiles:
        if pyfile.endswith(".py") and pyfile not in ignore_set:
            totalpath = os.path.join(pydir, pyfile)
            loclist.append( ( len(open(totalpath, "r").read().splitlines()),
                               totalpath.split(cur_path)[1]) )

for linenumbercount, filename in loclist: 
    print "%05d lines in %s" % (linenumbercount, filename)

print "\nTotal: %s lines (%s)" %(sum([x[0] for x in loclist]), cur_path)
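For anyone on Python 3, here is a rough sketch of the same idea as a function. It is not the original poster's code: the name count_sloc and its ignore parameter are made up for illustration, and it swaps the string split for os.path.relpath so that a repeated path prefix cannot confuse it.

```python
import os

def count_sloc(root, ignore=frozenset()):
    """Return a list of (line_count, relative_path) for .py files under root."""
    results = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py") and name not in ignore:
                full = os.path.join(dirpath, name)
                # 'with' closes the file; errors="replace" tolerates odd encodings
                with open(full, "r", errors="replace") as fh:
                    count = sum(1 for _ in fh)
                results.append((count, os.path.relpath(full, root)))
    return results
```

Calling count_sloc(os.getcwd(), ignore={"__init__.py"}) and summing the first element of each tuple gives the total.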

Innkeeper answered 28/3, 2009 at 1:7 Comment(4)

The Python Cookbook (code.activestate.com/recipes/langs/python) is a much better resource for this. Examples, commentary, comments, and available online and in book form. Also, your example is a maintenance horror and "%05d" % ln is better than "%s" % (str(len).zfill(5)). – Underwaist 28/3, 2009 at 2:0

Examples of "horror": 1) m.split(curpath)[1] fails if cur_path is "/home/dalke" and m is "/home/dalke/subdir/home/dalke/whatever". 2) the list() isn't needed. 3) 'for b,zn in [(r,f) for ...]' can be reduced to 'for b,ignore,zn in os.walk(cur_path)'. Oh, and 4) newlines and indentation help readability – Underwaist 28/3, 2009 at 2:8

why not use .endswith() for checking the .py extension? – Consternation 28/3, 2009 at 2:52

also, suggest using a set for the ignore list. this isn't a performance sensitive app, but no reason not to take advantage of hashes for lookups. – Consternation 28/3, 2009 at 2:53

I

23

Initializing a 2D list

While this can be done safely to initialize a list:

lst = [0] * 3

The same trick won’t work for a 2D list (list of lists):

>>> lst_2d = [[0] * 3] * 3
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [5, 0, 0], [5, 0, 0]]

The * operator repeats references to its operand rather than copying it, so the outer list ends up holding three references to the same inner list, and a change to one row shows up in all of them. The correct way to do this is:

>>> lst_2d = [[0] * 3 for i in xrange(3)]
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [0, 0, 0], [0, 0, 0]]
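A quick way to see the difference is to compare row identities with `is`. This is a small Python 3 sketch (range instead of xrange); the variable names are just for illustration.

```python
rows_shared = [[0] * 3] * 3                  # three references to one inner list
rows_fresh = [[0] * 3 for _ in range(3)]     # three independent inner lists

shared_same = rows_shared[0] is rows_shared[1]
fresh_same = rows_fresh[0] is rows_fresh[1]
print(shared_same, fresh_same)

rows_fresh[0][0] = 5                         # only the first row changes
```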

Iridescent answered 28/3, 2009 at 8:35 Comment(1)

Faster way: stackoverflow.com/questions/2332919 – Canaster 25/2, 2010 at 9:56

H

37

I like using any and a generator:

if any(pred(x.item) for x in sequence):
    ...

instead of code written like this:

found = False
for x in sequence:
    if pred(x.item):
        found = True
if found:
    ...

I first learned of this technique from a Peter Norvig article.
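One more point in favour of any: it stops at the first match, whereas the loop above keeps going unless you add a break. A minimal Python 3 sketch, with a made-up predicate that records which items it was asked about:

```python
calls = []

def is_negative(n):
    calls.append(n)          # record which items were actually inspected
    return n < 0

numbers = [3, -1, 7, -5]
found = any(is_negative(n) for n in numbers)
print(found, calls)          # the generator is abandoned after the first hit

# all() is the counterpart: true only if every item passes
every_nonzero = all(n != 0 for n in numbers)
```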

Hispanic answered 29/3, 2009 at 5:57 Comment(2)

+1 For the reference to Norvig's sudoku article. It's very nice. – Coruscation 12/7, 2009 at 13:39

There's also all() to check that all items are True. – Melanimelania 12/7, 2009 at 13:51

W

22

The only 'trick' I know that really wowed me when I learned it is enumerate. It allows you to have access to the indexes of the elements within a for loop.

>>> l = ['a','b','c','d','e','f']
>>> for (index,value) in enumerate(l):
...     print index, value
... 
0 a
1 b
2 c
3 d
4 e
5 f
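enumerate also takes an optional start argument (Python 2.6 and later), which is handy for 1-based numbering. A short sketch in Python 3 syntax:

```python
letters = ['a', 'b', 'c']

# number the items from 1 instead of 0; no parentheses needed around the pair
numbered = ["%d %s" % (index, value)
            for index, value in enumerate(letters, start=1)]
print(numbered)
```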

Waldon answered 28/3, 2009 at 2:16 Comment(1)

No need to put index, value in parentheses. Also, the above comment is naive/ignorant. – Melanimelania 12/7, 2009 at 13:50

P

18

zip(*iterable) transposes an iterable.

>>> a=[[1,2,3],[4,5,6]]
>>> zip(*a)
[(1, 4), (2, 5), (3, 6)]

It's also useful with dicts.

>>> d={"a":1,"b":2,"c":3}
>>> zip(*d.iteritems())
[('a', 'c', 'b'), (1, 3, 2)]
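On Python 3, zip returns an iterator, so wrap it in list() to see the result, and iteritems() becomes items(). A small sketch of both uses (the sample data is made up):

```python
rows = [[1, 2, 3], [4, 5, 6]]
transposed = list(zip(*rows))        # columns become rows

pairs = [("a", 1), ("b", 2), ("c", 3)]
letters, numbers = zip(*pairs)       # "unzip" into two tuples

d = {"a": 1, "b": 2, "c": 3}
keys, values = zip(*d.items())       # same trick on a dict
```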

Pung answered 28/3, 2009 at 18:11 Comment(1)

I loved this when I first found it, I think of it as "unzipping", lol. But I didn't know about dictionaries though. Thanks. – Pinnatifid 25/4, 2019 at 16:5

C

16

Fire up a simple web server for files in the current directory:

python -m SimpleHTTPServer

Useful for sharing files. In Python 3 the module was renamed, so the command is python -m http.server.

Cupboard answered 28/3, 2009 at 1:7 Comment(1)

python -m SimpleHTTPServer 8008 to serve on port 8008 – Stateless 2/12, 2016 at 3:31

A

14

A "progress bar" that looks like:

|#############################---------------------|
59 percent done

Code:

class ProgressBar():
    def __init__(self, width=50):
        self.pointer = 0
        self.width = width

    def __call__(self,x):
        # x in percent
        self.pointer = int(self.width*(x/100.0))
        return "|" + "#"*self.pointer + "-"*(self.width-self.pointer)+\
               "|\n%d percent done" % int(x)

Test function (on Windows, change "clear" to "cls"):

if __name__ == '__main__':
    import time, os
    pb = ProgressBar()
    for i in range(101):
        os.system('clear')
        print pb(i)
        time.sleep(0.1)

Allanallana answered 28/3, 2009 at 1:7 Comment(2)

how do i use this code? – Intertwine 9/6, 2016 at 12:58

now we have tqdm, from tqdm import tqdm & for i in tqdm([1,2,3]): print i – Bauhaus 23/8, 2018 at 12:59

S

11

To flatten a list of lists, such as

[['a', 'b'], ['c'], ['d', 'e', 'f']]

into

['a', 'b', 'c', 'd', 'e', 'f']

use

[inner
    for outer in the_list
        for inner in outer]
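The standard library offers the same one-level flattening through itertools.chain.from_iterable, which also works lazily on any iterable of iterables. A minimal sketch:

```python
from itertools import chain

the_list = [['a', 'b'], ['c'], ['d', 'e', 'f']]

# chain.from_iterable walks each inner list in turn, one level deep
flat = list(chain.from_iterable(the_list))
print(flat)
```

Unlike sum(the_list, []), this does not build a new intermediate list for every inner list.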

Sympathy answered 29/3, 2009 at 1:47 Comment(2)

Or sum(the_list, []). Although I suspect this is going to go very wrong somewhere (aside from generators, of course). – Invitatory 10/3, 2012 at 12:48

@HoverHell, some people may argue with it being perhaps "proper" but I've used this method for a little while and love it. Best! – Pinnatifid 25/4, 2019 at 16:7

T

10

A huge speedup over copy.deepcopy for nested lists and dictionaries:

import cPickle

deepcopy = lambda x: cPickle.loads(cPickle.dumps(x))
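A hedged Python 3 sketch of the same trick, where cPickle no longer exists and the plain pickle module picks up the C implementation by itself. The name pickle_deepcopy is made up here. It also shows the main caveat: anything that cannot be pickled (a lambda, for instance) cannot be copied this way, so copy.deepcopy is still needed as a fallback.

```python
import pickle

def pickle_deepcopy(obj):
    """Deep-copy obj by serialising it and loading it back."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))

nested = {"a": [1, 2, {"b": (3, 4)}]}
clone = pickle_deepcopy(nested)

# Caveat: unpicklable objects make this approach fail
try:
    pickle_deepcopy(lambda: None)
    lambda_copied = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_copied = False
```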

Taxicab answered 28/3, 2009 at 21:36 Comment(1)

I've always been leery of this technique, although it seems like it should work about as fast as anything else I could think of. do pythonistas consider this a good way to get deep copies? (fwiw, i use this technique anyway) – Safekeeping 12/7, 2009 at 15:37

I

8

Suppose you have a list of items, and you want a dictionary with these items as the keys. Use fromkeys:

>>> items = ['a', 'b', 'c', 'd']
>>> idict = dict().fromkeys(items, 0)
>>> idict
{'a': 0, 'c': 0, 'b': 0, 'd': 0}
>>>

The second argument of fromkeys is the value to be granted to all the newly created keys.
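One thing to watch for: the value is not copied per key, so a mutable default such as a list ends up shared by every key. A short sketch of the pitfall and of a dict comprehension that avoids it (Python 2.7+ and 3):

```python
keys = ["a", "b"]

shared = dict.fromkeys(keys, [])     # every key points at the same list
shared["a"].append(1)

separate = {key: [] for key in keys} # a fresh list per key
separate["a"].append(1)

print(shared, separate)
```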

Iridescent answered 28/3, 2009 at 8:36 Comment(3)

fromkeys is a static method. You should do "dict.fromkeys(items, 0)". Your code creates and throws away an empty dictionary. – Underwaist 28/3, 2009 at 19:39

@Andrew Dalke , I believe dict.fromkeys is a class-method. reason: dict.fromkeys returns a dictionary back, hence it should get class as its first argument. Think about when you've subclassed dict -- MyDict.fromkeys should give an instance of MyDict – Rake 14/3, 2010 at 18:41

@jeffjose: You are correct. I did the test you suggested and looked at the code since I was curious how that was done. – Underwaist 17/3, 2010 at 14:53

I

7

To find out if line is empty (i.e. either size 0 or contains only whitespace), use the string method strip in a condition, as follows:

if not line.strip():    # if line is empty
    continue            # skip it
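The same test fits nicely in a list comprehension when you just want the non-blank lines. A tiny sketch with made-up input:

```python
text = "first\n\n   \t\nsecond\n"

# an empty or whitespace-only line strips down to '', which is falsy
kept = [line for line in text.splitlines() if line.strip()]
print(kept)
```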

Iridescent answered 28/3, 2009 at 8:35 Comment(0)

C

5

I like this one to zip everything up in a directory. Hotkey it for instabackups!

import os
import zipfile

z = zipfile.ZipFile('my-archive.zip', 'w', zipfile.ZIP_DEFLATED)
startdir = "/home/johnf"
for dirpath, dirnames, filenames in os.walk(startdir):
  for filename in filenames:
    z.write(os.path.join(dirpath, filename))
z.close()
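A variation on this, sketched as a function for Python 3. The name zip_directory and its parameters are made up; the differences are a with block, so the archive is closed even on errors, and arcname, so entries are stored relative to the start directory instead of with the full path. It assumes the archive is written outside the directory being zipped.

```python
import os
import zipfile

def zip_directory(startdir, archive_path):
    """Write every file under startdir into archive_path, using relative names."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as z:
        for dirpath, _, filenames in os.walk(startdir):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                # store "sub/b.txt" rather than "/home/.../sub/b.txt"
                z.write(full, arcname=os.path.relpath(full, startdir))
```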

Cocoon answered 28/3, 2009 at 1:13 Comment(3)

What's wrong with zip -r my-archive.zip directory/ ? – Clymer 28/3, 2009 at 1:14

That's not a Python snippet. :) (Also, you can include special logic in the snippet that might be complicated to do with shell commands.) – Cocoon 28/3, 2009 at 1:18

+1 (although I prefer tarfile) – Doxia 4/8, 2010 at 18:17

T

5

For list comprehensions that need current, next:

[fun(curr, nxt)
 for curr, nxt
 in zip(lst, lst[1:] + [None])
 if condition(curr, nxt)]

For a circular list: zip(lst, lst[1:] + [lst[0]]).

For previous, current: zip([None] + lst[:-1], lst); circular: zip([lst[-1]] + lst[:-1], lst).

Use + rather than append or extend here: those methods modify the list in place and return None, so their result cannot be passed to zip.
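The current/next pairing can be wrapped in a small helper. This is only a sketch for Python 3; with_next and its circular flag are hypothetical names, not part of the answer above.

```python
def with_next(items, circular=False):
    """Pair each item with its successor; the last item gets None or wraps around."""
    items = list(items)
    if not items:
        return []
    tail = items[1:] + ([items[0]] if circular else [None])
    return list(zip(items, tail))

print(with_next([1, 2, 3]))
print(with_next([1, 2, 3], circular=True))
```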

Taxicab answered 28/3, 2009 at 21:32 Comment(1)

A (slight adjustment of) the pairwise recipe does the same, and works for all iterables: docs.python.org/3.0/library/itertools.html#recipes – Coruscation 12/7, 2009 at 13:51

C

4

Hardlink identical files in the current directory (on Unix, this means they share physical storage, which takes much less space):

import os
import hashlib

dupes = {}

for path, dirs, files in os.walk(os.getcwd()):
    for file in files:
        filename = os.path.join(path, file)
        hash = hashlib.sha1(open(filename, 'rb').read()).hexdigest()
        if hash in dupes:
            print 'linking "%s" -> "%s"' % (dupes[hash], filename)
            os.rename(filename, filename + '.bak')
            try:
                os.link(dupes[hash], filename)
                os.unlink(filename + '.bak')
            except OSError:
                # linking failed: put the original file back
                os.rename(filename + '.bak', filename)
        else:
            dupes[hash] = filename

Clymer answered 28/3, 2009 at 1:10 Comment(1)

+1, though this code can be improved upon, by using a unique temporary filename, instead of blindly assuming filename.bak doesn't exist. – Coruscation 12/7, 2009 at 13:47

R

3

Here are a few which I think are worth knowing but might not be useful on an everyday basis. Most of them are one-liners.

Removing Duplicates from a List

L = list(set(L))

Getting Integers from a string (space separated)

ints = [int(x) for x in S.split()]

Finding Factorial

import operator
fac = lambda n: reduce(operator.mul, range(1, n+1), 1)

Finding greatest common divisor

>>> def gcd(a,b):
...     while(b):a,b=b,a%b
...     return a
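On Python 3 several of these are covered by the standard library, and duplicates can be removed without losing the original order. A short sketch (the dedupe name is made up; it relies on dicts keeping insertion order, which is guaranteed from Python 3.7):

```python
import math

def dedupe(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))

unique = dedupe([3, 1, 3, 2, 1])
ints = [int(x) for x in "10 20 30".split()]
fact = math.factorial(5)
divisor = math.gcd(12, 18)
print(unique, ints, fact, divisor)
```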

Ricercare answered 28/3, 2009 at 1:7 Comment(3)

How can you be sure that set(L) doesn't mess with the order of the original list? sets are orderless – Guss 3/5, 2011 at 12:54

Yes, it probably does, but how WOULD you remove duplicates from a list without messing with the order? That's not a very well-defined question. – Striptease 7/9, 2011 at 6:18

To maintain sequence, you might need to do it in a program (sorry about the formatting, but this is a comment): new_list=[]; for x in old_list: if x not in new_list: new_list.append(x) – Tropicalize 2/9, 2018 at 3:2

B

2

import tempfile
import cPickle

class DiskFifo:
    """A disk based FIFO which can be iterated, appended and extended in an interleaved way"""
    def __init__(self):
        self.fd = tempfile.TemporaryFile()
        self.wpos = 0
        self.rpos = 0
        self.pickler = cPickle.Pickler(self.fd)
        self.unpickler = cPickle.Unpickler(self.fd)
        self.size = 0

    def __len__(self):
        return self.size

    def extend(self, sequence):
        map(self.append, sequence)

    def append(self, x):
        self.fd.seek(self.wpos)
        self.pickler.clear_memo()
        self.pickler.dump(x)
        self.wpos = self.fd.tell()
        self.size = self.size + 1

    def next(self):
        try:
            self.fd.seek(self.rpos)
            x = self.unpickler.load()
            self.rpos = self.fd.tell()
            return x

        except EOFError:
            raise StopIteration

    def __iter__(self):
        self.rpos = 0
        return self

Batory answered 28/3, 2009 at 1:7 Comment(0)

O

2

Like another poster above, I said 'Wow!' when I discovered enumerate().
I sang Python's praises when I discovered repr(), which let me see exactly what is in the strings I wanted to analyse with a regex.
I was very pleased to find that print '\n'.join(list_of_strings) displays much faster than for ch in list_of_strings: print ch.
splitlines(1), called with a true argument, keeps the newlines.

These four "tricks" combine into one snippet that is very useful for quickly displaying the source code of a web page line by line, with each line numbered, special characters such as '\t' and newlines shown uninterpreted, and the newlines kept:

import urllib
from time import clock,sleep

sock = urllib.urlopen('http://docs.python.org/')
ch = sock.read()
sock.close()


te = clock()
for i,line in enumerate(ch.splitlines(1)):
    print str(i) + ' ' + repr(line)
t1 = clock() - te


print "\n\nIn 3 seconds, I will print the same content, using '\\n'.join(....)\n" 

sleep(3)

te = clock()
# here's the point of interest:
print '\n'.join(str(i) + ' ' + repr(line)
                for i,line in enumerate(ch.splitlines(1)) )
t2 = clock() - te

print '\n'
print 'first  display took',t1,'seconds'
print 'second display took',t2,'seconds'

With my not very fast computer, I got:

first  display took 4.94626048841 seconds
second display took 0.109297410704 seconds

Oeillade answered 28/3, 2009 at 1:7 Comment(0)

G

2

Emulating a switch statement. For example switch(x) {..}:

def a():
  print "a"

def b():
  print "b"

def default():
   print "default"

{1: a, 2: b}.get(x, default)()
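Here is the dispatch idea as a self-contained Python 3 sketch. The handlers return strings instead of printing so the result is easy to check, and the switch wrapper is a made-up name.

```python
def a():
    return "a"

def b():
    return "b"

def default():
    return "default"

def switch(x):
    # look up the handler for x, fall back to default, then call it
    return {1: a, 2: b}.get(x, default)()

print(switch(1), switch(2), switch(99))
```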

Guss answered 28/3, 2009 at 1:7 Comment(0)

D

1

Iterate over any iterable (list, set, file, stream, strings, whatever), of ANY size (including unknown size), by chunks of x elements:

from itertools import chain, islice

def chunks(iterable, size, format=iter):
    it = iter(iterable)
    while True:
        yield format(chain((it.next(),), islice(it, size - 1)))

>>> l = ["a", "b", "c", "d", "e", "f", "g"]
>>> for chunk in chunks(l, 3, tuple):
...     print chunk
...
('a', 'b', 'c')
('d', 'e', 'f')
('g',)
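Note that the generator above ends by letting StopIteration escape from it.next(), which Python 3.7 and later turn into a RuntimeError (PEP 479). Below is a rough Python 3 adaptation, not the original code: it builds each chunk eagerly with islice instead of yielding a lazy chain, and the wrap parameter is a renamed stand-in for format.

```python
from itertools import islice

def chunks(iterable, size, wrap=tuple):
    """Yield successive chunks of at most size items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))   # take up to size items
        if not chunk:                    # iterator exhausted
            return
        yield wrap(chunk)

result = list(chunks("abcdefg", 3))
print(result)
```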

Diarrhea answered 28/3, 2009 at 1:7 Comment(0)

C

1

I actually just created this, but I think it's going to be a very useful debugging tool.

def dirValues(instance, all=False):
    retVal = {}
    for prop in dir(instance):
        if not all and prop.startswith("_"):
            continue
        retVal[prop] = getattr(instance, prop)
    return retVal

I usually use dir() in a pdb context, but I think this will be much more useful:

(pdb) from pprint import pprint as pp
(pdb) from myUtils import dirValues
(pdb) pp(dirValues(someInstance))
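A slightly more defensive variant, sketched for Python 3. The names dir_values and include_private are made up, and it assumes you would rather see a placeholder than have the whole dump fail when one attribute (a property, say) raises on access.

```python
def dir_values(instance, include_private=False):
    """Map attribute names from dir(instance) to their current values."""
    values = {}
    for name in dir(instance):
        if not include_private and name.startswith("_"):
            continue
        try:
            values[name] = getattr(instance, name)
        except Exception as exc:
            # keep going if a single attribute cannot be read
            values[name] = "<unreadable: %s>" % type(exc).__name__
    return values

class Point:
    def __init__(self):
        self.x = 1
        self.y = 2
        self._secret = 3

print(dir_values(Point()))
```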

Costrel answered 28/3, 2009 at 1:7 Comment(0)

T

1

A custom list that when multiplied by other list returns a cartesian product... the good thing is that the cartesian product is indexable, not like that of itertools.product (but the multiplicands must be sequences, not iterators).

import operator

class mylist(list):
    def __getitem__(self, args):
        if type(args) is tuple:
            return [list.__getitem__(self, i) for i in args]
        else:
            return list.__getitem__(self, args)
    def __mul__(self, args):
        seqattrs = ("__getitem__", "__iter__", "__len__")
        if all(hasattr(args, i) for i in seqattrs):
            return cartesian_product(self, args)
        else:
            return list.__mul__(self, args)
    def __imul__(self, args):
        return self.__mul__(args)
    def __rmul__(self, args):
        # other * self: reuse __mul__ with the operands swapped
        return mylist(args).__mul__(self)
    def __pow__(self, n):
        return cartesian_product(*((self,)*n))
    def __rpow__(self, n):
        return cartesian_product(*((self,)*n))

class cartesian_product:
    def __init__(self, *args):
        self.elements = args
    def __len__(self):
        return reduce(operator.mul, map(len, self.elements))
    def __getitem__(self, n):
        return [e[i] for e, i  in zip(self.elements,self.get_indices(n))]
    def get_indices(self, n):
        sizes = map(len, self.elements)
        tmp = [0]*len(sizes)
        i = -1
        for w in reversed(sizes):
            tmp[i] = n % w
            n //= w
            i -= 1
        return tmp
    def __add__(self, arg):
        return mylist(map(None, self)+mylist(map(None, arg)))
    def __imul__(self, args):
        return mylist(self)*mylist(args)
    def __rmul__(self, args):
        return mylist(args)*mylist(self)
    def __mul__(self, args):
        if isinstance(args, cartesian_product):
            return cartesian_product(*(self.elements+args.elements))
        else:
            return cartesian_product(*(self.elements+(args,)))
    def __iter__(self):
        for i in xrange(len(self)):
            yield self[i]
    def __str__(self):
        return "[" + ",".join(str(i) for i in self) +"]"
    def __repr__(self):
        return "*".join(map(repr, self.elements))

Thorstein answered 28/3, 2009 at 1:7 Comment(1)

I don't understand a line of it, can you comment on how it works? – Aha 23/3, 2010 at 14:45

T

1

For Python 2.4 or earlier:

for x,y in someIterator:
  listDict.setdefault(x,[]).append(y)

In Python 2.5+ there is an alternative using collections.defaultdict.
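A minimal sketch of the defaultdict version, in Python 3 syntax with made-up sample pairs:

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

grouped = defaultdict(list)      # missing keys start out as an empty list
for key, value in pairs:
    grouped[key].append(value)

print(dict(grouped))
```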

Taxicab answered 29/3, 2009 at 10:46 Comment(0)

P

0

When debugging, you sometimes want to see a string with a basic editor. For showing a string with notepad:

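import os, tempfile, subprocess

def get_rand_filename(dir_=None):
    "Function returns a non-existent random filename."
    fd, name = tempfile.mkstemp('.tmp', '', dir_ or os.getcwd())
    os.close(fd)  # mkstemp returns an open descriptor; close it to avoid a leak
    return name

def open_with_notepad(s):
    "Function gets a string and shows it on notepad"
    filename = get_rand_filename()
    with open(filename, 'w') as f:
        f.write(s)
    # start notepad only after the file has been written and closed
    subprocess.Popen(['notepad', filename])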

Pasteup answered 28/3, 2009 at 1:7 Comment(1)

teachyourselfpython.com has a very large and growing library of code snippets – Peruse 22/7, 2017 at 12:32
