What's the best practice for handling single-value tuples in Python?
U

9

18

I am using a 3rd party library function which reads a set of keywords from a file, and is supposed to return a tuple of values. It does this correctly as long as there are at least two keywords. However, in the case where there is only one keyword, it returns a raw string, not a tuple of size one. This is particularly pernicious because when I try to do something like

for keyword in library.get_keywords():
    # Do something with keyword

then, in the single-keyword case, the for loop iterates over each character of the string in turn. That raises no exception, at run time or otherwise, but it is completely useless to me.

My question is two-fold:

Clearly this is a bug in the library, which is out of my control. How can I best work around it?

Secondly, in general, if I am writing a function that returns a tuple, what is the best practice for ensuring tuples with one element are correctly generated? For example, if I have

def tuple_maker(values):
    my_tuple = (values)
    return my_tuple

for val in tuple_maker("a string"):
    print "Value was", val

for val in tuple_maker(["str1", "str2", "str3"]):
    print "Value was", val

I get

Value was a
Value was  
Value was s
Value was t
Value was r
Value was i
Value was n
Value was g
Value was str1
Value was str2
Value was str3

What is the best way to modify tuple_maker to actually return a tuple when there is only a single element? Do I explicitly need to check whether the size is 1 and create the tuple separately, using the (value,) syntax? That would imply that any function that might return a single-valued tuple must do this, which seems hacky and repetitive.

Is there some elegant general solution to this problem?

Ungenerous answered 21/1, 2010 at 18:22 Comment(4)
I don't think it's "certainly" a bug. Possibly, but possibly it's intended behaviour (of course, if the docs say it should always return a tuple, it is a bug :) IIRC parts of the re module will return an individual element if there's only one regex match, or a tuple of them if there's more than one.Brewer
It's generally accepted by the Python community as bad practice to allow a bare value instead of a 1-tuple, due to negative experience with places like that and the % operator. I would file a bug.Chinook
It's either a bug or stupid. You choose. ;)Coca
The second question isn't about a bug. my_tuple = (values) doesn't produce a tuple; the value is just in parentheses. The correct code would be: def tuple_maker(values): my_tuple = (values,); return my_tuple. But then the list case would fail (it would return a tuple containing a list).Peculiarize
C
22

You need to test the type somehow, to see whether it is a string or a tuple. I'd do it like this:

keywords = library.get_keywords()
if not isinstance(keywords, tuple):
    keywords = (keywords,) # Note the comma
for keyword in keywords:
    do_your_thang(keyword)
Coca answered 21/1, 2010 at 18:37 Comment(5)
This works, provided that lists are supposed to be wrapped in one-tuples.Heliotype
@jcd: They are. See the first sentence of the question.Coca
Ok, thanks. I was wary about checking type, but there doesn't seem to be much alternative. I will probably wrap the library function with a wrapper like this, and call that from my own code. And raise a bug with the library maintainer. ;)Ungenerous
It might be nicer to test specifically for basestring; that way if the library switches to lists or generators, the code will still work. General principle here is to check for the known special case.Preengage
There is one thing here that is slightly misleading and can lead to a bit of confusion. Casting a one-string tuple with (<str>,) works, but using the function tuple(<str>,) does not. See my answer.Pieper
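The suggestion in the comments to test for the known special case (a bare string) rather than for tuple could look roughly like this. It is a sketch in Python 3, where str takes the place of basestring, and the helper name as_keyword_tuple is invented for illustration; it is not part of any library.

```python
def as_keyword_tuple(result):
    """Return `result` as a tuple, treating a bare string as one keyword."""
    if isinstance(result, str):
        # The known special case: a single keyword returned as a raw string.
        return (result,)
    # Tuples, lists, and generators are converted element by element.
    return tuple(result)


print(as_keyword_tuple("alpha"))             # ('alpha',)
print(as_keyword_tuple(("alpha", "beta")))   # ('alpha', 'beta')
print(as_keyword_tuple(["alpha", "beta"]))   # ('alpha', 'beta')
```

Unlike the isinstance(keywords, tuple) check, this converts a list into a tuple of its elements instead of wrapping it, which only matters if the library ever changes its return type.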
S
8

For your first problem, I'm not really sure if this is the best answer, but I think you need to check yourself whether the returned value is a string or tuple and act accordingly.

As for your second problem, any value can be turned into a single-valued tuple by placing a , after it:

>>> x='abc'
>>> x
'abc'
>>> tpl=x,
>>> tpl
('abc',)

Putting these two ideas together:

>>> def make_tuple(k):
...     if isinstance(k,tuple):
...             return k
...     else:
...             return k,
... 
>>> make_tuple('xyz')
('xyz',)
>>> make_tuple(('abc','xyz'))
('abc', 'xyz')

Note: IMHO it is generally a bad idea to use isinstance, or any other form of logic that needs to check the type of an object at runtime. But for this problem I don't see any way around it.

Selfdrive answered 21/1, 2010 at 18:33 Comment(0)
B
3

Your tuple_maker doesn't do what you think it does. An equivalent definition of tuple_maker to yours is

def tuple_maker(input):
    return input

What you're seeing is that tuple_maker("a string") returns a string, while tuple_maker(["str1","str2","str3"]) returns a list of strings; neither returns a tuple!

Tuples in Python are defined by the presence of commas, not brackets. Thus (1,2) is a tuple containing the values 1 and 2, while (1,) is a tuple containing the single value 1.
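A quick check of the comma rule, written for Python 3. The one case where parentheses alone produce a tuple is the empty tuple (). The variable names are illustrative only.

```python
# The comma, not the parentheses, builds the tuple.
not_a_tuple = ("abc")      # just a parenthesized string
one_tuple = ("abc",)       # trailing comma makes a 1-tuple
bare_tuple = "abc",        # parentheses are optional here
empty_tuple = ()           # the one exception: no comma needed

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(one_tuple == bare_tuple)     # True
print(len(empty_tuple))            # 0
```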

To convert a value to a tuple, as others have pointed out, use tuple.

>>> tuple([1])
(1,)
>>> tuple([1,2])
(1, 2)
Brewer answered 21/1, 2010 at 18:42 Comment(2)
tuple() creates a tuple out of any iterable; not any value. tuple('abc') ==> ('a', 'b', 'c'), tuple(3) ==> error!Unstriped
@me_and: Thanks for the clarification about tuple definition. But Javier highlights the problem I'm having very well (I had already tried using tuple before posting here, and discovered it doesn't help me).Ungenerous
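To make the point in the comments concrete: tuple() iterates over its argument, so it splits a string into characters and rejects a non-iterable outright. A short Python 3 check, with illustrative variable names:

```python
split_chars = tuple("abc")    # iterates over the string
wrapped = tuple(["abc"])      # iterates over a one-element list
literal = ("abc",)            # tuple syntax, no iteration involved

print(split_chars)  # ('a', 'b', 'c')
print(wrapped)      # ('abc',)
print(literal)      # ('abc',)

try:
    tuple(3)
    raised = False
except TypeError:
    # An int is not iterable, so tuple() refuses it.
    raised = True
print(raised)       # True
```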
B
3

The () have nothing to do with tuples in Python; the tuple syntax uses the comma. The ()s are optional.

E.g.:

>>> a=1, 2, 3
>>> type(a)
<class 'tuple'>
>>> a=1,
>>> type(a)
<class 'tuple'>
>>> a=(1)
>>> type(a)
<class 'int'>

I guess this is the root of the problem.

Boote answered 14/11, 2019 at 19:8 Comment(0)
K
2

There's always monkeypatching!

# Store a reference to the real library function
really_get_keywords = library.get_keywords

# Define our patched version of the function, which uses the real
# version above, adjusting its return value as necessary
def patched_get_keywords():
    """Make sure we always get a tuple of keywords."""
    result = really_get_keywords()
    return result if isinstance(result, tuple) else (result,)

# Install the patched version
library.get_keywords = patched_get_keywords

NOTE: This code might burn down your house and sleep with your wife.

Krimmer answered 21/1, 2010 at 22:3 Comment(1)
@Will McCutchen: +1 - A good point! But I think it will be safer to wrap but not install patched version - I don't know enough about the library's internals to guarantee I won't break something else that depends on this behaviour.Ungenerous
I
1

Rather than checking for a length of 1, I'd use the isinstance built-in instead.

>>> isinstance('a_str', tuple)
False
>>> isinstance(('str1', 'str2', 'str3'), tuple)
True
Ic answered 21/1, 2010 at 18:31 Comment(0)
A
1

Is it absolutely necessary that it returns tuples, or will any iterable do?

import collections
def iterate(keywords):
    if not isinstance(keywords, collections.Iterable):
        yield keywords
    else:
        for keyword in keywords:
            yield keyword


for keyword in iterate(library.get_keywords()):
    print keyword
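One caveat when adapting this idea to Python 3: a string is itself iterable, so a check against Iterable alone would still yield single characters, and the collections.Iterable alias was removed in Python 3.10 in favour of collections.abc.Iterable. A possible Python 3 variant that treats strings and bytes as single items:

```python
from collections.abc import Iterable


def iterate(keywords):
    """Yield keywords one at a time, treating a bare string as one item."""
    if isinstance(keywords, (str, bytes)) or not isinstance(keywords, Iterable):
        yield keywords
    else:
        yield from keywords


print(list(iterate("alpha")))             # ['alpha']
print(list(iterate(("alpha", "beta"))))   # ['alpha', 'beta']
print(list(iterate(42)))                  # [42]
```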
Antimere answered 21/1, 2010 at 19:12 Comment(1)
@Epcylon: +1 - This is an interesting idea for my own functions, but doesn't help me with my 3rd party library.Ungenerous
H
0

For your first problem, you could check whether the return value is a tuple using

type(r) is tuple
#alternative
isinstance(r, tuple)
# one-liner
def as_tuple(r): return [ tuple([r]), r ][type(r) is tuple]

For the second, I like to use tuple([1]); I think it is a matter of taste. You could probably also write a wrapper, for example def tuple1(s): return tuple([s])

Hypesthesia answered 21/1, 2010 at 18:29 Comment(0)
P
0

There is an important thing to watch out for when using the tuple() constructor instead of the literal tuple syntax to create your single-string tuples. Here is a Nose2/Unittest script you can use to play with the problem:

#!/usr/bin/env python
# vim: ts=4 sw=4 sts=4 et
from __future__ import print_function
# global
import unittest
import os
import sys
import logging
import pprint
import shutil

# module-level logger
logger = logging.getLogger(__name__)

# module-global test-specific imports
# where to put test output data for compare.
testdatadir = os.path.join('.', 'test', 'test_data')
rawdata_dir = os.path.join(os.path.expanduser('~'), 'Downloads')
testfiles = (
    'bogus.data',
)
purge_results = False
output_dir = os.path.join('test_data', 'example_out')


def cleanPath(path):
    '''cleanPath
    Recursively removes everything below a path

    :param path:
    the path to clean
    '''
    for root, dirs, files in os.walk(path):
        for fn in files:
            logger.debug('removing {}'.format(fn))
            os.unlink(os.path.join(root, fn))
        for dn in dirs:
            # recursive
            try:
                logger.debug('recursive del {}'.format(dn))
                shutil.rmtree(os.path.join(root, dn))
            except Exception:
                # for now, halt on all.  Override with shutil onerror
                # callback and ignore_errors.
                raise


class TestChangeMe(unittest.TestCase):
    '''
        TestChangeMe
    '''
    testdatadir = None
    rawdata_dir = None
    testfiles   = None
    output_dir  = output_dir

    def __init__(self, *args, **kwargs):
        self.testdatadir = os.path.join(os.path.dirname(
            os.path.abspath(__file__)), testdatadir)
        super(TestChangeMe, self).__init__(*args, **kwargs)
        # check for kwargs
        # this allows test control by instance
        self.testdatadir = kwargs.get('testdatadir', testdatadir)
        self.rawdata_dir = kwargs.get('rawdata_dir', rawdata_dir)
        self.testfiles = kwargs.get('testfiles', testfiles)
        self.output_dir = kwargs.get('output_dir', output_dir)

    def setUp(self):
        '''setUp
        pre-test setup called before each test
        '''
        logging.debug('setUp')
        if not os.path.exists(self.testdatadir):
            os.mkdir(self.testdatadir)
        else:
            self.assertTrue(os.path.isdir(self.testdatadir))
        self.assertTrue(os.path.exists(self.testdatadir))
        cleanPath(self.output_dir)

    def tearDown(self):
        '''tearDown
        post-test cleanup, if required
        '''
        logging.debug('tearDown')
        if purge_results:
            cleanPath(self.output_dir)

    def tupe_as_arg(self, tuple1, tuple2, tuple3, tuple4):
        '''test_something_0
            auto-run tests sorted by ascending alpha
        '''
        # for testing, recreate strings and lens
        string1 = 'string number 1'
        len_s1 = len(string1)
        string2 = 'string number 2'
        len_s2 = len(string2)
        # run the same tests...
        # should test as type = string
        self.assertTrue(type(tuple1) == str)
        self.assertFalse(type(tuple1) == tuple)
        self.assertEqual(len_s1, len_s2, len(tuple1))
        self.assertEqual(len(tuple2), 1)
        # this will fail
        # self.assertEqual(len(tuple4), 1)
        self.assertEqual(len(tuple3), 2)
        self.assertTrue(type(string1) == str)
        self.assertTrue(type(string2) == str)
        self.assertTrue(string1 == tuple1)
        # should test as type == tuple
        self.assertTrue(type(tuple2) == tuple)
        self.assertTrue(type(tuple4) == tuple)
        self.assertFalse(type(tuple1) == type(tuple2))
        self.assertFalse(type(tuple1) == type(tuple4))
        # this will fail
        # self.assertFalse(len(tuple4) == len(tuple1))
        self.assertFalse(len(tuple2) == len(tuple1))

    def default_test(self):
        '''testFileDetection
        Tests all data files for type and compares the results to the current
        stored results.
        '''
        # test 1
        __import__('pudb').set_trace()
        string1 = 'string number 1'
        len_s1 = len(string1)
        string2 = 'string number 2'
        len_s2 = len(string2)
        tuple1 = (string1)
        tuple2 = (string1,)
        tuple3 = (string1, string2)
        tuple4 = tuple(string1,)
        # should test as type = string
        self.assertTrue(type(tuple1) == str)
        self.assertFalse(type(tuple1) == tuple)
        self.assertEqual(len_s1, len_s2, len(tuple1))
        self.assertEqual(len(tuple2), 1)
        # this will fail
        # self.assertEqual(len(tuple4), 1)
        self.assertEqual(len(tuple3), 2)
        self.assertTrue(type(string1) == str)
        self.assertTrue(type(string2) == str)
        self.assertTrue(string1 == tuple1)
        # should test as type == tuple
        self.assertTrue(type(tuple2) == tuple)
        self.assertTrue(type(tuple4) == tuple)
        self.assertFalse(type(tuple1) == type(tuple2))
        self.assertFalse(type(tuple1) == type(tuple4))
        # this will fail
        # self.assertFalse(len(tuple4) == len(tuple1))
        self.assertFalse(len(tuple2) == len(tuple1))
        self.tupe_as_arg(tuple1, tuple2, tuple3, tuple4)
# stand-alone test execution
if __name__ == '__main__':
    import nose2
    nose2.main(
        argv=[
            'fake',
            '--log-capture',
            'TestChangeMe.default_test',
        ])

You will notice that the (nearly) identical code calling tuple(string1,) shows as type tuple, but the length will be the same as the string length and all members will be single characters.

This would cause the commented-out assertions on lines #139, #150, #104 and #115 to fail if they were enabled, even though they are seemingly identical to the ones that pass.

(note: I have a PUDB breakpoint in the code at line #124, it's an excellent debug tool, but you can remove it if you prefer. Otherwise simply pip install pudb to use it.)
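The behaviour comes down to the trailing comma: inside a call's argument list it is just an allowed trailing comma, so tuple(string1,) is the same call as tuple(string1). A compact Python 3 check without the test harness (variable names are illustrative):

```python
string1 = "string number 1"

literal_tuple = (string1,)        # tuple syntax: one element
constructed = tuple(string1,)     # same call as tuple(string1)
wrapped = tuple([string1])        # wrap in a list first to get one element

print(len(literal_tuple))                 # 1
print(len(constructed) == len(string1))   # True
print(constructed == tuple(string1))      # True
print(wrapped == literal_tuple)           # True
```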

Pieper answered 4/5, 2018 at 14:42 Comment(0)
