Correct way to detect sequence parameter?
Asked Answered
W

12

18

I want to write a function that accepts a parameter which can be either a sequence or a single value. The type of value is str, int, etc., but I don't want it to be restricted to a hardcoded list. In other words, I want to know if the parameter X is a sequence or something I have to convert to a sequence to avoid special-casing later. I could do

type(X) in (list, tuple)

but there may be other sequence types I'm not aware of, and no common base class.

-N.

Edit: See my "answer" below for why most of these answers don't help me. Maybe you have something better to suggest.

Worried answered 20/11, 2008 at 13:52 Comment(4)
Please note that an object of type str is also a sequence-type!Furfural
@pi: correct, and that's the real problem here. All "good" answers below don't take that into account.Worried
Hmm. I'd actually say that it was that you didn't really define the problem. If you had said what the task was initially, you might have gotten more useful answers right away.Agogue
The fussy part is that OP doesn't mean "sequence" in the Python sense (where sequence has a well defined meaning), but in a looser sense of "multiple things" vs. "one thing".Granddaughter
G
5

The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has getitem, etc.) yet it's usually treated as a single item.

For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?

Based on the revised question, it sounds like what you want is something more like:

def to_sequence(arg):
    ''' 
    determine whether an arg should be treated as a "unit" or a "sequence"
    if it's a unit, return a 1-tuple with the arg
    '''
    def _multiple(x):  
        return hasattr(x,"__iter__")
    if _multiple(arg):  
        return arg
    else:
        return (arg,)

>>> to_sequence("a string")
('a string',)
>>> to_sequence( (1,2,3) )
(1, 2, 3)
>>> to_sequence( xrange(5) )
xrange(5)

This isn't guaranteed to handle all types, but it handles the cases you mention quite well, and should do the right thing for most of the built-in types.

When using it, make sure whatever receives the output of this can handle iterables.

Granddaughter answered 8/1, 2009 at 19:35 Comment(1)
As @Jim Brissom pointed out in a comment about this answer to a related question, strings not having an __iter__ attribute was an discrepancy that was fixed in Python 3 which makes this approach forwards-incompatible.Yordan
T
19

As of 2.6, use abstract base classes.

>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance(0, collections.Sequence)
False

Furthermore ABC's can be customized to account for exceptions, such as not considering strings to be sequences. Here an example:

import abc
import collections

class Atomic(object):
    __metaclass__ = abc.ABCMeta
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

Atomic.register(basestring)

After registration the Atomic class can be used with isinstance and issubclass:

assert isinstance("hello", Atomic) == True

This is still much better than a hard-coded list, because you only need to register the exceptions to the rule, and external users of the code can register their own.

Note that in Python 3 the syntax for specifying metaclasses changed and the basestring abstract superclass was removed, which requires something like the following to be used instead:

class Atomic(metaclass=abc.ABCMeta):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

Atomic.register(str)

If desired, it's possible to write code which is compatible both both Python 2.6+ and 3.x, but doing so requires using a slightly more complicated technique which dynamically creates the needed abstract base class, thereby avoiding syntax errors due to the metaclass syntax difference. This is essentially the same as what Benjamin Peterson's six module'swith_metaclass()function does.

class _AtomicBase(object):
    @classmethod
    def __subclasshook__(cls, other):
        return not issubclass(other, collections.Sequence) or NotImplemented

class Atomic(abc.ABCMeta("NewMeta", (_AtomicBase,), {})):
    pass

try:
    unicode = unicode
except NameError:  # 'unicode' is undefined, assume Python >= 3
    Atomic.register(str)  # str includes unicode in Py3, make both Atomic
    Atomic.register(bytes)  # bytes will also be considered Atomic (optional)
else:
    # basestring is the abstract superclass of both str and unicode types
    Atomic.register(basestring)  # make both types of strings Atomic

In versions before 2.6, there are type checkers in theoperatormodule.

>>> import operator
>>> operator.isSequenceType([])
True
>>> operator.isSequenceType(0)
False
Throttle answered 20/11, 2008 at 17:52 Comment(1)
>>> assert operator.isSequenceType("hello") == True (as pointed out elsewhere, strings are sequeneces, and Coady I'm sure well knows.... original question was underspec'd.Granddaughter
G
5

The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has getitem, etc.) yet it's usually treated as a single item.

For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?

Based on the revised question, it sounds like what you want is something more like:

def to_sequence(arg):
    ''' 
    determine whether an arg should be treated as a "unit" or a "sequence"
    if it's a unit, return a 1-tuple with the arg
    '''
    def _multiple(x):  
        return hasattr(x,"__iter__")
    if _multiple(arg):  
        return arg
    else:
        return (arg,)

>>> to_sequence("a string")
('a string',)
>>> to_sequence( (1,2,3) )
(1, 2, 3)
>>> to_sequence( xrange(5) )
xrange(5)

This isn't guaranteed to handle all types, but it handles the cases you mention quite well, and should do the right thing for most of the built-in types.

When using it, make sure whatever receives the output of this can handle iterables.

Granddaughter answered 8/1, 2009 at 19:35 Comment(1)
As @Jim Brissom pointed out in a comment about this answer to a related question, strings not having an __iter__ attribute was an discrepancy that was fixed in Python 3 which makes this approach forwards-incompatible.Yordan
B
4

Sequences are described here: https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange

So sequences are not the same as iterable objects. I think sequence must implement __getitem__, whereas iterable objects must implement __iter__. So for example string are sequences and don't implement __iter__, xrange objects are sequences and don't implement __getslice__.

But from what you seen to want to do, I'm not sure you want sequences, but rather iterable objects. So go for hasattr("__getitem__", X) you want sequences, but go rather hasattr("__iter__", X) if you don't want strings for example.

Beckford answered 20/11, 2008 at 15:12 Comment(2)
hasattr is not sufficient to check that it is iterable. The list type will also have an iter member, but isn't itself iterable. ie. hasattr(list,'iter') == True, but iter(list) -> "TypeError: 'type' object is not iterable". It'll also miss objects implementing getitem but not iterAugust
Ok for list particular case: it is a builtin and iter is a static method (1 arg). So basically, yes, duck typing fails! But iter() creates an iterable object from a non-iterable one (so not the solution). The only 'sure' solution is to exclude/include desired types as you and noamtm did.Beckford
L
4

IMHO, the python way is to pass the list as *list. As in:

myfunc(item)
myfunc(*items)
Lenzi answered 1/12, 2008 at 18:59 Comment(1)
Actually myfunc(*items) passes a tuple of the items list.Yordan
A
3

In cases like this, I prefer to just always take the sequence type or always take the scalar. Strings won't be the only types that would behave poorly in this setup; rather, any type that has an aggregate use and allows iteration over its parts might misbehave.

Acrylic answered 20/11, 2008 at 14:52 Comment(0)
A
3

The simplest method would be to check if you can turn it into an iterator. ie

try:
    it = iter(X)
    # Iterable
except TypeError:
    # Not iterable

If you need to ensure that it's a restartable or random access sequence (ie not a generator etc), this approach won't be sufficient however.

As others have noted, strings are also iterable, so if you need so exclude them (particularly important if recursing through items, as list(iter('a')) gives ['a'] again, then you may need to specifically exclude them with:

 if not isinstance(X, basestring)
August answered 20/11, 2008 at 15:39 Comment(2)
Does not help me, because an str object is iterable, but I want to treat it as a single item.Worried
That's why I mentioned checking isinstance. Since strings are fully iterable and indexable, theres no way to exclude them based on their "sequenceness" that won't also exclude some other sequence type. Your only real option is to add an explicit isinstance check to treat them as an exception.August
W
2

I'm new here so I don't know what's the correct way to do it. I want to answer my answers:

The problem with all of the above mentioned ways is that str is considered a sequence (it's iterable, has __getitem__, etc.) yet it's usually treated as a single item.

For example, a function may accept an argument that can either be a filename or a list of filenames. What's the most Pythonic way for the function to detect the first from the latter?

Should I post this as a new question? Edit the original one?

Worried answered 23/11, 2008 at 10:32 Comment(1)
I the Coady's answer is the best because it allows one to pick and choose which classes are considered to be sequences or not, regardless of what special methods they may or may not have.Yordan
K
1

I think what I would do is check whether the object has certain methods that indicate it is a sequence. I'm not sure if there is an official definition of what makes a sequence. The best I can think of is, it must support slicing. So you could say:

is_sequence = '__getslice__' in dir(X)

You might also check for the particular functionality you're going to be using.

As pi pointed out in the comment, one issue is that a string is a sequence, but you probably don't want to treat it as one. You could add an explicit test that the type is not str.

Kendy answered 20/11, 2008 at 14:5 Comment(2)
hasattr(X, 'getslice') is a more efficient way of doing the same thingValance
getslice is actually obsolete - slices in user types are better handled by accepting slice objects in the getitem method, so things can be slicable without having getslice. I think that you also shouldn't need to be slicable to be considered a sequence.August
S
1

If strings are the problem, detect a sequence and filter out the special case of strings:

def is_iterable(x):
  if type(x) == str:
    return False
  try:
    iter(x)
    return True
  except TypeError:
    return False
Sebaceous answered 4/12, 2008 at 14:43 Comment(0)
A
0

Revised answer:

I don't know if your idea of "sequence" matches what the Python manuals call a "Sequence Type", but in case it does, you should look for the __Contains__ method. That is the method Python uses to implement the check "if something in object:"

if hasattr(X, '__contains__'):
    print "X is a sequence"

My original answer:

I would check if the object that you received implements an iterator interface:

if hasattr(X, '__iter__'):
    print "X is a sequence"

For me, that's the closest match to your definition of sequence since that would allow you to do something like:

for each in X:
    print each
Agribusiness answered 20/11, 2008 at 14:18 Comment(10)
File objects have an iterator interface. There may be other objects that have it as well that aren't true sequences.Kendy
@Dave: you are right about file objects having the iterator interface. I'll add a new revised answerAgribusiness
I'll reiterate a comment I made on a nother post - you should use hasattr(X, 'contains') instead of 'contains' in dir(X)Valance
@Moe: It's reasonable to expect that dir() will never return a very long list, so the performance difference between "in" and "hasattr" would be insignificant. And I think that "in" is clearer and shows better the intention of the check, although of course that's subjective.Agribusiness
@Ricardo: I agree with Moe. hasattr is both clearer and faster.Willette
Both are actually wrong here (since hasattr(list,'contains') is true too), but IMHO hasattr is better too. dir() is a convenience for interactive use rather than designed for an getting exhaustive lists of methods. See docs.python.org/library/functions.html#dirAugust
(from above linked document): Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases.August
@Brian: you are right. I never knew about dir's limitations, and I always considered the choice between dir and hasattr a matter of choice. I'll edit my response accordingly. Thanks for the info.Agribusiness
One minor correction: It's hasattr(X, methodname), rather than hasattr being a method on the object.August
__contains__ is for container types, sequence types must implement __len__ and __getitem__, see here: docs.python.org/3/library/collections.abc.htmlGathering
L
0

You're asking the wrong question. You don't try to detect types in Python; you detect behavior.

  1. Write another function that handles a single value. (let's call it _use_single_val).
  2. Write one function that handles a sequence parameter. (let's call it _use_sequence).
  3. Write a third parent function that calls the two above. (call it use_seq_or_val). Surround each call with an exception handler to catch an invalid parameter (i.e. not single value or sequence).
  4. Write unit tests to pass correct & incorrect parameters to the parent function to make sure it catches the exceptions properly.

    def _use_single_val(v):
        print v + 1  # this will fail if v is not a value type

    def _use_sequence(s):
        print s[0]   # this will fail if s is not indexable

    def use_seq_or_val(item):    
        try:
            _use_single_val(item)
        except TypeError:
            pass

        try:
            _use_sequence(item)
        except TypeError:
            pass

        raise TypeError, "item not a single value or sequence"

EDIT: Revised to handle the "sequence or single value" asked about in the question.

Lodie answered 20/11, 2008 at 16:46 Comment(3)
I don't know why this one was voted down - I think it is the best hint here. The workflow with a dynamically typed language shouldn't be 'what type is this object', but 'what can this object do?' It's the same as testing browsers for which browser they are, rather than how they behave in contexts.Agogue
Isn't that what is being asked? He has a method based on detecting type (type(X) in (list, tuple)), but wants one that will detect based on behaviour (acts like a sequence). Neither are "incorrect" parameters for his purposes, he just wants them to do different things.August
-1, I agree with detecting behavior, but the question is about overloading based on sequence/non-sequenceSebaceous
P
-1

You could pass your parameter in the built-in len() function and check whether this causes an error. As others said, the string type requires special handling.

According to the documentation the len function can accept a sequence (string, list, tuple) or a dictionary.

You could check that an object is a string with the following code:

x.__class__ == "".__class__
Pemphigus answered 20/11, 2008 at 14:8 Comment(1)
Better way to test for string is isinstance(x, basestr).Willette

© 2022 - 2024 — McMap. All rights reserved.