Why is it string.join(list) instead of list.join(string)?

10

2096

This has always confused me. It seems like this would be nicer:

["Hello", "world"].join("-")

Than this:

"-".join(["Hello", "world"])

Is there a specific reason it is like this?

Tardigrade answered 29/1, 2009 at 22:45 Comment(9)
For easy memory and understanding: the "-" declares that you are joining a list and converting it to a string. It's result-oriented.Lathi
I think the original idea is that because join() returns a string, it would have to be called from the string context. Putting join() on a list doesn't make a ton of sense in that a list is a container of objects and shouldn't have a one-off function specific to only strings.Fabria
@BallpointBen "...because Python's type system isn't strong enough" is exactly wrong. As Yoshiki Shibukawa's answer (from 8 years before your comment!) says, iterable.join() was considered as a possibility but was rejected because it's a worse API design, not because it wasn't possible to implement.Kinematograph
I may be biased because I am used to javascript, but you want to join the list, it should be a method of list imo. It feels backwards.Maegan
I think it's because of the fact that " join is a string method that results in a string" makes more sense?Costplus
Well, str.split() returns a non-string and makes quite a bit of sense. It seems like the same logic should be ok here, right? (Just talking about the conceptual problem of a non-string output)Seely
@Seely 100%. And the strongest argument for sequence.join() is real-world code. This is the flow of data. Something produced the sequence, which we now want to join. Nothing ever produces the separator. It is always hard-coded by the programmer.Perjure
Is it the sequence that joins the string (the my_items.join("; ") that's favored by the OP), or is it the string that joins the (elements of the) sequence?Bary
I generally say this is one of the two rare situations where Javascript is more sensible than Python (the other being the awkward a = b if c else d). However in light of the brilliant answer https://mcmap.net/q/44993/-why-is-it-string-join-list-instead-of-list-join-string I think I can no longer say that.Elma
L
1439

It's because any iterable can be joined (e.g., list, tuple, dict, set), but its contents and the "joiner" must be strings.

For example:

'_'.join(['welcome', 'to', 'stack', 'overflow'])
'_'.join(('welcome', 'to', 'stack', 'overflow'))
'welcome_to_stack_overflow'

Using something other than strings will raise the following error:

TypeError: sequence item 0: expected str instance, int found
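
A common workaround, sketched here with made-up sample values (the `numbers` list is only an illustration), is to convert each item to a string before joining:

```python
# Hypothetical sample data: the items are integers, not strings.
numbers = [1, 10, 100]

# Joining the integers directly fails, because every item must be a str.
try:
    "_".join(numbers)
except TypeError as exc:
    error_message = str(exc)

# Converting each item with str() first makes the join work.
joined = "_".join(map(str, numbers))
print(joined)
```

The conversion stays explicit, so the caller decides how non-string items should be rendered.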

Lafontaine answered 29/1, 2009 at 22:51 Comment(13)
I do not agree conceptually even if it makes sense codewise. list.join(string) appears more an object-oriented approach whereas string.join(list) sounds much more procedural to me.Dethrone
So why isn't it implemented on iterable?Musaceous
@TimeSheep: A list of integers doesn't have a meaningful join, even though it's iterable.Lafontaine
@recursive, I think an iterable of integers can be understood as string.Friday
@krysopath: It can be, but there are multiple such understandings. Non-list iterables of strings need a way to be joined. And lists of strings are iterables. So it's possible to satisfy all the use cases with this single method. Lists could have a join method, like in javascript, but there are plenty of use cases in python where the existing join method would still be needed. And you can pretty trivially transform the existing one into what you're thinking. e.g. ", ".join(map(str,numbers)).Lafontaine
@recursive, I think the words "joiner" are always strings. capture the essence very nicely. Your point about conventions is just true, though. I figure the reason might be the string handling/bytes encode/decode paradigm python has going on. A string is so very close to a sequence of numbers, that's why I was perhaps stating the obvious :)Friday
I have tried to use print(str.join('-', my_list)) and it works, feels better.Oza
@TimeSheep Because iterable is not a concrete type, iterable is an interface, any type that defines an __iter__ method. Requiring all iterables to also implement join would complicate a general interface (which also covers iterables over non-strings) for a very particular use case. Defining join on strings side-steps this problem at the cost of the "unintuitive" order. A better choice might have been to keep it a function with the first argument being the iterable and the second (optional one) being the joiner string - but that ship has sailed.Enthrall
@Enthrall Maybe this is the exact root cause here, that iterable is an interface instead of an abstract base class (with some non-abstract methods)? In that case, the base class would provide a reasonable, default implementation of iterable.join which implementers could override as needed. I'm not sure, but I suspect this is how it works in Ruby (where [1,2,3].join is supported)Moussorgsky
@PerLundberg But then every iterable would sport a join method, even those where it makes no sense. For example, you'd have a file.join that has nothing to do with files, generators would have a join method that is entirely unrelated to generators (and very dangerous with infinite ones). I am not familiar with Ruby, but I suspect it simply implements join() as a list method, not as a method on every single iterator.Enthrall
@Enthrall You are right, it is actually implemented on the Array class: ruby-doc.org/core-2.5.1/Array.html#method-i-join Your reasoning might be correct, there's probably a good reason why they didn't choose that route (and also why Ruby didn't choose it either). I think I'm now more in the camp of the "many hard-core Python programmers" mentioned in another answer to this question. :-)Moussorgsky
So it could be implemented both places. Seems like this is Python not heeding its own Zen: “practicality over purity”Perjure
@Dogweather: "There should be one-- and preferably only one --obvious way to do it."Lafontaine
A
450

This was discussed in the "String methods... finally" thread in the Python-Dev archive, and was accepted by Guido. The thread began in June 1999, and str.join was included in Python 1.6, which was released in September 2000 (and supported Unicode). Python 2.0, which also supported str methods including join, was released in October 2000.

  • There were four options proposed in this thread:
    • separator.join(items)
    • items.join(separator)
    • items.reduce(separator)
    • join as a built-in function
  • Guido wanted to support not only lists and tuples, but all sequences/iterables.
  • items.reduce(separator) is difficult for newcomers.
  • items.join(separator) introduces unexpected dependency from sequences to str/unicode.
  • join() as a free-standing built-in function would support only specific data types. So using a built-in namespace is not good. If join() were to support many data types, creating an optimized implementation would be difficult: if implemented using the __add__ method then it would be O(n²).
  • The separator string (separator) should not be omitted. Explicit is better than implicit.
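
The O(n²) concern in the list above can be sketched roughly as follows. `naive_join` is a hypothetical helper written for this illustration, not code from the Python-Dev thread; it builds the result with repeated `+`, which in principle copies the growing string on every step, whereas `str.join` can size the result once.

```python
def naive_join(separator, items):
    """Join strings with repeated +, re-copying the growing result each time."""
    result = ""
    for index, item in enumerate(items):
        if index:
            # Add the separator before every item except the first.
            result = result + separator
        result = result + item
    return result


words = ["welcome", "to", "stack", "overflow"]

# Both approaches produce the same string; only the cost model differs.
naive_result = naive_join("_", words)
builtin_result = "_".join(words)
print(naive_result == builtin_result)
```

No timings are claimed here, since some interpreters optimize simple concatenation loops.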

Here are some additional thoughts (my own, and my friend's):

  • Unicode support was coming, but it was not final. At that time, UTF-8 looked the most likely to replace UCS-2/UCS-4. To calculate the total buffer length for UTF-8 strings, the method needs to know the character encoding.
  • At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2. At that time it was difficult to provide a basic iterable class (which is mentioned in another comment).

Guido's decision is recorded in a historical mail, deciding on separator.join(items):

Funny, but it does seem right! Barry, go for it...
--Guido van Rossum

Aposiopesis answered 30/9, 2012 at 15:21 Comment(5)
Nice, this documents the reasoning. It'd be nice to know more about the "unexpected dependency from sequences to str/unicode." -- and whether that is still so.Numerical
This is the best answer, as it provides the authoritative background and reasons it was chosen.Argumentum
@Numerical the join method, if implemented on an iterable, would have to involve converting items to strings, which would introduce a str dependency from the iterable implementation. it will always be so.Maltese
i wonder why they didn't consider string.join(sep, seq) or similar. 🤷‍♂️Maltese
@JasonC You can actually invoke it like that if you want: str.join(",", ["a", "b", "c"]) returns "a,b,c".Cookout
S
80

I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:

  • it must work for different iterables too (tuples, generators, etc.)
  • it must have different behavior between different types of strings.

There are actually two join methods (Python 3.0):

>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>

If join were a method of a list, it would have to inspect its arguments to decide which one of them to call. And you can't join bytes and str together, so the way they have it now makes sense.
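
A minimal sketch of that separation, using made-up sample values: each separator type only accepts items of its own type, and mixing the two raises a TypeError.

```python
# Hypothetical sample data for illustration.
str_parts = ["a", "b"]
bytes_parts = [b"a", b"b"]

text = "-".join(str_parts)      # str separator with str items
raw = b"-".join(bytes_parts)    # bytes separator with bytes items

# Mixing a str separator with bytes items is rejected.
try:
    "-".join(bytes_parts)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(text, raw, mixed_ok)
```

Because the separator's type selects the method, no runtime inspection of the items is needed to choose between the two implementations.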

Spiccato answered 29/1, 2009 at 23:3 Comment(0)
I
48

Why is it string.join(list) instead of list.join(string)?

This is because join is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?

What if you have a tuple of strings? If this were a list method, you would have to convert every such iterable of strings to a list before you could join the elements into a single string! For example:

some_strings = ('foo', 'bar', 'baz')

Let's roll our own list join method:

class OurList(list): 
    def join(self, s):
        return s.join(self)

And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:

>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'

So we see we have to add an extra step to use our list method, instead of just using the builtin string method:

>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'

Performance Caveat for Generators

The algorithm Python uses to create the final string with str.join actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.

Thus, while passing around generators is usually better than list comprehensions, str.join is an exception:

>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173

Nevertheless, the str.join operation is still semantically a "string" operation, so it still makes sense to have it on the str object rather than on miscellaneous iterables.

Insidious answered 14/4, 2015 at 0:45 Comment(0)
F
25

Think of it as the natural inverse operation to split.
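
As a small illustration of that pairing (the sample string is made up), splitting on a separator and joining with the same separator round-trips:

```python
line = "welcome_to_stack_overflow"

# split is called on the string and returns a list of pieces...
parts = line.split("_")

# ...and join is called on the separator and puts the pieces back together.
restored = "_".join(parts)

print(parts)
print(restored == line)
```

Both methods live on str, with the separator playing the same role in each call.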

I understand why it is applicable to anything iterable and so can't easily be implemented just on list.

For readability, I'd like to see it in the language but I don't think that is actually feasible - if iterability were an interface then it could be added to the interface but it is just a convention and so there's no central way to add it to the set of things which are iterable.

Frances answered 30/1, 2009 at 2:43 Comment(0)
L
15

The "-" in "-".join(my_list) declares that you are producing a string by joining the elements of a list. It's result-oriented. (just for easy memory and understanding)

I made an exhaustive cheatsheet of methods_of_string for your reference.

string_methods_44 = {
    'convert': ['join', 'split', 'rsplit', 'splitlines', 'partition', 'rpartition'],
    'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
    'search': ['endswith', 'startswith', 'count', 'index', 'find', 'rindex', 'rfind'],
    'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric', 'isidentifier',
                  'islower', 'istitle', 'isupper', 'isprintable', 'isspace'],
    'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
             'center', 'ljust', 'rjust', 'zfill', 'expandtabs', 'casefold'],
    'encode': ['translate', 'maketrans', 'encode'],
    'format': ['format', 'format_map'],
}
Lathi answered 4/12, 2017 at 12:22 Comment(0)
S
14

Primarily because the result of a someString.join() is a string.

The sequence (list or tuple or whatever) doesn't appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.

Spooky answered 29/1, 2009 at 22:51 Comment(0)
T
2

The objects my_list and "-" are instances of the classes list and str, respectively. The join method belongs to the class str. Therefore, the syntax "-".join(my_list) is used because the object "-" is taking my_list as an input.
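
To make that concrete, here is a short sketch (with a made-up `my_list`) showing that the method call on the "-" object is the same as calling the method through the str class and passing "-" explicitly:

```python
my_list = ["Hello", "world"]

# Usual form: the separator object is the receiver of the method call.
bound = "-".join(my_list)

# Equivalent form: look the method up on the class and pass the separator first.
unbound = str.join("-", my_list)

print(bound, unbound)
```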

Twiggy answered 15/10, 2019 at 19:38 Comment(0)
C
1

You can join more than just lists and tuples: almost any iterable works. Iterables include generators, maps, filters, etc.

>>> '-'.join(chr(x) for x in range(48, 55))
'0-1-2-3-4-5-6'

>>> '-'.join(map(str, (1, 10, 100)))
'1-10-100'

And the beauty of using generators, maps, filters etc is that they cost little memory, and are created almost instantaneously.

Just another reason why it's conceptually:

str.join(<iterator>)

It's efficient to grant only str this ability, instead of granting join to all the iterators: list, tuple, set, dict, generator, map, filter, all of which only have object as common parent.

Of course, range() and zip() are also iterable, but they never yield str items, so they cannot be passed to str.join() directly.

>>> '-'.join(range(48, 55))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
Chirp answered 3/3, 2022 at 12:30 Comment(1)
"Instead of granting join to all the iterators: [...], all of which only have object as common parent." -- this seems a sensible reason (to not have iter.join())Numerical
P
-2

I 100% agree with your issue. If we boil down all the answers and comments here, the explanation comes down to "historical reasons".

str.join isn't just confusing or not-nice looking, it's impractical in real-world code. It defeats readable function or method chaining because the separator is rarely (ever?) the result of some previous computation. In my experience, it's always a constant, hard-coded value like ", ".

I clean up my code, so that it can be read in one direction, using toolz.functoolz:

>>> from toolz.functoolz import curry, pipe
>>> join = curry(str.join)
>>>
>>> a = ["one", "two", "three"]
>>> pipe(
...     a, 
...     join("; ")
... )
'one; two; three'

I'll have several other functions in the pipe as well. The result is that it reads easily in just one direction, from beginning to end as a chain of functions. Currying map helps a lot.
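
For readers who would rather not add a dependency, a similar separator-first callable can be sketched with the standard library alone. This is an alternative illustration, not the approach used above, and `join_with_semicolon` is a hypothetical name:

```python
from functools import partial

# Fix the separator up front; the resulting callable takes only the iterable.
join_with_semicolon = partial(str.join, "; ")

a = ["one", "two", "three"]
result = join_with_semicolon(a)

# It accepts any iterable of strings, for example a map object.
shouted = join_with_semicolon(map(str.upper, a))

print(result)
print(shouted)
```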

Perjure answered 2/8, 2022 at 11:26 Comment(2)
“I agree” is not an answer to a “why” question.Cookout
@Cookout If you read to the end of the first paragraph, you'll see the actual answer I wrote.Perjure
