Why does Python have a format function as well as a format method

3

41

The format function in builtins seems to be a subset of the str.format method, meant specifically for formatting a single object.

e.g.

>>> format(13, 'x')
'd'

is apparently preferred over

>>> '{0:x}'.format(13)
'd'

and IMO it does look nicer, but why not just use str.format in every case to make things simpler? Both of these were introduced in 2.6, so there must be a good reason for having both at once. What is it?

Edit: I was asking about str.format and format, not why we don't have a (13).format

Stricklin answered 22/5, 2013 at 4:33 Comment(7)
Er, this is the first time I've ever heard someone say that format() is preferred over .format() - even the documentation for format string specifications uses .format() throughout. Where are you getting this "format() is preferred" from?Goa
@Goa just from answers here on SO which always seem to use it in that caseStricklin
@Goa #16415059Stricklin
that seems to be a rather flimsy example to infer what the "preferred" style is - especially given the second answer to that question and the discussion in the comments. As another counterexample, see https://mcmap.net/q/167338/-inserting-the-same-value-multiple-times-when-formatting-a-string/…Goa
@Goa your linked question uses multiple substitutions though, where only .format worksStricklin
That example is your own answer ... just sayin' :DJalap
@Jalap I thought it might be but I wanted to see if there was any definitive answer since most people don't even know about formatStricklin
F
11

I think format and str.format do different things. Even though you could use str.format for both, it makes sense to have separate versions.

The top level format function is part of the new "formatting protocol" that all objects support. It simply calls the __format__ method of the object it is passed, and returns a string. This is a low-level task, and Python's style is to usually have builtin functions for those. Paulo Scardine's answer explains some of the rationale for this, but I don't think it really addresses the differences between what format and str.format do.
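As a minimal sketch of that protocol (the Celsius class is a hypothetical example, not something from this answer), an object that defines __format__ is reachable both through the builtin and through a replacement field:

```python
class Celsius:
    """Hypothetical example class, not taken from the answer."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # Delegate the numeric part to float's own __format__, then add a unit.
        return format(self.degrees, format_spec) + " C"


temp = Celsius(21.456)

# The builtin passes the spec straight to Celsius.__format__.
via_builtin = format(temp, ".1f")

# str.format ends up in the same method for its single replacement field.
via_method = "{:.1f}".format(temp)

print(via_builtin)  # 21.5 C
print(via_method)   # 21.5 C
```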

The str.format method is a bit more high-level, and also a bit more complex. It can not only format multiple objects into a single result, but it can also reorder, repeat, index, and do various other transformations on the objects. Don't just think of "{}".format(obj). str.format is really designed for more complicated tasks, like these:

"{1} {0} {1!r}".format(obj0, obj1) # reorders, repeats, and calls repr on obj1
"{0.value:.{0.precision}f}".format(obj) # uses attrs of obj for value and format spec
"{obj[name]}".format(obj=my_dict) # takes argument by keyword, and does an item lookup
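To try those three one-liners, here is a runnable version. The Measurement namedtuple and the literal values are hypothetical stand-ins for obj, obj0, obj1 and my_dict:

```python
from collections import namedtuple

# Hypothetical stand-in for an object with .value and .precision attributes.
Measurement = namedtuple("Measurement", ["value", "precision"])

# Reorders the arguments, repeats the second one, and applies repr() to it.
reordered = "{1} {0} {1!r}".format("world", "hello")

# Attribute lookups supply both the value and the precision in the format spec.
nested = "{0.value:.{0.precision}f}".format(Measurement(3.14159, 2))

# Keyword argument plus an item lookup inside the replacement field.
looked_up = "{obj[name]}".format(obj={"name": "Guido"})

print(reordered)  # hello world 'hello'
print(nested)     # 3.14
print(looked_up)  # Guido
```

None of these can be written as a single call to the format builtin, which only ever sees one object and one format spec.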

For the low-level formatting of each item, str.format relies on the same machinery of the format protocol, so it can focus its own efforts on the higher level stuff. I doubt it actually calls the builtin format, rather than its arguments' __format__ methods, but that's an implementation detail.

While ("{:"+format_spec+"}").format(obj) should give the same result as format(obj, format_spec) (at least when format_spec contains no braces of its own, since str.format would treat those as nested replacement fields), I suspect the latter will be a bit faster, since it doesn't need to parse the format string to check for any of the complicated stuff. However, the overhead may be lost in the noise in a real program.
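A quick way to check the equivalence for a few common cases is sketched below. The sample values are arbitrary choices for illustration, and none of the specs contains braces:

```python
import datetime

# (object, format_spec) pairs chosen only for illustration.
samples = [
    (13, "x"),
    (3.14159, ".2f"),
    ("abc", ">6"),
    (datetime.date(2013, 5, 22), "%d/%m/%Y"),
]

results = []
for obj, format_spec in samples:
    direct = format(obj, format_spec)
    via_template = ("{:" + format_spec + "}").format(obj)
    # Both spellings should hand the same spec to the same __format__ method.
    assert direct == via_template
    results.append(direct)

print(results)  # ['d', '3.14', '   abc', '22/05/2013']
```

This says nothing about speed; timing the two forms with timeit on your own interpreter is the only reliable way to compare them.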

When it comes to usage (including examples on Stack Overflow), you may see more str.format use simply because some programmers do not know about format, which is both new and fairly obscure. In contrast, it's hard to avoid str.format (unless you have decided to stick with the % operator for all of your formatting). So, the ease (for you and your fellow programmers) of understanding a str.format call may outweigh any performance considerations.

Fabien answered 22/5, 2013 at 22:37 Comment(1)
Paulo put a lot of effort into his answer and it looks like one of those all-encompassing guides. However, he is answering a nonexistent question. I wanted to know why we don't just always use '{0}'.format. Yes, I understand how format() is syntactic sugar for __format__, and Paulo went into a lot of detail about why this is good for Python. But it's just not my question. Your answer explains why and it makes logical sense. I have to disagree with Python having both versions because it goes against the "one way to do it" mantra, but oh well. I'm gonna keep it simple with '{0}'.format only.Stricklin
F
43

tl;dr: format just calls obj.__format__ and is used by the str.format method, which does more high-level stuff. For the lower level, it makes sense to teach an object how to format itself.

It is just syntactic sugar

The fact that this function shares its name and format specification with str.format can be misleading. The existence of str.format is easy to explain: it does complex string interpolation (replacing the old % operator). format, by contrast, formats a single object as a string, the smallest subset of the str.format specification. So, why do we need format?

The format function is an alternative to the obj.format('fmt') construct found in some OO languages. This decision is consistent with the rationale for len (on why Python uses a function len(x) instead of a property x.length like JavaScript or Ruby).

When a language adopts the obj.format('fmt') construct (or obj.length, obj.toString and so on), classes are prevented from having an attribute called format (or length, toString, you get the idea); otherwise it would shadow the standard method from the language. In this case, the language designers are placing the burden of preventing name clashes on the programmer.

Python is very fond of the PoLA (principle of least astonishment) and adopted the __dunder__ (double underscore) convention for its special methods in order to minimize the chance of conflicts between user-defined attributes and the language built-ins. So obj.format('fmt') becomes obj.__format__('fmt'), and of course you can call obj.__format__('fmt') instead of format(obj, 'fmt') (the same way you can call obj.__len__() instead of len(obj)).
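To make the name-clash point concrete, here is a small sketch. The Report class and its attribute names are hypothetical; the point is only that a user attribute called format does not interfere with the formatting protocol:

```python
class Report:
    """Hypothetical class with a data attribute that happens to be named 'format'."""

    def __init__(self, title, file_format):
        self.title = title
        self.format = file_format  # plain user data, e.g. "pdf"

    def __format__(self, format_spec):
        # Build a description, then let str handle alignment and width.
        description = "{} ({})".format(self.title, self.format)
        return format(description, format_spec)


report = Report("Q2 results", "pdf")

user_attribute = report.format       # the user's own data
aligned = format(report, ">20")      # the formatting protocol still works

print(user_attribute)
print(repr(aligned))
```

Had the protocol been spelled obj.format('fmt'), the attribute assignment in __init__ would have shadowed it on every instance.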

Using your example:

>>> '{0:x}'.format(13)
'd'
>>> (13).__format__('x')
'd'
>>> format(13, 'x')
'd'

Which one is cleaner and easier to type? Python's design is very pragmatic: the function form is not only cleaner, it is also well aligned with Python's duck-typed approach to OO, and it gives the language designers freedom to change or extend the underlying implementation without breaking legacy code.

PEP 3101 introduced the new str.format method and the format built-in without any comment on the rationale for the format function, but the implementation is obviously just syntactic sugar:

def format(value, format_spec):
    return value.__format__(format_spec)

And here I rest my case.
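The two-line definition above is a simplification. As far as the documentation for the builtin describes it, the method is looked up on the type rather than on the instance, format_spec defaults to an empty string, and a non-string result is rejected. A rough sketch under those assumptions (format_sketch and Plain are hypothetical names, and this is not the actual CPython implementation):

```python
def format_sketch(value, format_spec=""):
    """Rough pure-Python approximation of the builtin format() (a sketch only)."""
    # Special methods are looked up on the type, bypassing the instance dict.
    result = type(value).__format__(value, format_spec)
    if not isinstance(result, str):
        raise TypeError("__format__ must return a str")
    return result


class Plain:
    """Hypothetical class that relies on object.__format__."""


plain = Plain()
# An attribute stored on the instance does not change what the builtin calls.
plain.__format__ = lambda format_spec: "from the instance"

hex_result = format_sketch(13, "x")
default_result = format_sketch(13)
builtin_ignores_instance = format(plain) == str(plain)
sketch_ignores_instance = format_sketch(plain) == str(plain)

print(hex_result, default_result)
print(builtin_ignores_instance, sketch_ignores_instance)
```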

What Guido said about it (or is it official?)

Quoting the very BDFL about len:

First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:

(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.

(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.

Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/

source: [email protected] (the original post there also has the question Guido was answering). abarnert also suggests:

There's additional reasoning about len in the Design and History FAQ. Although it's not as complete or as good of an answer, it is indisputably official. – abarnert

Is this a practical concern or just syntax nitpicking?

This is a very practical and real-world concern in languages like Python, Ruby or Javascript because in dynamically typed languages any mutable object is effectively a namespace, and the concept of private methods or attributes is a matter of convention. Possibly I could not put it better than abarnert in his comment:

Also, as far as the namespace-pollution issue with Ruby and JS, it's worth pointing out that this is an inherent problem with dynamically-typed languages. In statically-typed languages as diverse as Haskell and C++, type-specific free functions are not only possible, but idiomatic. (See The Interface Principle.) But in dynamically-typed languages like Ruby, JS, and Python, free functions must be universal. A big part of language/library design for dynamic languages is picking the right set of such functions.

For example, I just left Ember.js in favor of Angular.js because I was tired of namespace conflicts in Ember; Angular handles this using an elegant Python-like strategy of prefixing built-in methods (with $thing in Angular, instead of underscores like Python), so they do not conflict with user-defined methods and properties. Yes, the whole __thing__ is not particularly pretty, but I'm glad Python took this approach because it is very explicit and avoids the class of bugs, caused by object namespace clashes, that violate the PoLA.

Forbear answered 22/5, 2013 at 4:33 Comment(13)
I wouldn't say these two cases are exactly the same ('{0}'.format is not the same as x.len; it's similar to ''.join in a way), but I see where you are coming from and this makes sense.Stricklin
@jamylak: in some OO languages, every object is supposed to have a format method, like obj.format('fmt'). In Python, the form format(obj, 'fmt') was preferred instead. So this function is not a special case of str.format, despite sharing the name and format specification.Forbear
I know but '{0}'.format also calls .__format__ of the object just like format() doesStricklin
the same way len(x) may call x.__len__, but it is an implementation detail that is supposed to remain hidden from the user interface.Forbear
I tried to word the answer better, if you are not satisfied perhaps I misunderstood your question.Forbear
your answer makes sense in that I should not have to ever call .__format__ so there should be a corresponding builtin function to do that, even though '{0}'.format also does that.Stricklin
Perhaps our BDFL shall grace us with an answer; but I don't see a better explanation than this one.Eiser
@PauloScardine do you have a source for your quote from BDFL?Thorr
@poorsod: pyfaq - (A Semi-Official) Python FAQ ZoneForbear
There's additional reasoning about len in the Design and History FAQ. Although it's not as complete or as good of an answer, it is indisputably official.Tamper
Also, as far as the namespace-pollution issue with Ruby and JS, it's worth pointing out that this is an inherent problem with dynamically-typed languages. In statically-typed languages as diverse as Haskell and C++, type-specific free functions are not only possible, but idiomatic. (See The Interface Principle.) But in dynamically-typed languages like Ruby, JS, and Python, free functions must be universal. A big part of language/library design for dynamic languages is picking the right set of such functions.Tamper
@abarnert: quality comment, included in the answer.Forbear
@jamylak: thanks for the great question, this is my first answer with 20 upvotes.Forbear
S
0

Here's an analogy, perhaps a bit silly: Why is there both a + operator and the sum() function? After all, if you want to add up two numbers (or whatever), you can always use sum() on a 2-tuple:

print(sum((3, 4)))

Under the hood, sum delegates to +, which delegates to __add__() or __radd__(), but sum could delegate to __add__() or __radd__() directly and then we could do away with +.
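That delegation chain can be observed with a small sketch. The Money class is a hypothetical example that records which special method handled each addition:

```python
class Money:
    """Hypothetical value type that records which special method handled each +."""

    calls = []

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        Money.calls.append("__add__")
        return Money(self.cents + other.cents)

    def __radd__(self, other):
        # sum() starts from 0, so the first step is 0 + Money(...).
        if other != 0:
            return NotImplemented
        Money.calls.append("__radd__")
        return Money(self.cents)


total = sum([Money(150), Money(250)])

print(total.cents)   # 400
print(Money.calls)   # ['__radd__', '__add__']
```

The first addition is 0 + Money(150), which int cannot handle, so Python falls back to Money.__radd__; the second is an ordinary Money.__add__.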

The answer in both cases is that the more basic version is simpler, both to read and to implement. 3 + 4 is simpler than sum((3, 4)), and format(3, "x") is simpler than "{:x}".format(3). In a sense, this argument is actually stronger for format() than for sum(), because sum() can only call +, whereas "...".format() can do more than just call format() (it can do string interpolation and rearrange argument positions).

So I disagree with your comment that the format function contradicts Python's philosophy of only having one way to do things. If all you want is to format a single object, and you don't need to interpolate it into a larger string, then the format(...) function is the one best way to do it.

Sudan answered 15/2 at 11:10 Comment(0)