how to avoid python numeric literals beginning with "0" being treated as octal?

Asked 16/7, 2012 at 22:22 Answered 5/7, 2017 at 17:3

I am trying to write a small Python 2.x API to support fetching a job by jobNumber, where jobNumber is provided as an integer. Sometimes the users provide ajobNumber as an integer literal beginning with 0, e.g. 037537. (This is because they have been coddled by R, a language that sanely considers 037537==37537.) Python, however, considers integer literals starting with "0" to be OCTAL, thus 037537!=37537, instead 037537==16223. This strikes me as a blatant affront to the principle of least surprise, and thankfully it looks like this was fixed in Python 3---see PEP 3127.

But I'm stuck with Python 2.7 at the moment. So my users do this:

>>> fetchJob(037537)

and silently get the wrong job (16223), or this:

>>> fetchJob(038537)
File "<stdin>", line 1
 fetchJob(038537)
            ^
SyntaxError: invalid token

where Python is rejecting the octal-incompatible digit.

There doesn't seem to be anything provided via __future__ to allow me to get the Py3K behavior---it would have to be built-in to Python in some manner, since it requires a change to the lexer at least.

Is anyone aware of how I could protect my users from getting the wrong job in cases like this? At the moment the best I can think of is to change that API so it take a string instead of an int.

Delfeena answered 16/7, 2012 at 22:22 Comment(6)

Don't be so sure it's fixed in Python 3. From the PEP: "an exception will be raised if a literal has a leading "0" and a second character which is a digit." So instead of getting a surprising number, you'll get a surprising exception. I don't see how that's an improvement. – Wain 16/7, 2012 at 22:28

1) Train the users? 2) Force a string or (and not sure about this), are all job numbers 6 digits? – Scrophulariaceous 16/7, 2012 at 22:30

@MarkRansom-- surely you jest. A surprising exception is very much an improvement over silently doing something very surprising. – Michelmichelangelo 16/7, 2012 at 22:39

Looking at the source (Python 2.7.3), it looks like a small change to Parser/tokenizer.c should fix this issue. I guess it all depends on how "under the hood you want to get" :) – Disadvantage 16/7, 2012 at 23:32

@DSM, OK perhaps an in-your-face surprise is better than a silent surprise. "Hey doctor, it hurts when I do this" - "Well next time you do it we'll amputate your arm". – Wain 17/7, 2012 at 16:2

"Hey Python, it hurts when I do this" - "Well next time you should read the docs instead of assuming". If you truly care about type safety, use a type-secure language, otherwise stay within defined bounds of an unsafe language. – Metamerism 17/5, 2015 at 2:57

At the moment the best I can think of is to change that API so it take a string instead of an int.

Yes, and I think this is a reasonable option given the situation.

Another option would be to make sure that all your job numbers contain at least one digit greater than 7 so that adding the leading zero will give an error immediately instead of an incorrect result, but that seems like a bigger hack than using strings.

A final option could be to educate your users. It will only take five minutes or so to explain not to add the leading zero and what can happen if you do. Even if they forget or accidentally add the zero due to old habits, they are more likely to spot the problem if they have heard of it before.

Erving answered 16/7, 2012 at 22:31 Comment(2)

You could also display the int value in the stdout. Kind of along the lines of educating the user. – Disadvantage 16/7, 2012 at 22:39

I think this is what I'm going with for now. Thanks for all the suggestions. Can't change the Python source code---I was hoping there was a __future__ built-in that would do this for me. Oh well. – Delfeena 17/7, 2012 at 16:51

Perhaps you could take the input as a string, strip leading zeros, then convert back to an int?

test = "001234505"
test = int(test.lstrip("0"))  # 1234505

Bolero answered 5/7, 2017 at 17:3 Comment(1)

This doesn't support literals that are sign-prefixed, i.e. +0012345 – Anaglyph 11/8, 2017 at 16:15

Recommended topics

Hot tags