Why do powers of 10 print in scientific notation at the 5th power?

I would like to know if and how the powers of 10 are related to the printing of scientific notation in the console. I've searched the R docs and haven't found anything relevant, or at least nothing that I really understand.

First off, my scipen and digits settings are

unlist(options("scipen", "digits"))
# scipen digits 
#      0      7 

Now, powers of 10 are printed normally up to the 4th power, and then printing switches to scientific notation at the 5th power.

10^(1:4)
# [1]    10   100  1000 10000
10^(1:5)
# [1] 1e+01 1e+02 1e+03 1e+04 1e+05

Interestingly, this does not happen for some other numbers larger than 10.

11^(1:5)
# [1]     11    121   1331  14641 161051

Judging from the following, 5 digits seem significant.

100^(1:2)
# [1]   100 10000
100^(1:3)
# [1] 1e+02 1e+04 1e+06

So my questions then are:

Why is scientific notation activated between the 4th and 5th power for 10 and not for other numbers? Is the number 5 significant? Furthermore, why 5 and not a number closer to the maximum digits option of 22?

Nerissa answered 16/9, 2014 at 2:2

Well, the answer is actually there in the definition of scipen in ?options, although it's pretty hard to understand what it means without playing around with some examples:

‘scipen’: integer. A penalty to be applied when deciding to print numeric values in fixed or exponential notation. Positive values bias towards fixed and negative towards scientific notation: fixed notation will be preferred unless it is more than ‘scipen’ digits wider.

To see what that means, examine the following three pairs of exactly identical numbers. In the first two cases, the width in characters of the fixed notation is less than or equal to the width of the scientific notation, so fixed notation is preferred.

In the third case, though, the fixed notation is wider (i.e. "more than 0 digits wider"), because the 5 zeros amount to more characters than the 4 characters used to represent the same value using e+nn. As a result, in that case scientific notation is preferred.

1e+03
1000
# [1] 1000

1e+04
10000
# [1] 10000

1e+05
100000      ## <- wider
# [1] 1e+05
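The width comparison can also be checked directly. The following is a minimal sketch, assuming the default scipen = 0 and digits = 7 from the question; notation_widths is a hypothetical helper name, and format(x, scientific = TRUE/FALSE) is used only to force each notation so that its width can be measured.

```r
# Compare the character width of fixed vs scientific notation for one number.
# notation_widths() is a hypothetical helper written for this illustration.
notation_widths <- function(x) {
  c(fixed = nchar(format(x, scientific = FALSE)),
    sci   = nchar(format(x, scientific = TRUE)))
}

old <- options(scipen = 0, digits = 7)   # the defaults assumed in this thread
w <- sapply(c(1e3, 1e4, 1e5), notation_widths)
colnames(w) <- c("1e3", "1e4", "1e5")
w
#       1e3 1e4 1e5
# fixed   4   5   6
# sci     5   5   5
options(old)
```

Only in the last column is the fixed form wider than the scientific form, which is where the printed output switches.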

Next, examine some numbers that also end with lots of zeros, but whose representation in scientific notation requires a decimal point (.). For these numbers, scientific notation will be used once you have 6 or more zeros (i.e. more than the 5 characters taken up by one . and the characters e+nn).

1.1e+06
1100000
# [1] 1100000


1.1e+07
11000000     ##  <- wider
# [1] 1.1e+07

Reasoning about the tradeoff gets a bit trickier for most other numbers, for which the values of both options("scipen") and options("digits") come into play, but the general idea is exactly the same.
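As a rough illustration of how the penalty shifts the cutoff, here is a small sketch that changes scipen and formats the same powers of 10. It assumes digits = 7 and restores the previous options at the end; the variable names are only for illustration.

```r
old <- options(scipen = 0, digits = 7)   # start from the defaults in the question

at_default <- format(1e5)   # "1e+05": fixed (6 chars) is wider than scientific (5)

options(scipen = 1)
with_penalty_5 <- format(1e5)   # "100000": fixed is only 1 wider, within the penalty
with_penalty_6 <- format(1e6)   # "1e+06": fixed (7 chars) is 2 wider, over the penalty

options(scipen = -1)
negative_penalty <- format(1e4) # "1e+04": a negative penalty turns the tie into scientific

options(old)                    # restore the previous settings
```

With scipen = 1 the switch for powers of 10 moves from the 5th to the 6th power, which matches the "more than scipen digits wider" wording in the documentation.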

To see some of the slightly surprising complications that come into play, you might want to paste the following into your console (perhaps after first trying to predict where within each series the switch to scientific notation will occur).

100001
1000001
10000001
100000001
1000000001
10000000001
100000000001
1000000000001

111111
1111111
11111111
111111111
1111111111
11111111111
111111111111
1111111111111
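If you would rather check your predictions programmatically than read them off the console, something like the following sketch could help. It assumes scipen = 0 and digits = 7; uses_sci is a hypothetical helper that formats each value on its own and looks for an "e" in the result.

```r
old <- options(scipen = 0, digits = 7)   # defaults assumed in this thread

series_a <- 10^(5:12) + 1          # 100001, 1000001, ..., 1000000000001
series_b <- (10^(6:13) - 1) / 9    # 111111, 1111111, ..., 1111111111111

# Hypothetical helper: TRUE where a value, formatted on its own,
# comes out in scientific notation (i.e. contains an "e").
uses_sci <- function(x) {
  grepl("e", vapply(x, format, character(1)), fixed = TRUE)
}

switch_a <- uses_sci(series_a)
switch_b <- uses_sci(series_b)

# Position of the first value in each series that switches (NA if none does)
which(switch_a)[1]
which(switch_b)[1]

options(old)
```

Each value is formatted individually here because formatting the whole series as one vector would force a single common notation on all of its elements.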
Ghassan answered 16/9, 2014 at 4:14

I'm confused as to what exactly your question is; or, more specifically, how you would use an answer to this question to change or control the behavior of R. Are you trying to format numbers a certain way? There are better ways to do that.

When you type values like that, the results are implicitly run through one of the print() commands to be formatted "nicely" for the console. Whenever things have to look "nice" on screen, the code to do that is often ugly. Here most of that code is taken care of by the formatReal function and its helper, the scientific function. The latter tracks the following information for a number:

/* for a number x , determine
 *  sgn    = 1_{x < 0}  {0/1}
 *  kpower = Exponent of 10;
 *  nsig   = min(R_print.digits, #{significant digits of alpha})
 *  roundingwidens = 1 if rounding causes x to increase in width, 0 otherwise
 *
 * where  |x| = alpha * 10^kpower   and  1 <= alpha < 10
 */

Then the former function uses this information to try to make "nice" looking numbers by balancing values to the left and the right of the decimal place. It's a combination of many things, like the order of magnitude of the number and the number of significant digits, as well as environmental influences from the scipen option, etc.

print() is only meant to make things look "nice." What exactly is nice depends on all the values in a vector. You'll find few hard cutoffs in that code; it's very adaptive. There is no easy way to concisely describe everything it does in the general case (which is what it sounds like you are asking for).
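To illustrate the point that the result depends on all the values in a vector, here is a small sketch using the numbers from the question. It assumes scipen = 0 and digits = 7, and uses format(), which applies the same fixed-versus-scientific decision as printing; the variable names are only for illustration.

```r
old <- options(scipen = 0, digits = 7)   # defaults assumed in this thread

alone    <- format(10)           # "10": on its own, fixed notation is narrower
together <- format(c(10, 1e5))   # "1e+01" "1e+05": one notation is chosen for the whole vector
up_to_4  <- format(10^(1:4))     # "   10" "  100" " 1000" "10000": fixed, padded to a common width

options(old)
```

The same value 10 comes out as "10" or "1e+01" depending on what else is in the vector, because the widest element decides the notation for all of them.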

The only thing that is certain is that if you need to have your numbers formatted in a certain way, use a function like sprintf() or formatC() that allows for precise control.
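For example, a few ways to pin down the output regardless of scipen and digits might look like this. This is only a sketch of some common format strings, not an exhaustive list, and the variable names are made up for the illustration.

```r
x <- 10^(1:5)

# Always fixed notation, no decimals
fixed_sprintf <- sprintf("%.0f", x)                   # "10" "100" "1000" "10000" "100000"
fixed_formatC <- formatC(x, format = "f", digits = 0) # same strings as above

# Fixed notation with a thousands separator
with_commas <- formatC(x, format = "f", digits = 0, big.mark = ",")
# "10" "100" "1,000" "10,000" "100,000"

# Always scientific notation, 2 digits after the decimal point
always_sci <- formatC(x, format = "e", digits = 2)
# "1.00e+01" "1.00e+02" "1.00e+03" "1.00e+04" "1.00e+05"
```

All of these return character vectors, so they are meant for display rather than for further arithmetic.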

Of course this behavior is dependent on class(), and I've pointed to the formatReal stuff since that's where most of the tricky things happen. But observe the difference when you use integers:

c(10, 100, 1000, 10000, 100000)
# [1] 1e+01 1e+02 1e+03 1e+04 1e+05
c(10L, 100L, 1000L, 10000L, 100000L)
# [1]     10    100   1000  10000 100000
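Note that 10^(1:5) from the question is a double vector even though 1:5 is an integer vector, which is why it goes through the double formatting path. A quick sketch to confirm the types (the variable names are only for illustration):

```r
type_pow <- typeof(10^(1:5))        # "double": ^ returns doubles even for integer input
type_int <- typeof(c(10L, 100L))    # "integer"

# Converting to integer switches to the integer formatting path
as_int <- format(as.integer(10^(1:5)))
# "    10" "   100" "  1000" " 10000" "100000"
```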
Mortarboard answered 16/9, 2014 at 4:7 Comment(2)
To clarify, I'm not trying to change the behavior. Rather, I'm wondering why this seems to happen only for e.g. 10^(1:5) and not for 11^(1:5). Multiples of 10 seem significant, and so do 5 digits when the number is a multiple of 10. But it does make sense that R is "prettying up" the numbers. (Nerissa)
The difference between 10^(1:5) and 11^(1:5) is mostly due to the difference in the number of significant digits. When you're doing powers of 10, you lose no information when you jump to scientific notation. But when you use 11 as a base, you'd potentially be dropping digits. (Mortarboard)
