Parsing scientific notation sensibly?

I want to be able to write a function which receives a number in scientific notation as a string and splits out of it the coefficient and the exponent as separate items. I could just use a regular expression, but the incoming number may not be normalised and I'd prefer to be able to normalise and then break the parts out.

A colleague has got part of the way to a solution using VB6, but it's not quite there, as the transcript below shows.

cliVe> a = 1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 10 exponent: 5 

should have been 1 and 6

cliVe> a = 1.1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.1 exponent: 6

correct

cliVe> a = 123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

correct

cliVe> a = -123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

should be -1.233456 and -2

cliVe> a = -123345.6e+7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: 12

should be -1.233456 and 12

Any ideas? By the way, Clive is a CLI based on VBScript and can be found on my weblog.

Stratiform answered 12/3, 2009 at 13:13 Comment(3)
It would be more helpful to have a list of valid input => output pairs than the output of your current, broken implementation. - Greengage
"should be -1.233456 and -2" should be "should be -1.233456 and -7", right? - Entangle
I don't think so. -123345.6e-7 can also be represented as -0.01233456, which is -1.233456e-2. - Stratiform

Googling "scientific notation regexp" turns up a number of matches, including this one (don't use it!), which uses

*** warning: questionable ***
/[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?/

which matches cases such as -.5e7 and +00000e33 (both of which you may not want to allow).

Instead, I would highly recommend you use the syntax on Doug Crockford's JSON website which explicitly documents what constitutes a number in JSON. Here's the corresponding syntax diagram taken from that page:

[Figure: syntax diagram for a JSON number (source: json.org)]

If you look at line 456 of his json2.js script (safe conversion to/from JSON in JavaScript), you'll see this portion of a regexp:

/-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/

which, ironically, doesn't match his syntax diagram: it allows leading zeros and a dot with no digits after it (looks like I should file a bug). I believe a regexp that does implement that syntax diagram is this one:

/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

and if you want to allow an initial + as well, you get:

/[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

Add capturing parentheses to your liking.
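
As a rough illustration, a capturing version might look like the Python sketch below. The named groups `coeff` and `exp`, the helper `split_number`, and the choice to report a missing exponent as "0" are assumptions made for this example; they are not part of the JSON grammar or json2.js. Note that it splits the number as written and does not normalise it.

```python
import re

# The "optional leading +" regexp from above with two named groups added.
# fullmatch() is used below so the whole string has to be a number.
NUMBER_RE = re.compile(
    r"(?P<coeff>[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?)"
    r"(?:[eE](?P<exp>[+\-]?\d+))?"
)


def split_number(text):
    """Return (coefficient, exponent) as strings, or None if text is not a number.

    A missing exponent is reported as "0".
    """
    m = NUMBER_RE.fullmatch(text)
    if m is None:
        return None
    return m.group("coeff"), m.group("exp") or "0"


if __name__ == "__main__":
    for s in ["3.2e23", "-4.70e+9", "+3", "37.e88"]:
        print(s, "->", split_number(s))
```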

I would also highly recommend you flesh out a bunch of test cases, to ensure you include those possibilities you want to include (or not include), such as:

allowed:
+3
3.2e23
-4.70e+9
-.2E-4
-7.6603

not allowed:
+0003   (leading zeros)
37.e88  (dot before the e)
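
One way to keep such a list honest is to run it against the regexp. The Python sketch below is only a suggestion for how that check might be organised; the names `WANT_ALLOWED`, `WANT_REJECTED`, `is_number` and `mismatches` are made up here. It uses the "optional leading +" regexp from above with the sample strings from the two lists. As written, that regexp rejects -.2E-4, because the JSON grammar requires a digit before the dot, so if you do want to allow that form you would need a looser pattern.

```python
import re

# Regexp with an optional leading sign (from above); fullmatch() is used
# so the whole string must match, which plays the role of ^ and $.
NUMBER_RE = re.compile(r"[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?")

# Sample strings taken from the two lists above.
WANT_ALLOWED = ["+3", "3.2e23", "-4.70e+9", "-.2E-4", "-7.6603"]
WANT_REJECTED = ["+0003", "37.e88"]


def is_number(text):
    """True if the whole string matches the regexp."""
    return NUMBER_RE.fullmatch(text) is not None


def mismatches():
    """List the samples whose actual result differs from the wanted one."""
    wrong = [s for s in WANT_ALLOWED if not is_number(s)]
    wrong += [s for s in WANT_REJECTED if is_number(s)]
    return wrong


if __name__ == "__main__":
    print("mismatches:", mismatches())
```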

Good luck!

Alar answered 18/3, 2009 at 15:7 Comment(7)
...? Just use the regexp/diagram shown in the JSON website. - Alar
Then why don't you try the previous regex, the one before the statement "and if you want to allow an initial + as well"? - Alar
I know this is a very old forum but wanted to point something out. It looks like your pattern allows for this type of entry 'e324ewfg' which obviously is not a valid number. - Bugleweed
the regexps posted do not include ^ at the beginning or $ at the end which would prevent those, and should be used if the match is only a number; but some uses of regexps are in larger patterns. - Alar
haha... arg... i thought this would be simpler. this is for the most general case tho. - Enplane
Nice - However it does allow for the . before the e, or even ending on a . The \d* after the . needs to be a \d+. Basically if there is a . there has to be a digit. - Pilch
@GerardONeill huh -- you are correct; I wonder why it took 12 years for someone to catch my mistake in transcribing the syntax diagram into regexp notation :-) - Alar

Building off of the highest rated answer, I modified the regex slightly to be /^[+\-]?(?=.)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/.

The benefits this provides are:

  1. allows matching numbers like .9 (I made the (?:0|[1-9]\d*) optional with ?)
  2. prevents matching a lone sign at the beginning and prevents matching zero-length strings (uses lookahead, (?=.))
  3. prevents matching e9 because it requires the \d before the scientific notation

My goal in this is to use it for capturing significant figures and doing significant math. So I'm also going to slice it up with capturing groups like so: /^[+\-]?(?=.)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?$/.

An explanation of how to get significant figures from this:

  1. The entire capture is the number you can hand to parseFloat()
  2. Matches 1-3 will show up as undefined or strings, so combining them (replace undefined's with '') should give the original number from which significant figures can be extracted.

This regex also rejects left-padded zeros. JavaScript sometimes accepts them, but I have seen them cause issues and they add nothing to significant figures, so I see rejecting them as a benefit (especially in forms). However, I'm sure the regex could be modified to gobble up left-padded zeros.

Another problem I see with this regex is it won't match 90.e9 or other such numbers. However, I find this or similar matches highly unlikely as it is the convention in scientific notation to avoid such numbers. Though you can enter it in JavaScript, you can just as easily enter 9.0e10 and achieve the same significant figures.

UPDATE

In my testing, I also caught the error that it could match '.'. So the look-ahead should be modified to (?=\.\d|\d) which leads to the final regex:

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/
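
To make the significant-figures idea concrete, here is a rough Python sketch. It combines the capturing groups shown earlier with the corrected look-ahead; that combination, the helper name `significant_digits`, and the rule that trailing zeros only count when a decimal point is present are all assumptions of this sketch, not something established above.

```python
import re

# Capturing variant of the regex with the corrected look-ahead (an assumed
# combination): group 1 = integer part, group 2 = dot plus fraction digits,
# group 3 = the single digit consumed just before the exponent marker.
SIG_RE = re.compile(
    r"^[+\-]?(?=\.\d|\d)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?$"
)


def significant_digits(text):
    """Count significant digits in text, or return None if it does not match."""
    m = SIG_RE.fullmatch(text)
    if m is None:
        return None
    # Unmatched groups are None in Python; treat them as empty strings.
    mantissa = "".join(g or "" for g in m.groups())
    digits = mantissa.replace(".", "").lstrip("0")
    if "." not in mantissa:
        # Assumption: without a decimal point, trailing zeros are placeholders.
        digits = digits.rstrip("0")
    return len(digits)


if __name__ == "__main__":
    for s in ["9.0e10", "0.0120", ".9", "1200", ".", "90.e9"]:
        print(repr(s), "->", significant_digits(s))
```

Other conventions for trailing zeros exist, so the last rule is the first thing to adjust if your use case differs.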
Raft answered 10/8, 2018 at 16:25 Comment(0)

Building on @Troy Weber's answer, I would suggest

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?$/

to avoid matching 3., per @Jason S's rules
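
A quick way to see the difference is to run both patterns over a few inputs. The following Python sketch is illustrative only; the names `OLD_RE`, `NEW_RE` and `compare` and the sample strings are assumptions chosen for this example.

```python
import re

# Pattern from the earlier answer (digits after the dot are optional).
OLD_RE = re.compile(
    r"^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$"
)
# Suggested pattern: a dot must be followed by at least one digit, and a
# look-behind checks that the exponent marker is preceded by a digit.
NEW_RE = re.compile(
    r"^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?$"
)


def compare(samples):
    """Map each sample to (old_matches, new_matches)."""
    return {
        s: (OLD_RE.fullmatch(s) is not None, NEW_RE.fullmatch(s) is not None)
        for s in samples
    }


if __name__ == "__main__":
    for s, result in compare(["3.", ".9", "1e5", "3.2e23", "e9"]).items():
        print(repr(s), result)
```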

Neology answered 17/2, 2022 at 18:38 Comment(0)

Here is some Perl code I just hacked together quickly.

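use strict;
use warnings;
use feature 'say';

# Number to split; taken from the command line, with a sample default.
my $str = shift // '-123345.6e-7';

# Capture the sign, the integer digits, the optional fraction
# (including its dot) and the exponent.
my ($sign, $coeffl, $coeffr, $exp) =
  $str =~ /^\s*([-+])?(\d+)(\.\d*)?[eE]([-+]?\d+)\s*$/
  or die "not in scientific notation: $str\n";

# Moving the decimal point to just after the first digit shifts it left
# by one place for every remaining integer digit.
# Note: leading zeros in the integer part are not stripped.
my $shift = length($coeffl) - 1;

my $coeff = substr( $coeffl, 0, 1 );

if ( $shift || $coeffr ) {
  $coeff .= '.' . substr( $coeffl, 1 );
}

# Append the fraction digits without their leading dot.
$coeff .= substr( $coeffr, 1 ) if $coeffr;

$coeff = $sign . $coeff if $sign;

$exp += $shift;

say "coeff: $coeff exponent: $exp";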
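
For comparison, here is a minimal Python sketch of the same normalisation that relies on the standard decimal module instead of string slicing. The function name `split_sci` is made up for this example, and it assumes the input is something Decimal can parse, which is more permissive than the regexp above.

```python
from decimal import Decimal


def split_sci(text):
    """Return (coefficient, exponent) with one digit before the decimal point.

    The coefficient is returned as a string and the exponent as an int.
    """
    d = Decimal(text)
    exponent = d.adjusted()        # power of ten of the leading digit
    coeff = d.scaleb(-exponent)    # move the point next to that digit
    return str(coeff), exponent


if __name__ == "__main__":
    # The inputs from the transcript in the question.
    for s in ["1e6", "1.1e6", "123345.6e-7", "-123345.6e-7", "-123345.6e+7"]:
        print(s, "->", split_sci(s))
```

Because Decimal keeps the digits exactly as written, trailing zeros in the coefficient survive the split.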
Greengage answered 18/3, 2009 at 4:1 Comment(0)
