Regular expression to match numbers with or without commas and decimals in text
Asked Answered
G

11

128

I'm trying to locate and replace all numbers in a body of text. I've found a few example regex's, which almost solve the problem, but none are perfect yet. The problem I have is that the numbers in my text may or may not have decimals and commas. For example:

"The 5000 lb. fox jumped over a 99,999.99998713 foot fence."

The regex should return "5000" and "99,999.99998713". Examples I've found break-up the numbers on the comma or are limited to two decimal places. I'm starting to understand regex's enough to see why some examples are limited to two decimal places, but I haven't yet learned how to overcome it and also include the comma to get the entire sequence.

Here is my latest version:

[0-9]+(\.[0-9][0-9]?)?

Which returns, "5000", "99,99", "9.99", and "998713" for the above text.

Gasometry answered 6/5, 2011 at 21:14 Comment(6)
What programming language or regex flavor?Unshaped
It seems like almost every answer here makes the mistake of allowing things like .,.,. or 9,9,9,9 or 9,9.99.9. These regexes won't require numbers to be in the proper format, and at worst, will treat punctuation as numbers. There are some optional tweaks possible (e.g. whether to allow leading and trailing zeroes), but some of the answers I'm seeing are downright incorrect. I really don't like downvoting, especially on honest attempts, but I feel like the answers here need cleaning up. This is a common question and will definitely be asked again.Walkover
In case you don't know itbyet, take a look at regexpal.comClearstory
Sorry for the delay Matt. I'm using Adobe's ActionScript 3. I thought the regex behavior was the same as JavaScript, but I tested Justin's suggestion at regexpal.com and compared it to my Flash application's results and saw two different results both wrong.Gasometry
Should work this time, based on my own tests. Let me know if it still needs refinement.Walkover
The best answer, rather a research article is here: https://mcmap.net/q/102024/-decimal-or-numeric-values-in-regular-expression-validationTokoloshe
W
367

EDIT: Since this has gotten a lot of views, let me start by giving everybody what they Googled for:

#ALL THESE REQUIRE THE WHOLE STRING TO BE A NUMBER
#For numbers embedded in sentences, see discussion below

#### NUMBERS AND DECIMALS ONLY ####
#No commas allowed
#Pass: (1000.0), (001), (.001)
#Fail: (1,000.0)
^\d*\.?\d+$

#No commas allowed
#Can't start with "."
#Pass: (0.01)
#Fail: (.01)
^(\d+\.)?\d+$

#### CURRENCY ####
#No commas allowed
#"$" optional
#Can't start with "."
#Either 0 or 2 decimal digits
#Pass: ($1000), (1.00), ($0.11)
#Fail: ($1.0), (1.), ($1.000), ($.11)
^\$?\d+(\.\d{2})?$

#### COMMA-GROUPED ####
#Commas required between powers of 1,000
#Can't start with "."
#Pass: (1,000,000), (0.001)
#Fail: (1000000), (1,00,00,00), (.001)
^\d{1,3}(,\d{3})*(\.\d+)?$

#Commas required
#Cannot be empty
#Pass: (1,000.100), (.001)
#Fail: (1000), ()
^(?=.)(\d{1,3}(,\d{3})*)?(\.\d+)?$

#Commas optional as long as they're consistent
#Can't start with "."
#Pass: (1,000,000), (1000000)
#Fail: (10000,000), (1,00,00)
^(\d+|\d{1,3}(,\d{3})*)(\.\d+)?$

#### LEADING AND TRAILING ZEROES ####
#No commas allowed
#Can't start with "."
#No leading zeroes in integer part
#Pass: (1.00), (0.00)
#Fail: (001)
^([1-9]\d*|0)(\.\d+)?$

#No commas allowed
#Can't start with "."
#No trailing zeroes in decimal part
#Pass: (1), (0.1)
#Fail: (1.00), (0.1000)
^\d+(\.\d*[1-9])?$

Now that that's out of the way, most of the following is meant as commentary on how complex regex can get if you try to be clever with it, and why you should seek alternatives. Read at your own risk.


This is a very common task, but all the answers I see here so far will accept inputs that don't match your number format, such as ,111, 9,9,9, or even .,,.. That's simple enough to fix, even if the numbers are embedded in other text. IMHO anything that fails to pull 1,234.56 and 1234—and only those numbers—out of abc22 1,234.56 9.9.9.9 def 1234 is a wrong answer.

First of all, if you don't need to do this all in one regex, don't. A single regex for two different number formats is hard to maintain even when they aren't embedded in other text. What you should really do is split the whole thing on whitespace, then run two or three smaller regexes on the results. If that's not an option for you, keep reading.

Basic pattern

Considering the examples you've given, here's a simple regex that allows pretty much any integer or decimal in 0000 format and blocks everything else:

^\d*\.?\d+$

Here's one that requires 0,000 format:

^\d{1,3}(,\d{3})*(\.\d+)?$

Put them together, and commas become optional as long as they're consistent:

^(\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?)$

Embedded numbers

The patterns above require the entire input to be a number. You're looking for numbers embedded in text, so you have to loosen that part. On the other hand, you don't want it to see catch22 and think it's found the number 22. If you're using something with lookbehind support (like C#, .NET 4.0+), this is pretty easy: replace ^ with (?<!\S) and $ with (?!\S) and you're good to go:

(?<!\S)(\d*\.?\d+|\d{1,3}(,\d{3})*(\.\d+)?)(?!\S)

If you're working with JavaScript or Ruby or something, things start looking more complex:

(?:^|\s)(\d*\.?\d+|\d{1,3}(?:,\d{3})*(?:\.\d+)?)(?!\S)

You'll have to use capture groups; I can't think of an alternative without lookbehind support. The numbers you want will be in Group 1 (assuming the whole match is Group 0).

Validation and more complex rules

I think that covers your question, so if that's all you need, stop reading now. If you want to get fancier, things turn very complex very quickly. Depending on your situation, you may want to block any or all of the following:

  • Empty input
  • Leading zeroes (e.g. 000123)
  • Trailing zeroes (e.g. 1.2340000)
  • Decimals starting with the decimal point (e.g. .001 as opposed to 0.001)

Just for the hell of it, let's assume you want to block the first 3, but allow the last one. What should you do? I'll tell you what you should do, you should use a different regex for each rule and progressively narrow down your matches. But for the sake of the challenge, here's how you do it all in one giant pattern:

(?<!\S)(?=.)(0|([1-9](\d*|\d{0,2}(,\d{3})*)))?(\.\d*[1-9])?(?!\S)

And here's what it means:

(?<!\S) to (?!\S) #The whole match must be surrounded by either whitespace or line boundaries. So if you see something bogus like :;:9.:, ignore the 9.
(?=.)             #The whole thing can't be blank.

(                    #Rules for the integer part:
  0                  #1. The integer part could just be 0...
  |                  #
  [1-9]              #   ...otherwise, it can't have leading zeroes.
  (                  #
    \d*              #2. It could use no commas at all...
    |                #
    \d{0,2}(,\d{3})* #   ...or it could be comma-separated groups of 3 digits each.
  )                  # 
)?                   #3. Or there could be no integer part at all.

(       #Rules for the decimal part:
  \.    #1. It must start with a decimal point...
  \d*   #2. ...followed by a string of numeric digits only.
  [1-9] #3. It can't be just the decimal point, and it can't end in 0.
)?      #4. The whole decimal part is also optional. Remember, we checked at the beginning to make sure the whole thing wasn't blank.

Tested here: http://rextester.com/YPG96786

This will allow things like:

100,000
999.999
90.0009
1,000,023.999
0.111
.111
0

It will block things like:

1,1,1.111
000,001.111
999.
0.
111.110000
1.1.1.111
9.909,888

There are several ways to make this regex simpler and shorter, but understand that changing the pattern will loosen what it considers a number.

Since many regex engines (e.g. JavaScript and Ruby) don't support the negative lookbehind, the only way to do this correctly is with capture groups:

(?:^|\s)(?=.)((?:0|(?:[1-9](?:\d*|\d{0,2}(?:,\d{3})*)))?(?:\.\d*[1-9])?)(?!\S)

The numbers you're looking for will be in capture group 1.

Tested here: http://rubular.com/r/3HCSkndzhT

One final note

Obviously, this is a massive, complicated, nigh-unreadable regex. I enjoyed the challenge, but you should consider whether you really want to use this in a production environment. Instead of trying to do everything in one step, you could do it in two: a regex to catch anything that might be a number, then another one to weed out whatever isn't a number. Or you could do some basic processing, then use your language's built-in number parsing functions. Your choice.

Walkover answered 6/5, 2011 at 21:33 Comment(13)
This is a very good attempt, but there may be a problem with it - depending on the greediness of the engine, a given number may partially match two of the competing formats, instrad of singly matching the correct one - i.e. 5000 may yield 500 plus 0 That makes me a little skeptical of trying to cover too much with only a single expression, and that's why i gave a simpler answer with the caveat of possible false positives. At the end of the day, the stringency of the requirements should dictate the solution.Clearstory
@Clearstory - That's a fair point. It might work with the latest edit. BTW, your downvote wasn't from me, since you pointed out the potential 1,11,11 match.Walkover
I'm using ActionScript, which I believe behaves the same as JavaScript. Using the first pattern you recommended, I get the following results on my test string (For validation, I'm simply returning the matches wrapped in "<<[result]>>"): The<< 5>>000 lb. fox jumped over a<< 9>>9<<,9>><<99>><<.9>><<99>><<98>><<71>>3 foot fence.Gasometry
Well, one of your up votes is from me :) as I do think your answer is as thorough as it can get and you put work into it. Just to make this a content-comment, a note to the OP, the ?: at the start of groups is there so that they aren't returned autonomously in the result ('captured'), even though they contribute to the matching of the whole expression; each formatted number in the input matches the whole expression.Clearstory
@Michael and @Clearstory - See latest edit, which seems to work. This is one of those regex problems that are more difficult than they appear.Walkover
Wow! Justin you're amazing. I really appreciate all of your time and detail. It has been very educational. I had no idea that this was such a complex problem or even that there were different "flavors" of regex which all yield different results. The pattern you specified works well in JavaScript (regexpal.com), but apparently AS3 has it's own flavor or regex. I don't know exactly where it breaks down. So, I've had to settle on a less desirable multi-step solution using a blend of what I've learned here. I couldn't have gotten to a solution without your help. Thank you!Gasometry
I'm afraid your regex wouldn't work very well in India because in a string like Rs.3,00,00,000 it finds 4 numbers instead of just 1.Mirabel
@Mirabel - The final version would actually not match that at all, since it now requires whitespace or a line anchor at each end. But yes, you'd definitely need a different (but similar) pattern for the Indian numbering system.Walkover
Looking at those REs that you propose, Zawinski's Dictum comes to mind. The problems you're ending up with here are “understanding” and “maintenance”. A simpler RE and an additional validation step make it far easier to get the code right in the first place, and to keep it right thereafter.Suffumigate
@Donal - You're right about Zawinski's Dictum on this particular pattern. Notice I said I was assuming the most difficult requirements, with the idea of shortening it later. Normally I'd just go with something like \d{1,3}(,\d{3})*(\.\d+)?. The above is probably overkill, but underkill is worse on this IMHO.Walkover
I wanted to use this REGEXP for international auditing - but Indian accountants use a 'laks' system with commas inserted every 2 or 3 numbers apart. (US: $1,000,000.00 INDIA: $10,00,000.00) -- what would the regexp be to accept these numbers too?Bowline
@Xerus - Yes, I did. I ignored lots of other things, too. This was explicitly never meant to cover all possible numeric formats; I just included some examples just to make it more useful. It's far from an exhaustive list, and it's not meant as a substitute for learning regular expressions. For the record, all you need for negative numbers is a [+-]? at the beginning of any of these. If the reader had trouble with that, I would urge them not to use any of these until they feel more familiar with regex.Walkover
To make Regex a little more palatable, you can use the excellent RegexBuddy to both compose and debug your regular expressions. You'll still end up scratching your head a fair bit, but at least you'll have a tool that visually illustrates what's going on.Dignadignified
H
18

The regex below will match both numbers from your example.

\b\d[\d,.]*\b

It will return 5000 and 99,999.99998713 - matching your requirements.

Haber answered 6/5, 2011 at 21:30 Comment(5)
This will match the comma in this,that.Walkover
@Justin Morgan - you are correct, I did not test for that condition. Here is an updated version which will wrok for all the cases, except for a number starting with a comma or period. \b\d[\d,.]+\bHaber
Much better, but it will still allow 9....9 or 1,,,,X (although the X won't be included in the match).Walkover
By the way, \b\d[\d,.]*\b is close enough that if you edit your answer I'll remove the -1. It should be a * instead of a + though; \b\d[\d,.]+\b won't allow single-digit numbers.Walkover
@Justin Morgan - thanks for the insight. This question was definitely more complex than it appeared. I updated my answer based on your feedback - it makes sense.Haber
P
11

Some days ago, I worked on the problem of removing trailing zeros from the string of a number.

In the continuity of that problem, I find this one interesting because it widens the problem to numbers comprising commas.

I have taken the regex's pattern I had writen in that previous problem I worked on and I improved it in order that it can treat the numbers with commas as an answer for this problem.

I've been carried away with my enthusiasm and my liking of regexes. I don't know if the result fits exactly to the need expressed by Michael Prescott. I would be interested to know the points that are in excess or in lack in my regex, and to correct it to make it more suitable for you.

Now, after a long session of work on this regex, I have a sort of weight in the brain, so I'm not fresh enough to give a lot of explanation. If points are obscure, and if anybody may come to be interested enough, please, ask me.

The regex is built in order that it can detect the numbers expressed in scientific notation 2E10 or even 5,22,454.12E-00.0478 , removing unnecessary zeros in the two parts of such numbers too. If an exponent is equal to zero , the number is modified so that there is no more exponent.

I put some verification in the pattern so that some particular cases will not match, for exemple '12..57' won't match. But in ',111' the string '111' matches because the preceding comma is considered a comma not being in a number but a comma of sentence.

I think that the managing of commas should be improved, because it seems to me that there are only 2 digits between commas in Indian numbering. It won't be dificult to correct, I presume

Here after is a code demonstrating how my regex works. There are two functions, according if one wants the numbers '.1245' to be transformed in '0.1245' or not. I wouldn't be surprised if errors or unwanted matchings or unmatchings will remain for certain cases of number strings; then I'd like to know these cases to understand and correct the deficiency.

I apologize for this code written in Python, but regexes are trans-langage and I think everybody will be capable of undertsanding the reex's pattern

import re

regx = re.compile('(?<![\d.])(?!\.\.)(?<![\d.][eE][+-])(?<![\d.][eE])(?<!\d[.,])'
                  '' #---------------------------------
                  '([+-]?)'
                  '(?![\d,]*?\.[\d,]*?\.[\d,]*?)'
                  '(?:0|,(?=0)|(?<!\d),)*'
                  '(?:'
                  '((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
                  '|\.(0)'
                  '|((?<!\.)\.\d+?)'
                  '|([\d,]+\.\d+?))'
                  '0*'
                  '' #---------------------------------
                  '(?:'
                  '([eE][+-]?)(?:0|,(?=0))*'
                  '(?:'
                  '(?!0+(?=\D|\Z))((?:\d(?!\.[1-9])|,(?=\d))+)[.,]?'
                  '|((?<!\.)\.(?!0+(?=\D|\Z))\d+?)'
                  '|([\d,]+\.(?!0+(?=\D|\Z))\d+?))'
                  '0*'
                  ')?'
                  '' #---------------------------------
                  '(?![.,]?\d)')


def dzs_numbs(x,regx = regx): # ds = detect and zeros-shave
    if not regx.findall(x):
        yield ('No match,', 'No catched string,', 'No groups.')
    for mat in regx.finditer(x):
        yield (mat.group(), ''.join(mat.groups('')), mat.groups(''))

def dzs_numbs2(x,regx = regx): # ds = detect and zeros-shave
    if not regx.findall(x):
        yield ('No match,', 'No catched string,', 'No groups.')
    for mat in regx.finditer(x):
        yield (mat.group(),
               ''.join(('0' if n.startswith('.') else '')+n for n in mat.groups('')),
               mat.groups(''))

NS = ['  23456000and23456000. or23456000.000  00023456000 s000023456000.  000023456000.000 ',
      'arf 10000 sea10000.+10000.000  00010000-00010000. kant00010000.000 ',
      '  24:  24,  24.   24.000  24.000,   00024r 00024. blue 00024.000  ',
      '  8zoom8.  8.000  0008  0008. and0008.000  ',
      '  0   00000M0. = 000.  0.0  0.000    000.0   000.000   .000000   .0   ',
      '  .0000023456    .0000023456000   '
      '  .0005872    .0005872000   .00503   .00503000   ',
      '  .068    .0680000   .8   .8000  .123456123456    .123456123456000    ',
      '  .657   .657000   .45    .4500000   .7    .70000  0.0000023230000   000.0000023230000   ',
      '  0.0081000    0000.0081000  0.059000   0000.059000     ',
      '  0.78987400000 snow  00000.78987400000  0.4400000   00000.4400000   ',
      '  -0.5000  -0000.5000   0.90   000.90   0.7   000.7   ',
      '  2.6    00002.6   00002.60000  4.71   0004.71    0004.7100   ',
      '  23.49   00023.49   00023.490000  103.45   0000103.45   0000103.45000    ',
      '  10003.45067   000010003.45067   000010003.4506700 ',
      '  +15000.0012   +000015000.0012   +000015000.0012000    ',
      '  78000.89   000078000.89   000078000.89000    ',
      '  .0457e10   .0457000e10   00000.0457000e10  ',
      '   258e8   2580000e4   0000000002580000e4   ',
      '  0.782e10   0000.782e10   0000.7820000e10  ',
      '  1.23E2   0001.23E2  0001.2300000E2   ',
      '  432e-102  0000432e-102   004320000e-106   ',
      '  1.46e10and0001.46e10  0001.4600000e10   ',
      '  1.077e-300  0001.077e-300  0001.077000e-300   ',
      '  1.069e10   0001.069e10   0001.069000e10   ',
      '  105040.03e10  000105040.03e10  105040.0300e10    ',
      '  +286E000024.487900  -78.4500e.14500   .0140E789.  ',
      '  081,12.40E07,95.0120     0045,78,123.03500e-0.00  ',
      '  0096,78,473.0380e-0.    0008,78,373.066000E0.    0004512300.E0000  ',
      '  ..18000  25..00 36...77   2..8  ',
      '  3.8..9    .12500.     12.51.400  ',
      '  00099,111.8713000   -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must',
      '  00099,44,and   0000,099,88,44.bom',
      '00,000,00.587000  77,98,23,45.,  this,that ',
      '  ,111  145.20  +9,9,9  0012800  .,,.  1  100,000 ',
      '1,1,1.111  000,001.111   -999.  0.  111.110000  1.1.1.111  9.909,888']


for ch in NS:
    print 'string: '+repr(ch)
    for strmatch, modified, the_groups in dzs_numbs2(ch):
        print strmatch.rjust(20),'',modified,'',the_groups
    print

result

string: '  23456000and23456000. or23456000.000  00023456000 s000023456000.  000023456000.000 '
            23456000  23456000  ('', '23456000', '', '', '', '', '', '', '')
           23456000.  23456000  ('', '23456000', '', '', '', '', '', '', '')
        23456000.000  23456000  ('', '23456000', '', '', '', '', '', '', '')
         00023456000  23456000  ('', '23456000', '', '', '', '', '', '', '')
       000023456000.  23456000  ('', '23456000', '', '', '', '', '', '', '')
    000023456000.000  23456000  ('', '23456000', '', '', '', '', '', '', '')

string: 'arf 10000 sea10000.+10000.000  00010000-00010000. kant00010000.000 '
               10000  10000  ('', '10000', '', '', '', '', '', '', '')
              10000.  10000  ('', '10000', '', '', '', '', '', '', '')
           10000.000  10000  ('', '10000', '', '', '', '', '', '', '')
            00010000  10000  ('', '10000', '', '', '', '', '', '', '')
           00010000.  10000  ('', '10000', '', '', '', '', '', '', '')
        00010000.000  10000  ('', '10000', '', '', '', '', '', '', '')

string: '  24:  24,  24.   24.000  24.000,   00024r 00024. blue 00024.000  '
                  24  24  ('', '24', '', '', '', '', '', '', '')
                 24,  24  ('', '24', '', '', '', '', '', '', '')
                 24.  24  ('', '24', '', '', '', '', '', '', '')
              24.000  24  ('', '24', '', '', '', '', '', '', '')
              24.000  24  ('', '24', '', '', '', '', '', '', '')
               00024  24  ('', '24', '', '', '', '', '', '', '')
              00024.  24  ('', '24', '', '', '', '', '', '', '')
           00024.000  24  ('', '24', '', '', '', '', '', '', '')

string: '  8zoom8.  8.000  0008  0008. and0008.000  '
                   8  8  ('', '8', '', '', '', '', '', '', '')
                  8.  8  ('', '8', '', '', '', '', '', '', '')
               8.000  8  ('', '8', '', '', '', '', '', '', '')
                0008  8  ('', '8', '', '', '', '', '', '', '')
               0008.  8  ('', '8', '', '', '', '', '', '', '')
            0008.000  8  ('', '8', '', '', '', '', '', '', '')

string: '  0   00000M0. = 000.  0.0  0.000    000.0   000.000   .000000   .0   '
                   0  0  ('', '0', '', '', '', '', '', '', '')
               00000  0  ('', '0', '', '', '', '', '', '', '')
                  0.  0  ('', '0', '', '', '', '', '', '', '')
                000.  0  ('', '0', '', '', '', '', '', '', '')
                 0.0  0  ('', '', '0', '', '', '', '', '', '')
               0.000  0  ('', '', '0', '', '', '', '', '', '')
               000.0  0  ('', '', '0', '', '', '', '', '', '')
             000.000  0  ('', '', '0', '', '', '', '', '', '')
             .000000  0  ('', '', '0', '', '', '', '', '', '')
                  .0  0  ('', '', '0', '', '', '', '', '', '')

string: '  .0000023456    .0000023456000     .0005872    .0005872000   .00503   .00503000   '
         .0000023456  0.0000023456  ('', '', '', '.0000023456', '', '', '', '', '')
      .0000023456000  0.0000023456  ('', '', '', '.0000023456', '', '', '', '', '')
            .0005872  0.0005872  ('', '', '', '.0005872', '', '', '', '', '')
         .0005872000  0.0005872  ('', '', '', '.0005872', '', '', '', '', '')
              .00503  0.00503  ('', '', '', '.00503', '', '', '', '', '')
           .00503000  0.00503  ('', '', '', '.00503', '', '', '', '', '')

string: '  .068    .0680000   .8   .8000  .123456123456    .123456123456000    '
                .068  0.068  ('', '', '', '.068', '', '', '', '', '')
            .0680000  0.068  ('', '', '', '.068', '', '', '', '', '')
                  .8  0.8  ('', '', '', '.8', '', '', '', '', '')
               .8000  0.8  ('', '', '', '.8', '', '', '', '', '')
       .123456123456  0.123456123456  ('', '', '', '.123456123456', '', '', '', '', '')
    .123456123456000  0.123456123456  ('', '', '', '.123456123456', '', '', '', '', '')

string: '  .657   .657000   .45    .4500000   .7    .70000  0.0000023230000   000.0000023230000   '
                .657  0.657  ('', '', '', '.657', '', '', '', '', '')
             .657000  0.657  ('', '', '', '.657', '', '', '', '', '')
                 .45  0.45  ('', '', '', '.45', '', '', '', '', '')
            .4500000  0.45  ('', '', '', '.45', '', '', '', '', '')
                  .7  0.7  ('', '', '', '.7', '', '', '', '', '')
              .70000  0.7  ('', '', '', '.7', '', '', '', '', '')
     0.0000023230000  0.000002323  ('', '', '', '.000002323', '', '', '', '', '')
   000.0000023230000  0.000002323  ('', '', '', '.000002323', '', '', '', '', '')

string: '  0.0081000    0000.0081000  0.059000   0000.059000     '
           0.0081000  0.0081  ('', '', '', '.0081', '', '', '', '', '')
        0000.0081000  0.0081  ('', '', '', '.0081', '', '', '', '', '')
            0.059000  0.059  ('', '', '', '.059', '', '', '', '', '')
         0000.059000  0.059  ('', '', '', '.059', '', '', '', '', '')

string: '  0.78987400000 snow  00000.78987400000  0.4400000   00000.4400000   '
       0.78987400000  0.789874  ('', '', '', '.789874', '', '', '', '', '')
   00000.78987400000  0.789874  ('', '', '', '.789874', '', '', '', '', '')
           0.4400000  0.44  ('', '', '', '.44', '', '', '', '', '')
       00000.4400000  0.44  ('', '', '', '.44', '', '', '', '', '')

string: '  -0.5000  -0000.5000   0.90   000.90   0.7   000.7   '
             -0.5000  -0.5  ('-', '', '', '.5', '', '', '', '', '')
          -0000.5000  -0.5  ('-', '', '', '.5', '', '', '', '', '')
                0.90  0.9  ('', '', '', '.9', '', '', '', '', '')
              000.90  0.9  ('', '', '', '.9', '', '', '', '', '')
                 0.7  0.7  ('', '', '', '.7', '', '', '', '', '')
               000.7  0.7  ('', '', '', '.7', '', '', '', '', '')

string: '  2.6    00002.6   00002.60000  4.71   0004.71    0004.7100   '
                 2.6  2.6  ('', '', '', '', '2.6', '', '', '', '')
             00002.6  2.6  ('', '', '', '', '2.6', '', '', '', '')
         00002.60000  2.6  ('', '', '', '', '2.6', '', '', '', '')
                4.71  4.71  ('', '', '', '', '4.71', '', '', '', '')
             0004.71  4.71  ('', '', '', '', '4.71', '', '', '', '')
           0004.7100  4.71  ('', '', '', '', '4.71', '', '', '', '')

string: '  23.49   00023.49   00023.490000  103.45   0000103.45   0000103.45000    '
               23.49  23.49  ('', '', '', '', '23.49', '', '', '', '')
            00023.49  23.49  ('', '', '', '', '23.49', '', '', '', '')
        00023.490000  23.49  ('', '', '', '', '23.49', '', '', '', '')
              103.45  103.45  ('', '', '', '', '103.45', '', '', '', '')
          0000103.45  103.45  ('', '', '', '', '103.45', '', '', '', '')
       0000103.45000  103.45  ('', '', '', '', '103.45', '', '', '', '')

string: '  10003.45067   000010003.45067   000010003.4506700 '
         10003.45067  10003.45067  ('', '', '', '', '10003.45067', '', '', '', '')
     000010003.45067  10003.45067  ('', '', '', '', '10003.45067', '', '', '', '')
   000010003.4506700  10003.45067  ('', '', '', '', '10003.45067', '', '', '', '')

string: '  +15000.0012   +000015000.0012   +000015000.0012000    '
         +15000.0012  +15000.0012  ('+', '', '', '', '15000.0012', '', '', '', '')
     +000015000.0012  +15000.0012  ('+', '', '', '', '15000.0012', '', '', '', '')
  +000015000.0012000  +15000.0012  ('+', '', '', '', '15000.0012', '', '', '', '')

string: '  78000.89   000078000.89   000078000.89000    '
            78000.89  78000.89  ('', '', '', '', '78000.89', '', '', '', '')
        000078000.89  78000.89  ('', '', '', '', '78000.89', '', '', '', '')
     000078000.89000  78000.89  ('', '', '', '', '78000.89', '', '', '', '')

string: '  .0457e10   .0457000e10   00000.0457000e10  '
            .0457e10  0.0457e10  ('', '', '', '.0457', '', 'e', '10', '', '')
         .0457000e10  0.0457e10  ('', '', '', '.0457', '', 'e', '10', '', '')
    00000.0457000e10  0.0457e10  ('', '', '', '.0457', '', 'e', '10', '', '')

string: '   258e8   2580000e4   0000000002580000e4   '
               258e8  258e8  ('', '258', '', '', '', 'e', '8', '', '')
           2580000e4  2580000e4  ('', '2580000', '', '', '', 'e', '4', '', '')
  0000000002580000e4  2580000e4  ('', '2580000', '', '', '', 'e', '4', '', '')

string: '  0.782e10   0000.782e10   0000.7820000e10  '
            0.782e10  0.782e10  ('', '', '', '.782', '', 'e', '10', '', '')
         0000.782e10  0.782e10  ('', '', '', '.782', '', 'e', '10', '', '')
     0000.7820000e10  0.782e10  ('', '', '', '.782', '', 'e', '10', '', '')

string: '  1.23E2   0001.23E2  0001.2300000E2   '
              1.23E2  1.23E2  ('', '', '', '', '1.23', 'E', '2', '', '')
           0001.23E2  1.23E2  ('', '', '', '', '1.23', 'E', '2', '', '')
      0001.2300000E2  1.23E2  ('', '', '', '', '1.23', 'E', '2', '', '')

string: '  432e-102  0000432e-102   004320000e-106   '
            432e-102  432e-102  ('', '432', '', '', '', 'e-', '102', '', '')
        0000432e-102  432e-102  ('', '432', '', '', '', 'e-', '102', '', '')
      004320000e-106  4320000e-106  ('', '4320000', '', '', '', 'e-', '106', '', '')

string: '  1.46e10and0001.46e10  0001.4600000e10   '
             1.46e10  1.46e10  ('', '', '', '', '1.46', 'e', '10', '', '')
          0001.46e10  1.46e10  ('', '', '', '', '1.46', 'e', '10', '', '')
     0001.4600000e10  1.46e10  ('', '', '', '', '1.46', 'e', '10', '', '')

string: '  1.077e-300  0001.077e-300  0001.077000e-300   '
          1.077e-300  1.077e-300  ('', '', '', '', '1.077', 'e-', '300', '', '')
       0001.077e-300  1.077e-300  ('', '', '', '', '1.077', 'e-', '300', '', '')
    0001.077000e-300  1.077e-300  ('', '', '', '', '1.077', 'e-', '300', '', '')

string: '  1.069e10   0001.069e10   0001.069000e10   '
            1.069e10  1.069e10  ('', '', '', '', '1.069', 'e', '10', '', '')
         0001.069e10  1.069e10  ('', '', '', '', '1.069', 'e', '10', '', '')
      0001.069000e10  1.069e10  ('', '', '', '', '1.069', 'e', '10', '', '')

string: '  105040.03e10  000105040.03e10  105040.0300e10    '
        105040.03e10  105040.03e10  ('', '', '', '', '105040.03', 'e', '10', '', '')
     000105040.03e10  105040.03e10  ('', '', '', '', '105040.03', 'e', '10', '', '')
      105040.0300e10  105040.03e10  ('', '', '', '', '105040.03', 'e', '10', '', '')

string: '  +286E000024.487900  -78.4500e.14500   .0140E789.  '
  +286E000024.487900  +286E24.4879  ('+', '286', '', '', '', 'E', '', '', '24.4879')
     -78.4500e.14500  -78.45e0.145  ('-', '', '', '', '78.45', 'e', '', '.145', '')
          .0140E789.  0.014E789  ('', '', '', '.014', '', 'E', '789', '', '')

string: '  081,12.40E07,95.0120     0045,78,123.03500e-0.00  '
081,12.40E07,95.0120  81,12.4E7,95.012  ('', '', '', '', '81,12.4', 'E', '', '', '7,95.012')
   0045,78,123.03500  45,78,123.035  ('', '', '', '', '45,78,123.035', '', '', '', '')

string: '  0096,78,473.0380e-0.    0008,78,373.066000E0.    0004512300.E0000  '
    0096,78,473.0380  96,78,473.038  ('', '', '', '', '96,78,473.038', '', '', '', '')
  0008,78,373.066000  8,78,373.066  ('', '', '', '', '8,78,373.066', '', '', '', '')
         0004512300.  4512300  ('', '4512300', '', '', '', '', '', '', '')

string: '  ..18000  25..00 36...77   2..8  '
           No match,  No catched string,  No groups.

string: '  3.8..9    .12500.     12.51.400  '
           No match,  No catched string,  No groups.

string: '  00099,111.8713000   -0012,45,83,987.26+0.000,099,88,44.or00,00,00.00must'
   00099,111.8713000  99,111.8713  ('', '', '', '', '99,111.8713', '', '', '', '')
  -0012,45,83,987.26  -12,45,83,987.26  ('-', '', '', '', '12,45,83,987.26', '', '', '', '')
         00,00,00.00  0  ('', '', '0', '', '', '', '', '', '')

string: '  00099,44,and   0000,099,88,44.bom'
           00099,44,  99,44  ('', '99,44', '', '', '', '', '', '', '')
     0000,099,88,44.  99,88,44  ('', '99,88,44', '', '', '', '', '', '', '')

string: '00,000,00.587000  77,98,23,45.,  this,that '
    00,000,00.587000  0.587  ('', '', '', '.587', '', '', '', '', '')
        77,98,23,45.  77,98,23,45  ('', '77,98,23,45', '', '', '', '', '', '', '')

string: '  ,111  145.20  +9,9,9  0012800  .,,.  1  100,000 '
                ,111  111  ('', '111', '', '', '', '', '', '', '')
              145.20  145.2  ('', '', '', '', '145.2', '', '', '', '')
              +9,9,9  +9,9,9  ('+', '9,9,9', '', '', '', '', '', '', '')
             0012800  12800  ('', '12800', '', '', '', '', '', '', '')
                   1  1  ('', '1', '', '', '', '', '', '', '')
             100,000  100,000  ('', '100,000', '', '', '', '', '', '', '')

string: '1,1,1.111  000,001.111   -999.  0.  111.110000  1.1.1.111  9.909,888'
           1,1,1.111  1,1,1.111  ('', '', '', '', '1,1,1.111', '', '', '', '')
         000,001.111  1.111  ('', '', '', '', '1.111', '', '', '', '')
               -999.  -999  ('-', '999', '', '', '', '', '', '', '')
                  0.  0  ('', '0', '', '', '', '', '', '', '')
          111.110000  111.11  ('', '', '', '', '111.11', '', '', '', '')
Paradisiacal answered 8/5, 2011 at 18:41 Comment(1)
Would rather have an answer than a University dissertation.Shred
G
9
\d+(,\d+)*(\.\d+)?

This assumes that there is always at least one digit before or after any comma or decimal and also assumes that there is at most one decimal and that all the commas precede the decimal.

Gyrate answered 6/5, 2011 at 21:37 Comment(5)
This doesn't restrict the comma groups to the 3-digit format. It will accept 999999,9,9,9,9.Walkover
Although I should probably point out that this is closer to being correct than most of the others. Your -1 isn't from me.Walkover
This is the RE that I'd use, though with another validation step afterwards (possibly not with an RE); trying to do everything with one RE makes life much harder.Suffumigate
@Justin Morgan It wasn't clear that the comma was only accepted in 3-digit groups. But that's easily solved by changing (,\d+) to (,\d\d\d) I guess.Gyrate
EXACTLY what I needed. Not some long dissertation on what regex is and how it works on 20 different formats of numbers so I can prove that I need a stick to hold up my HUGE brain.Shred
C
3

Taking a certain liberty with the requirements, you're looking for

\d+([\d,]?\d)*(\.\d+)?

But notice this will match e.g. 11,11,1

Clearstory answered 6/5, 2011 at 21:37 Comment(2)
Out of curiosity, is there any reason you went with \d+([\d,]?\d)*(\.\d+)? instead of \d+(,\d+)*(\.\d+)?? I think they'd give equivalent matches, although the capture groups would be different.Walkover
Hi. No special reason, it was a hang over from starting off with a more complex expression in order not to match invalid formats.Clearstory
N
3

This regex:

(\d{1,3},\d{3}(,\d{3})*)(\.\d*)?|\d+\.?\d*

Matched every number in the string:

1 1.0 0.1 1.001 1,000 1,000,000 1000.1 1,000.1 1,323,444,000 1,999 1,222,455,666.0 1,244

Nummulite answered 28/4, 2016 at 15:25 Comment(0)
P
3
(,*[\d]+,*[\d]*)+

This would match any small or large number as following with or without comma

1
100
1,262
1,56,262
10,78,999
12,34,56,789

or

1
100
1262
156262
1078999
123456789
Paine answered 21/2, 2019 at 10:15 Comment(1)
Best and valid answer :)Central
E
2

Here's a regex:

(?:\d+)((\d{1,3})*([\,\ ]\d{3})*)(\.\d+)?

that accepts numbers:

  • without spaces and/or decimals, eg. 123456789, 123.123
  • with commas or spaces as thousands separator and/or decimals, eg. 123 456 789, 123 456 789.100, 123,456, 3,232,300,000.00

Tests: http://regexr.com/3h1a2

Eelgrass answered 23/10, 2017 at 13:33 Comment(1)
This works fine on regexr.com but in python re module it is not workingHummel
S
2

Here is my answer:

(\d+(,?.?))*
Shadbush answered 9/1, 2022 at 20:9 Comment(1)
I loved this answer! To cover both negative and positive I updated like this: [+\\-]?(\\d+(,?.?))*Milesmilesian
H
1

Here is another construction which starts with the simplest number format and then, in a non-overlapping way, progressively adds more complex number formats:

Java regep:

(\d)|([1-9]\d+)|(\.\d+)|(\d\.\d*)|([1-9]\d+\.\d*)|([1-9]\d{0,2}(,\d{3})+(\.\d*)?)

As a Java String (note the extra \ needed to escape to \ and . since \ and . have special meaning in a regexp when on their own):

String myregexp="(\\d)|([1-9]\\d+)|(\\.\\d+)|(\\d\\.\\d*)|([1-9]\\d+\\.\\d*)|([1-9]\\d{0,2}(,\\d{3})+(\\.\\d*)?)";   

Explanation:

  1. This regexp has the form A|B|C|D|E|F where A,B,C,D,E,F are themselves regexps that do not overlap. Generally, I find it easier to start with the simplest possible matches, A. If A misses matches you want, then create a B that is a minor modification of A and includes a bit more of what you want. Then, based on B, create a C that catches more, etc. I also find it easier to create regexps that don't overlap; it is easier to understand a regexp with 20 simple non-overlapping regexps connected with ORs rather than a few regexps with more complex matching. But, each to their own!

  2. A is (\d) and matches exactly one of 0,1,2,3,4,5,6,7,8,9 which can't be simpler!

  3. B is ([1-9]\d+) and only matches numbers with 2 or more digits, the first excluding 0 . B matches exactly one of 10,11,12,... B does not overlap A but is a small modification of A.

  4. C is (.\d+) and only matches a decimal followed by one or more digits. C matches exactly one of .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .00 .01 .02 ... . .23000 ... C allows trailing eros on the right which I prefer: if this is measurement data, the number of trailing zeros indicates the level of precision. If you don't want the trailing zeros on the right, change (.\d+) to (.\d*[1-9]) but this also excludes .0 which I think should be allowed. C is also a small modification of A.

  5. D is (\d.\d*) which is A plus decimals with trailing zeros on the right. D only matches a single digit, followed by a decimal, followed by zero or more digits. D matches 0. 0.0 0.1 0.2 ....0.01000...9. 9.0 9.1..0.0230000 .... 9.9999999999... If you want to exclude "0." then change D to (\d.\d+). If you want to exclude trailing zeros on the right, change D to (\d.\d*[1-9]) but this excludes 2.0 which I think should be included. D does not overlap A,B,or C.

  6. E is ([1-9]\d+.\d*) which is B plus decimals with trailing zeros on the right. If you want to exclude "13.", for example, then change E to ([1-9]\d+.\d+). E does not overlap A,B,C or D. E matches 10. 10.0 10.0100 .... 99.9999999999... Trailing zeros can be handled as in 4. and 5.

  7. F is ([1-9]\d{0,2}(,\d{3})+(.\d*)?) and only matches numbers with commas and possibly decimals allowing trailing zeros on the right. The first group ([1-9]\d{0,2}) matches a non-zero digit followed zero, one or two more digits. The second group (,\d{3})+ matches a 4 character group (a comma followed by exactly three digits) and this group can match one or more times (no matches means no commas!). Finally, (.\d*)? matches nothing, or matches . by itself, or matches a decimal . followed by any number of digits, possibly none. Again, to exclude things like "1,111.", change (.\d*) to (.\d+). Trailing zeros can be handled as in 4. or 5. F does not overlap A,B,C,D, or E. I couldn't think of an easier regexp for F.

Let me know if you are interested and I can edit above to handle the trailing zeros on the right as desired.

Here is what matches regexp and what does not:

0
1
02 <- invalid
20
22
003 <- invalid
030 <- invalid
300
033 <- invalid
303
330
333
0004 <- invalid
0040 <- invalid
0400 <- invalid
4000
0044 <- invalid
0404 <- invalid
0440 <- invalid
4004
4040
4400
0444 <- invalid
4044
4404
4440
4444
00005 <- invalid
00050 <- invalid
00500 <- invalid
05000 <- invalid
50000
00055 <- invalid
00505 <- invalid
00550 <- invalid
05050 <- invalid
05500 <- invalid
50500
55000
00555 <- invalid
05055 <- invalid
05505 <- invalid
05550 <- invalid
50550
55050
55500
. <- invalid
.. <- invalid
.0
0.
.1
1.
.00
0.0
00. <- invalid
.02
0.2
02. <- invalid
.20
2.0
20.
.22
2.2
22.
.000
0.00
00.0 <- invalid
000. <- invalid
.003
0.03
00.3 <- invalid
003. <- invalid
.030
0.30
03.0 <- invalid
030. <- invalid
.033
0.33
03.3 <- invalid
033. <- invalid
.303
3.03
30.3
303.
.333
3.33
33.3
333.
.0000
0.000
00.00 <- invalid
000.0 <- invalid
0000. <- invalid
.0004
0.0004
00.04 <- invalid
000.4 <- invalid
0004. <- invalid
.0044
0.044
00.44 <- invalid
004.4 <- invalid
0044. <- invalid
.0404
0.404
04.04 <- invalid
040.4 <- invalid
0404. <- invalid
.0444
0.444
04.44 <- invalid
044.4 <- invalid
0444. <- invalid
.4444
4.444
44.44
444.4
4444.
.00000
0.0000
00.000 <- invalid
000.00 <- invalid
0000.0 <- invalid
00000. <- invalid
.00005
0.0005
00.005 <- invalid
000.05 <- invalid
0000.5 <- invalid
00005. <- invalid
.00055
0.0055
00.055 <- invalid
000.55 <- invalid
0005.5 <- invalid
00055. <- invalid
.00505
0.0505
00.505 <- invalid
005.05 <- invalid
0050.5 <- invalid
00505. <- invalid
.00550
0.0550
00.550 <- invalid
005.50 <- invalid
0055.0 <- invalid
00550. <- invalid
.05050
0.5050
05.050 <- invalid
050.50 <- invalid
0505.0 <- invalid
05050. <- invalid
.05500
0.5500
05.500 <- invalid
055.00 <- invalid
0550.0 <- invalid
05500. <- invalid
.50500
5.0500
50.500
505.00
5050.0
50500.
.55000
5.5000
55.000
550.00
5500.0
55000.
.00555
0.0555
00.555 <- invalid
005.55 <- invalid
0055.5 <- invalid
00555. <- invalid
.05055
0.5055
05.055 <- invalid
050.55 <- invalid
0505.5 <- invalid
05055. <- invalid
.05505
0.5505
05.505 <- invalid
055.05 <- invalid
0550.5 <- invalid
05505. <- invalid
.05550
0.5550
05.550 <- invalid
055.50 <- invalid
0555.0 <- invalid
05550. <- invalid
.50550
5.0550
50.550
505.50
5055.0
50550.
.55050
5.5050
55.050
550.50
5505.0
55050.
.55500
5.5500
55.500
555.00
5550.0
55500.
.05555
0.5555
05.555 <- invalid
055.55 <- invalid
0555.5 <- invalid
05555. <- invalid
.50555
5.0555
50.555
505.55
5055.5
50555.
.55055
5.5055
55.055
550.55
5505.5
55055.
.55505
5.5505
55.505
555.05
5550.5
55505.
.55550
5.5550
55.550
555.50
5555.0
55550.
.55555
5.5555
55.555
555.55
5555.5
55555.
, <- invalid
,, <- invalid
1, <- invalid
,1 <- invalid
22, <- invalid
2,2 <- invalid
,22 <- invalid
2,2, <- invalid
2,2, <- invalid
,22, <- invalid
333, <- invalid
33,3 <- invalid
3,33 <- invalid
,333 <- invalid
3,33, <- invalid
3,3,3 <- invalid
3,,33 <- invalid
,,333 <- invalid
4444, <- invalid
444,4 <- invalid
44,44 <- invalid
4,444
,4444 <- invalid
55555, <- invalid
5555,5 <- invalid
555,55 <- invalid
55,555
5,5555 <- invalid
,55555 <- invalid
666666, <- invalid
66666,6 <- invalid
6666,66 <- invalid
666,666
66,6666 <- invalid
6,66666 <- invalid
66,66,66 <- invalid
6,66,666 <- invalid
,666,666 <- invalid
1,111.
1,111.11
1,111.110
01,111.110 <- invalid
0,111.100 <- invalid
11,11. <- invalid
1,111,.11 <- invalid
1111.1,10 <- invalid
01111.11,0 <- invalid
0111.100, <- invalid
1,111,111.
1,111,111.11
1,111,111.110
01,111,111.110 <- invalid
0,111,111.100 <- invalid
1,111,111.
1,1111,11.11 <- invalid
11,111,11.110 <- invalid
01,11,1111.110 <- invalid
0,111111.100 <- invalid
0002,22.2230 <- invalid
.,5.,., <- invalid
2.0,345,345 <- invalid
2.334.456 <- invalid
Hilde answered 16/6, 2014 at 9:15 Comment(0)
F
1
\b\d+,

\b------->word boundary

\d+------>one or digit

,-------->containing commas,

Eg:

sddsgg 70,000 sdsfdsf fdgfdg70,00

sfsfsd 5,44,4343 5.7788,44 555

It will match:

70,

5,

44,

,44

Fy answered 1/11, 2018 at 4:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.