What perl regex would validate a string representing a bowling score?
Asked Answered
C

6

5

I have string that represents a bowling score:

X-91-55-72-X-X-X-90-82-91X

I want to use perl regex to validate this string is valid. I want to check the following:

  1. The frames are delimited by a dash (-). There must be 10 frames.

  2. Frame 1 to 9 must be either a strike (single X) or two digits adding up to no more than 10.

  3. Frame 10 is a lot different. Some valid values are as follows:

    • XXX - Three strikes
    • XX7 - Two strikes and a digit of 0 to 9
    • X91 - A strike and two digits adding up to no more than 10
    • 90 - Two digits adding up to less than 10
    • 91X - Two digits adding up to 10 and then a strike
    • 917 - Two digits adding up to 10 follow by a digit of 0-9

While I can write a crap-ton of code to test each scenario, I'm sure there's probably one or two regular expressions that would validate the string. My brain simply can not wrap itself around a regex solution. Any help is appreciated.

The following code verifies the valid characters in the string:

if ($gs =~ /\A[X,0-9,-]+\z/) { $valid = 1; }

I just don't know about validating the frames

Courier answered 15/7, 2023 at 18:50 Comment(5)
Regex is rarely a good place to check for arithmetic related stuff, like "adding up to no more than 10". Consider splitting by - and checking separate scores in your code.Cason
Hm, how is your score expected to handle spares with first throw of 0?Cason
@bobblebubble, I wouldn't hurry with any attempts, because OP seemingly missing one case with their notation of score, and it is very much possible, once they realize that, this question will be either modified heavily or deleted altogether.Cason
Surprisingly enough, regex for standard bowling score notation (with / for spare) is quite easy: ^(?:(?:X|(\d\/|0\d|1[0-8]|2[0-7]|3[0-6]|4[0-5]|5[0-4]|6[0-3]|7[0-2]|8[01]|90))-){9}(?:XX[X\d]|X(?1)|\d\/[X\d])$Cason
I am not the original author of the code, it was part of an Agile course. I will notify the instructor of the gutter/spare omission. Thanks for all the responses!Courier
B
4

This is probably not very elegant, but what it's worth, here's a snippet that might meet your needs:

use constant RE_LT10      => '(?:0[0-9]|1[0-8]|2[0-7]|3[0-6]|4[0-5]|5[0-4]|6[0-3]|7[0-2]|8[0-1]|90)';

use constant RE_MAX10     => '(?:0[0-9]|1[0-9]|2[0-8]|3[0-7]|4[0-6]|5[0-5]|6[0-4]|7[0-3]|8[0-2]|9[01])';

use constant RE_IS10      => '(?:19|28|37|46|55|64|73|82|91)';

use constant RE_LASTFRAME => "(?:XXX|XX[0-9]|X${\RE_MAX10}|${\RE_LT10}|${\RE_IS10}X|${\RE_IS10}\[0-9\])";

sub is_valid
 {
 return $_[0] =~ m/^(?:(?:X|${\RE_MAX10})-){9}${\RE_LASTFRAME}\z/;
 }

my $scores = 'X-91-55-72-X-X-X-90-82-91X';
say "\$scores is $scores";
say is_valid($scores) ?1:0;

Edited my answer to take into account potential "0[0-9]" scores, as per markalex' remark.

Bearcat answered 15/7, 2023 at 20:15 Comment(4)
Then your pattern doesn't allow for spares after first gutter. But it feels like OP forgot about it in their score format too.Cason
Aren't these numbers all short by 1? OP said "no more than 10", not "less than 10".Lenes
@Lenes "Two digits adding up to less than 10"Bearcat
@Grobu, I posted a variation of yours here.Vitellus
S
4

One could use a single regular expression to validate the string, but I would not recommend it, for the following reasons (when compared to writing high-level language code):

  • it would be more difficult to comprehend;
  • unit testing would be less straightforward;
  • any errors that initially survive unit testing would be harder to spot; and
  • it would be more difficult to maintain should there be changes in the specification.

This does not mean that the code should be devoid of regular expressions. I will write example code in Ruby, which should be easily converted to Perl.


It is convenient to first create a helper method that sums the first two digits in a string.

def sum_of_two_digits(s)
  s[/\d{2}/].each_char.sum(&:to_i)
end

/\d{2}/ is a regular expression that matches two consecutive digits in a string and s[/\d{2}/] returns the string comprised of the first two consecutive digits in the string s.

We can test this with a few strings for which it is needed.

sum_of_two_digits('12')  #=> 3
sum_of_two_digits('X12') #=> 3
sum_of_two_digits('12X') #=> 3
sum_of_two_digits('123') #=> 3

We will need a method that determines whether the scoring for each of the first nine frames is valid.

# The string (`s`) for each of the first nine frames must be one of the following:
# 'X'
# 'dd' (two digits whose sum does not exceed 10)

def one_first9_valid?(s)
  return true if s == 'X'
  s.match?(/\A\d{2}\z/) && sum_of_two_digits(s) <= 10
end

The regular expression reads as follows: "match the beginning of the string (\A) followed by two digits (\d{2}, same as \d\d) followed by the end of the string (\z)". s.each_char.sum(&:to_i) sums each character of the string after it has been converted to an integer.

We can try some examples.

one_first9_valid?('12')   #=> true
one_first9_valid?('00')   #=> true
one_first9_valid?('01')   #=> true
one_first9_valid?('1')    #=> false
one_first9_valid?('67')   #=> false
one_first9_valid?('123')  #=> false
one_first9_valid?('X1')   #=> false
one_first9_valid?('-1')   #=> false
one_first9_valid?('-12')  #=> false

We will also need a method that determines whether the scoring for Frame 10 is valid.

# The string (`s`) for Frame 10 must be one of the following:
# 'XXX'
# 'XXd' ('XX' followed by any digit)
# 'Xdd' ('X' followed by 2 digits whose sum does not exceed 10)
# 'dd'  (2 digits whose sum does not exceed 10)
# 'ddX' ('X' preceded by 2 digits that sum to 10)
# 'ddd' (2 digits that sum to 10 followed by any digit)

def last_valid?(s)
  case s
  when /\A(?:XXX|XX\d)\z/
    true
  when /\AX\d{2}\z/
    sum_of_two_digits(s) <= 10
  when /\A\d{2}\z/
    sum_of_two_digits(s) < 10
  when /\A(?:\d{2}X|\d{3})\z/
    sum_of_two_digits(s) == 10
  else
    false
  end
end

All four regular expressions require that the beginning (the anchor \A) and end (the anchor \z) of the string must be matched. (?:XXX|XX\d) is a non-capture group that requires the literal 'XXX' be matched or (|) the literal 'XX' followed by a digit (\d) be matched. The regular expression /\AX\d{2}\z/ requires that 'X' followed by 2 digits (\d{2}) be matched and, because of the anchors, that the string contain no other characters. The other two regular expressions are similar. Notice that when the string s is comprised of three digits sum_of_two_digits(s) returns the sum of the first two digits.

Here are some test results.

last_valid?('XXX')  #=> true
last_valid?('XX1')  #=> true
last_valid?('X12')  #=> true
last_valid?('12')   #=> true
last_valid?('82X')  #=> true
last_valid?('194')  #=> true
last_valid?('XXXX') #=> false
last_valid?('XX')   #=> false
last_valid?('X1')   #=> false
last_valid?('X123') #=> false
last_valid?('1')    #=> false
last_valid?('1234') #=> false
last_valid?('Y12')  #=> false
last_valid?('X78')  #=> false
last_valid?('76')   #=> false
last_valid?('76X')  #=> false
last_valid?('470')  #=> false

We can now put everything together. I will add two puts statements to identify invalid substrings for the reader.

def valid?(str)
  *first9, last = str.split('-')
  
  if (first9.size != 9)
    puts "Number of delimiters (#{first9.size}) incorrect"
    return false
  end

  first9.each do |s|
    unless (one_first9_valid?(s))
      puts "#{s} in first 9 is invalid"
      return false
    end
  end

  unless last_valid?(last)
    puts "last = #{last} is not valid"
    return false
  end

  true
end

str.split('-') splits the string on hyphens, returning an array. *first9, last = str.split('-') assigns the last element of that array to the variable last and all elements but the last to the array first9.

We may now test various strings.

valid? "X-91-55-72-X-X-X-90-82-XXX"  #=> true
valid? "X-91-55-72-X-X-X-90-82-XX7"  #=> true
valid? "X-91-55-72-X-X-X-90-82-X91"  #=> true
valid? "X-91-55-72-X-X-X-90-82-90"   #=> true
valid? "X-91-55-72-X-X-X-90-82-X91"  #=> true
valid? "X-91-55-72-X-X-X-90-82-917"  #=> true
valid? "X-91-72-X-X-X-90-82-917"     #=> false
  Number of delimiters (8) incorrect
valid? "X-91-55-7-X-X-X-90-82-91X"   #=> false
  7 in first 9 is invalid
valid? "X-91-55-72-X-X-X-X94-82-91X" #=> false
  X94 in first 9 is invalid
valid? "X-91-55-72-X-X-X-90-82-X"    #=> false
  last = X is not valid
valid? "X-91-55-72-X-X-X-90-82-XX"   #=> false
  last = XX is not valid
valid? "X-91-55-72-X-X-X-90-82-XXXX" #=> false
  last = XXXX is not valid 
valid? "X-91-55-72-X-X-X-90-82-X92"  #=> false
  last = X92 is not valid
valid? "X-91-55-72-X-X-X-90-82-91"   #=> false
  last = 91 is not valid
valid? "X-91-55-72-X-X-X-90-82-92X"  #=> false
  last = 92X is not valid
valid? "X-91-55-72-X-X-X-90-82-927"  #=> false
  last = 927 is not valid
Silverstein answered 16/7, 2023 at 5:31 Comment(0)
L
3

Break the score into frames, and then it is easy to check first 9 and the different last one

use warnings;
use strict;
use feature 'say';

my $score = shift // q(X-91-55-72-X-X-X-90-82-91X);

my @fr = split /-/, $score;

die "There must be ten frames (not: @fr)" if @fr != 10; 

for (@fr[0..8]) {
   say "Bad frame: $_" 
       if not /^X|([0-9])([0-9])$/ 
          or  ($1 and $2 and $1 + $2 > 10);
}

if ( $fr[-1] =~ /^(?: XXX | XX[0-9] | X([0-9])([0-9]) | ([0-9])([0-9]) [0-9X] )$/x ) { 
    say "Bad frame: $fr[-1]" 
        if ($1 and $2 and $1+$2 > 10) or ($3 and $4 and $3+$4 != 10);
}
elsif ( $fr[-1] =~ /^([0-9])([0-9])$/ ) { 
    say "Bad frame: $fr[-1]" if $1+$2 >= 10
}
else { say "Bad frame: $fr[-1]" }  # bad format entirely

For the last frame with its many conditions I find it least messy to separate different cases, where numbers add up to either 10 or less-than 10 (when only two). These can be combined but the whole thing is messy enough as it is.


If you don't need to print which exact frame fails can use a (core) library instead of an explicit for loop for the first part (1-9 frames)

use List::Util qw(any);

say "Bad frame(s) in @fr[0..8]" 
    if any { not /^X|([0-9])([0-9])$/ 
             or  ($1 and $2 and $1 + $2 > 10) 
       } @fr[0..8];
Leena answered 16/7, 2023 at 6:19 Comment(2)
yes third throw in tenth frame is only allowed (and in fact is required) if first was X or if sum of first two is 10.Cason
@Cason OK, will rewrite that last one -- thank you for comments!Leena
G
1

Added 8/29/2023: Prepending a pure regex solution constructed using the basics
of the rules.

use strict;
use warnings;

if ( 'X-91-55-72-X-X-X-90-82-73X' =~ /
 ^(?:(?:X|[01][0-9]|2[0-8]|3[0-7]|4[0-6]|5[0-5]|6[0-4]|7[0-3]|8[0-2]|9[0-1])-){9}(?:X(?:X[X0-9]|[01][0-9]|2[0-8]|3[0-7]|4[0-6]|5[0-5]|6[0-4]|7[0-3]|8[0-2]|9[0-1])|[01][0-8]|2[0-7]|3[0-6]|4[0-5]|5[0-4]|6[0-3]|7[0-2]|8[0-1]|9[0]|(?:19|28|37|46|55|64|73|82|91)[X0-9])$
/x )
{  print "Valid Bowling page\n";  }
else
{  print "Bowling Page is in Error !\n";  }

In case that regex is a little too compact, here is the expanded, explained regex.
This is basically the same intent as the others and compatible with most engines.

^ 
# Frames 1-9
(?:
   
  # Strike or
  # Two digits adding up to 10 or less
   (?:
      X
    | [01] [0-9] 
    | 2 [0-8] 
    | 3 [0-7]    
    | 4 [0-6] 
    | 5 [0-5] 
    | 6 [0-4] 
    | 7 [0-3] 
    | 8 [0-2] 
    | 9 [0-1] 
   )
   -
){9}

# Frame 10
(?:
   
# XXX - Three strikes    or
# XX7 - Two strikes and one digit 0-9    or
# X91 - Strike and two digits adding up to 10 or less    or
   X
   (?:
      X [X0-9] 
    | [01] [0-9] 
    | 2 [0-8] 
    | 3 [0-7]    
    | 4 [0-6] 
    | 5 [0-5] 
    | 6 [0-4] 
    | 7 [0-3] 
    | 8 [0-2] 
    | 9 [0-1] 
   )
   
# 90 - Two digits adding up to less than 10    or
 | [01] [0-8] 
 | 2 [0-7] 
 | 3 [0-6]    
 | 4 [0-5] 
 | 5 [0-4] 
 | 6 [0-3] 
 | 7 [0-2] 
 | 8 [0-1] 
 | 9 [0] 
   
# 91X - Two digits adding up to 10 then a strike    or
# 917 - Two digits adding up to 10 follow by a digit of 0-9
   
 | (?:
      19
    | 28
    | 37
    | 46
    | 55
    | 64
    | 73
    | 82
    | 91 
   )
   [X0-9] 
)
$

We can also try Code Constructs for some inline addition.
I believe all the conditions are covered.

8/28/2023: Mod for the obsessive ASCIInarian [\x00-\x7e]

use strict;
use warnings;

if ( 'X-91-55-72-X-X-X-90-82-73X' =~ /
 ^ 
 (?=[\x00-\x7e]*$)
 # Frames 1-9
 (?:
    (?:
       X
     | 
       ( \d )                        # (1)
       ( \d )                        # (2)
       (??{ '(?!)' if ($1 + $2) > 10 })  # Greater than 10 ? Fail
    )
    -
 ){9}
 
 # Frame 10
 (?:
    X
    (?:
       X [X\d] 
     | ( \d )                        # (3)
       ( \d )                        # (4)
       (??{ '(?!)' if ($3 + $4) > 10 })  # Greater than 10 ? Fail
    )
  | 
    ( \d )                        # (5)
    ( \d )                        # (6)
    (?:
       (??{ '[X\d]' if ($5 + $6) == 10 })  # Equal to 10 ? [X\d]
     | (??{ '(?!)' if ($5 + $6) > 9 })     # Greater than 9 ? Fail
    )
 )
 $ 
/x )
{
   print "Valid Bowling page\n";
}
else {
   print "Bowling Page is in Error !\n";
}
 
Galitea answered 16/7, 2023 at 2:46 Comment(12)
Two small easily fixed bugs. You need /a to make \d equivalent to [0-9]. $ should be \z.Vitellus
(??{ '(*FAIL)' if ($3 + $4) > 10 }) requires compilation of the returned string each time this is evaluated. This can be avoided by using (?(?{ $3 + $4 > 10 }) (?*FAIL) ) instead.Vitellus
You have (\d) (?: (\d) ... | (\d) ... ) near the end. That can be simplified to (\d) (\d) (?: ... | ... ). In fact, that whole bit simplifies to (\d) (\d) (?(?{ $5 + $6 > 10 }) (*FAIL) ) (?(?{ $5 + $6 == 10 }) [X\d] ).Vitellus
There no issue with the logic of simplifying a(bc|bd) to ab(c|d).Vitellus
@Vitellus - Re: a(bc|bd) to ab(c|d) there may not be an issue to factor out the \d but as the way I see it, not only does this not enhance performance, its a 1 time check. And if refactor is your goal imgur.com/a/E0uFevr. I avoid Code Conditionals because they are the slowest constructs in regex, and yours packed 2 of them together with no chance to divert so both must be executed. The refactor version was updated but is frivolous. (\d)(\d)(?:(??{ '[X\d]' if ($5 + $6) == 10 })|(??{ '(?!)' if ($5 + $6) > 9 }))Galitea
Re "they are the slowest constructs in regex", Surely (??{ }) is even slower as it also prevents optimizations, requires the same Perl expression evaluation, plus it has the cost of compiling the returned regex pattern every evaluation! Still, I'm not concerned about that. It's the fact that you accept strings that contain Line Feed, "७" and "৯" that's important to fix.Vitellus
@Vitellus - Re: strings that contain Line Feed, "७" and "৯". Why would a single LF at the end be of concern? Baloney; And for the obsessed, I've made the non-numeric adders ७৯ moot with ASCIInarian. Hey where did my comments go ?Galitea
Re "Why would a single LF at the end be of concern?", How can you think that unwanted LFs won't cause problems? In any case, your answer simply doesn't do what the OP asked. /// Re "(?=[\x00-\x7e]*$)", oof! /a would have been sufficient, as previously mentioned. /// Re "Hey where did my comments go ?", Insults are not tolerated on SO.Vitellus
@Vitellus - You want to keep it technical then stop insisting things like unwanted LFs plural are even possible with ^..$ If that were so you'd have to say that every regex that uses $ is somehow invalid. Do you have some insider knowledge on this or is this your opinion. Re: oof! again, if (?=[\x00-\x7e]*$) does the same function, why use /a. Re: your answer simply doesn't do what the OP asked. Another criticism that means what exactly ?Galitea
omg, you're the most distasteful person alive. If you're going to nitpick, at least be right. We were talking about strings with a LF, so "LFs", is correct. And you're lying about asking again.Vitellus
Re "why use /a", simpler, easier to read, and more efficient.Vitellus
Re "Another criticism that means what exactly ?", No, the same criticism. You still don't properly validate. I listed two problems, and you only fixed one.Vitellus
S
1

One could attempt to match a single regular expression that employs a subroutine:

^(?=(?P<LE10>(?:0[0-9]|1[0-9]|2[0-8]|3[0-7]|4[0-6]|5[0-5]|6[0-4]|7[0-3]|8[0-2]|9[0-1])))?(?:(?:X|(?P>LE10))-){9}(?:XXX|XX\d|X?(?P>LE10)|(?:19|28|37|46|55|64|7|3|82|91)[X\d])$

Demo

Following the beginning of string anchor (^) an optional (? at the end) positive lookahead is used to define a named capture group LE10 (abbreviation for "less than or equal to 10"):

 ^           # match the beginning of the string
(?=          # begin positive lookahead
  (?P<LE10>  # begin capture group named LE10
    (?:      # begin non-capture group
      0[0-9] # match 0 followed by any digit
      |      # or
      1[0-9] # match 1 followed by any digit
      |      # or
      2[0-8] # match 2 followed by any digit in the range 0-8
      |3[0-7]|4[0-6]|5[0-5]|6[0-4]|7[0-3]|8[0-2]|9[0-1]
    )        # end non-capture group
  )          # end capture group LE10
)?           # end positive lookahead and make it optional

A positive lookahead has been used to avoid moving the pointer from the beginning of the string and making it optional is needed in case the string begins with an 'X', in which case there would be no match.

The remainder of the expression can then be executed, invoking the subroutine as needed.

(?:           # begin outer non-capture group 
  (?:         # begin inner non-capture group
    X         # match 'X'
    |         # or
    (?P>LE10) # match the code in subroutine LE10
  )           # end inner non-capture group
  -           # match a hyphen
)             # end outer non-capture group
{9}           # execute the outer non-capture group 9 times
(?:           # begin non-capture group
  XXX         # match 'XXX'
  |           # or
  XX\d        # match 'XX' followed by a digit
  |           # or
  X?          # optionally match 'X'
  (?P>LE10)   # match the code in subroutine LE10
  |           # or
  (?:         # begin non-capture group
    19        # match '19'
    |         # or
    28|37|46|55|64|7|3|82|91
  )           # end non-capture group
  [X\d]       # match 'X' or a digit
)             # end non-capture group
$             # match end of string

Note that, whereas (?P=LE10) would return the content of the named group LE10, (?P>LE10) executes the statements used to define that capture group, in effect, replacing (?P>LE10) with those statements.

Aside from greatly reducing the amount of code required, the use of subroutines reduces the chance of cut-and-paste errors and makes the expression easier to read and comprehend.

Silverstein answered 17/7, 2023 at 4:24 Comment(0)
V
1

This is @Grobu's answer modified to use subpatterns instead of a bunch of concatenated strings:

sub is_valid {
   return $_[0] =~ /
      ^ (?: (?&OTHER) - ){9} (?&LAST) \z

      (?(DEFINE)
         # Warning: LE10 is missing a way to denote 0+10.
         (?<LE10> 0[0-9]|1[0-9]|2[0-8]|3[0-7]|4[0-6]|5[0-5]|6[0-4]|7[0-3]|8[0-2]|9[01] )
         (?<EQ10> 19|28|37|46|55|64|73|82|91                                           )
         (?<LT10> 0[0-9]|1[0-8]|2[0-7]|3[0-6]|4[0-5]|5[0-4]|6[0-3]|7[0-2]|8[0-1]|90    )

         (?<OTHER> X | (?&LE10)                                     )
         (?<LAST>  XX[X0-9] | X(?&LE10) | (?&EQ10)[X0-9] | (?&LT10) )
      )
   /x;
}
Vitellus answered 18/7, 2023 at 16:38 Comment(8)
This is a whole lot more legible, awesome! I must admit my Perl skills are dated and rusty, I really have to study this extended patterns syntax.Bearcat
Why inconsistency in ^ and \z? And since you enabled only /x, why use \z at all?Cason
@markalex, Re "Why inconsistency in ^ and \z?", Are you asking why I used \z instead of $? Because I prefer code that works over "consistency". $ allows a trailing LF, whereas \z doesn't. /// Re "And since you enabled only /x, why use \z at a all?" Because I want to make sure there's no extra characters at the end of the string being matched. /x doesn't remove the need for that. /x allows whitespace to be used in the pattern.Vitellus
By mentioning "only using /x i meant that no other flags are used". And I assume that answer to my initial question was something like "perl has a quirk regarding matching strings with trailing slashes", don't see why it triggered you so much.Cason
@markalex, What makes you think something you said triggered me??? And I have no idea about what quirk or strings with backslashes you might be thinking about. As I said, \z matches the end of the string. And I wanted to match the end of the string to make sure there weren't unwanted characters. Nothing about quirks or backslashes in there. That's why I used \z. And /x has no effect on that.Vitellus
Hm, sorry thought about something else while typing. Meant "trailing newlines" of course. And by triggered I meant becoming arroganty and irritated seemingly expecting me to know this specific nuance of perl's regexes.Cason
@maraklex, I know what triggered means. And I don't expect you to know (though it wouldn't have hurt to read up on the things you were asking about). That's why I answered your questions.Vitellus
@maraklex, As for being a quirk, It's not. It's very common for $ usually to mean end of text (as opposed to end of string). See the table in here. javascript is an exception, but it holds for at least icu, java, .net, objective-c, pcre, perl, php, python, swift and ruby.Vitellus

© 2022 - 2024 — McMap. All rights reserved.