Representing identifiers using Regular Expression

The regular definition for recognizing identifiers in C programming language is given by

letter -> a|b|...|z|A|B|...|Z|_
digit -> 0|1|...|9
identifier -> letter(letter|digit)*

This definition will generate identifiers of the form

identifier: [_a-zA-Z][_a-zA-Z0-9]*
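As a rough check of that correspondence, here is a minimal sketch in Python (a language the question itself does not use). It spells out the regular definition as explicit alternations and compares it with the character-class form. The names `from_definition`, `from_classes`, and `both_agree` are made up for illustration.

```python
import re
import string

# Regular definition spelled out symbol by symbol, as in the question.
letter = "|".join(string.ascii_letters + "_")  # a|b|...|z|A|B|...|Z|_
digit = "|".join(string.digits)                # 0|1|...|9
from_definition = re.compile(f"(?:{letter})(?:{letter}|{digit})*")

# Character-class shorthand for the same language.
from_classes = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")

def both_agree(s: str) -> bool:
    """True when the two patterns accept or reject s together."""
    return bool(from_definition.fullmatch(s)) == bool(from_classes.fullmatch(s))

for sample in ["x", "_tmp1", "Count_2", "9lives", "", "a-b"]:
    print(repr(sample), bool(from_classes.fullmatch(sample)), both_agree(sample))
```

Agreement on a handful of samples is only a sanity check, not a proof that the two forms are equivalent.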

My question is: how do you limit the length of the generated identifiers to at most 31 characters? What changes need to be made to the regular definition, or how can a regular expression be written to enforce that limit? Could anyone please help? Thanks.

Oxy answered 19/2, 2013 at 9:19 Comment(1)
Side note: the original regex can be shortened by using a negative lookahead and predefined character classes: (?!\d)\w+ - Pawnbroker

The regular expression you are looking for is:

[_a-zA-Z][_a-zA-Z0-9]{0,30}

It will match an underscore or letter followed by X underscores, letters, or digits, where 0 <= X <= 30.
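A minimal sketch of how this pattern could be exercised, assuming Python's `re` module (the question does not name a language); `MAX_LEN`, `IDENT`, and `is_identifier` are illustrative names only. Note that the bound limits the length of the whole string only when the match is anchored, which `fullmatch` does here; an unanchored search would still find a 31-character match inside a longer string.

```python
import re

MAX_LEN = 31  # length limit taken from the question

# One leading letter/underscore plus up to MAX_LEN - 1 further characters.
IDENT = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]{0,%d}" % (MAX_LEN - 1))

def is_identifier(s: str) -> bool:
    """Accept s only if the entire string is an identifier of at most MAX_LEN characters."""
    return IDENT.fullmatch(s) is not None

print(is_identifier("a" * 31))  # True: exactly at the limit
print(is_identifier("a" * 32))  # False: one character too long
print(is_identifier("1abc"))    # False: starts with a digit
```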

Photocathode answered 19/2, 2013 at 9:35 Comment(3)
I got it the moment the other two users gave their suggestions... thanks anyway. - Oxy
@jerisalan OK. I just posted this since you asked on both answers, "any possible way to change the regular definition to bring about the same change". - Photocathode
Here {0,30} only restricts the length of the [_a-zA-Z0-9] part. The above regex means 1 character from [_a-zA-Z] and at most 30 characters from [_a-zA-Z0-9]. - Adames

Update: Revised the regex so that an identifier cannot start with a digit.

To limit the length, a bounded quantifier in braces {} is typically used.
For example, consider the regex [_a-zA-Z0-9]+, which allows letters, digits, and underscores and requires a length of at least 1. To limit it to at most 31 characters, rewrite the regex as:

[_a-zA-Z0-9]{1,31}

{1,31} means the regex accepts strings of these characters whose length is at least 1 and at most 31.

However, the above regex also allows the identifier to start with a digit, because the single character class covers all three ranges (a-z, A-Z, and 0-9) at every position. To require that the identifier start with a letter or underscore, followed by letters, digits, or underscores, the following regex can be used:

[_a-zA-Z][_a-zA-Z0-9]{0,30}

The first part, [_a-zA-Z], forces the identifier to start with a letter or underscore and also ensures that the identifier is not empty. The remaining part, [_a-zA-Z0-9]{0,30}, ensures that only letters, underscores, and digits follow, and that at most 30 more characters come after the first one, for a total of at most 31.

You can make the corresponding changes to your regex.
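To make the difference between the two patterns concrete, here is a small sketch assuming Python's `re` module; `loose`, `strict`, and `classify` are hypothetical names chosen for this illustration. Both patterns cap the length at 31, but only the second rejects a leading digit.

```python
import re

# Single character class: bounded length, but any position may be a digit.
loose = re.compile(r"[_a-zA-Z0-9]{1,31}")

# Separate first character: letter or underscore, then up to 30 more.
strict = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]{0,30}")

def classify(s: str) -> tuple[bool, bool]:
    """Return (accepted by loose, accepted by strict) for the whole string."""
    return (loose.fullmatch(s) is not None, strict.fullmatch(s) is not None)

for s in ["count", "_x1", "2fast", "a" * 31, "a" * 32]:
    print(len(s), s[:8], classify(s))
```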

Barramunda answered 19/2, 2013 at 9:27 Comment(5)
Thanks... is there any possible way to change the regular definition to bring about the same change? - Oxy
An identifier must not start with a digit and may include _. - Corrianne
I agree. I have made the corresponding changes to the answer. - Barramunda
[a-zA-Z][a-zA-Z0-9]{0-30}: you forgot about _ (underscore); it must be [a-zA-Z_][a-zA-Z0-9_]{0-30}. - Disfranchise
{0-30} should be {0,30}. - Delorisdelorme
