The Roman digits come in two classes, the "ones" (I, X, C, M) and the "fives" (V,L, D). The "fives" can't be repeated and cannot be substracted. The "ones" can be repeated up to three times when they don't come after a smaller number and the can be subtracted from a number that isn't greater than the next "one".
When parsing, a digit can be in three different modes: It is either added normally or it is a number to be subtracted or it is a number from which the preceding number was substracted.
You can enforce these rules as you build your number. In addition to the value of a digit, you need a function that classifies the digit. In the code below, the function repeat
does this. It returns the maximum number of repetitions per number, but it also serves as classification: 3 means a "one" and 1 means a "five."
The code below seems to produce the same results as your code with regex validation. It returns a positive number for valid Roman numbers and −1 otherwise. (And it has fewer than 28 conditionals.)
int digit(int c)
{
if (c == 'I') return 1;
if (c == 'V') return 5;
if (c == 'X') return 10;
if (c == 'L') return 50;
if (c == 'C') return 100;
if (c == 'D') return 500;
if (c == 'M') return 1000;
return 0;
}
int repeat(int c)
{
if (c == 'I') return 3;
if (c == 'V') return 1;
if (c == 'X') return 3;
if (c == 'L') return 1;
if (c == 'C') return 3;
if (c == 'D') return 1;
if (c == 'M') return 3;
return 0;
}
int from_roman(const char *s)
{
int res = 0; // running result
int prev = 10000; // value of previous digit
if (s == NULL || *s == '\0') return -1;
while (*s) {
int c = *s++; // Roman digit
int count = 1; // count of consecutive numbers
int value = digit(c); // digit value
int max = repeat(c); // allowed repetitions
if (value == 0) return -1; // illegal Roman digit
while (*s == c) {
s++;
count++;
}
if (*s && digit(*s) > value) {
int next = digit(*s++);
if (max != 3) return -1; // can only subtract I, X, C
if (count > 1) return -1; // can only subtract once
if (next > 10 * value) return -1; // IM,ID, IC, IL etc. invalid
if (value * 10 > prev) return -1; // VIV, VIX etc. invalid
res += next - value;
} else {
if (count > max) return -1; // too many repetitions
if (value >= prev) return -1; // must decrease
res += count * value;
}
prev = value;
}
return res;
}
Edit: The first two drafts of my code had errors, which are now fixed.
Since the validation of correctness is done via a regular expression, another approach is to implement the regex directly, while at the same time calculating the value of the Roman number. Also, given how complicated it is to get the logic to the Roman numbers right, this might be the better approach.
An implementation of this approach could be:
/*
* Returns the length of the digit stretch and advances the pointer
*/
static int stretch(const char **s, int m, int max)
{
int n = 0;
while (n < max && **s == m) {
(*s)++;
n++;
}
return n;
}
/*
* Parses (I II III IV V VI VII VIII IX) for ones,
* tens and hundreds and advances the pointer.
*/
static int parse(const char **s, int x, int v, int i)
{
int res = 0;
if (**s == i && *(*s + 1) == x) {
res += 9;
*s += 2;
} else if (**s == i && *(*s + 1) == v) {
res += 4;
*s += 2;
} else {
res += stretch(s, v, 1) * 5;
res += stretch(s, i, 3);
}
return res;
}
/*
* Parse a Roman numeral according the the regex; -1 means failure
*/
int from_roman_regex(const char *s)
{
int res = 0;
if (s == NULL || *s == '\0') return -1;
res += stretch(&s, 'M', 3) * 1000;
res += parse(&s, 'M', 'D', 'C') * 100;
res += parse(&s, 'C', 'L', 'X') * 10;
res += parse(&s, 'X', 'V', 'I') * 1;
if (*s) return -1;
return res;
}
The stretch
function emulates regexes such as X{0,3}
; the parse
function emulates regexes such as (V?I{0,3}|IX|IV)
, but in addition to singalling matching success or failure, it evaluates it as Roman number.
The first approach tries to implement the rules of Roman numbers. This is a bit complicated, but has the advantage that one cound extend it easily to provide exact error messages if one wanted to. The second approach has the advantage that it matches the question's spec exactly: It does what the regex does.
I've tested all Roman numbers up to 3,999 and all combinations of up to 7 Roman digits. The two approaches above and the OP's approach – simple aritgmetic plus regex validation – yielded the same results for all cases.
int total ... n = 0; for (...) { int value ...; if (n < 3 && (value == 1000 || ...) { total...; n++;} else n = 0; ...}
. A very (rough) but brute force approach? Are there any restrictions on how you want to approach it? – Linguistif (100) .."C".. else if (200) .."CC".. else if (700) .."DCC"
, etc. – Linguist:)
– Linguist^(M{0,3})... I{0,3}|IX|IV)$
? If that is the case, then validation code needs to be a generic regex parser. Otherwise you need to define how the ruleset may change or is encoded. – Wesle