I have an array containing Roman numerals (as strings, of course), like this:
$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
I'd like to sort them according to the numeric values of these numerals, so the results should be something like:
$sorted_a = array('III', 'V', 'XIII', 'XIX', 'LII', 'MCCXCIV');
So my question is: what is the best way to sort an array of Roman numerals? I know how to use the array sorting functions of PHP; what I'm interested in is the logic that goes inside the comparison function.
EDIT: For simplicity, I'm only looking for a way that deals with strings constructed of the basic numerals (I, V, X, L, C, D, M) in a standard way (no CCCC, for example).
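To make the question concrete, here is a minimal sketch of the kind of comparison function I mean. The names roman_to_int and compare_roman are hypothetical, and the code assumes PHP 7 or later (for type declarations and the <=> operator) and uppercase, standard-form input using only the seven basic numerals. It does no validation.

```php
<?php
// Hypothetical helper: converts a standard-form Roman numeral to an integer.
// Assumes only I, V, X, L, C, D, M in uppercase; invalid input is not detected.
function roman_to_int(string $roman): int
{
    static $values = ['I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
                      'C' => 100, 'D' => 500, 'M' => 1000];
    $total = 0;
    $length = strlen($roman);
    for ($i = 0; $i < $length; $i++) {
        $current = $values[$roman[$i]];
        $next = ($i + 1 < $length) ? $values[$roman[$i + 1]] : 0;
        // A smaller symbol placed before a larger one is subtracted (IV = 4, XC = 90).
        $total += ($current < $next) ? -$current : $current;
    }
    return $total;
}

// Hypothetical comparison callback for usort(): compares by numeric value.
function compare_roman(string $a, string $b): int
{
    return roman_to_int($a) <=> roman_to_int($b);
}

$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
usort($a, 'compare_roman');
// $a is now array('III', 'V', 'XIII', 'XIX', 'LII', 'MCCXCIV')
```

Note that this converts both operands on every comparison, so each numeral is converted many times over the course of a sort.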
TEST RESULTS
I took the time to extensively test all the code examples that were posted. I ran two tests: one with a random array of 20 Roman numerals, and a second with an array containing 4000 of them. Each test used the same machine and many iterations, with the average time taken, and the whole thing was repeated several times. Of course this is nothing official, just my own tests.
TEST WITH 20 NUMERALS:
- hakre, bazmegakapa - around 0.0005 s
- anemgyenge, Andrea, Dirk McQuickly - around 0.0010 s
- Joe Nelson - around 0.0050 s
- Rob Hruska - around 0.0100 s
TEST WITH 4000 NUMERALS:
- hakre, bazmegakapa - around 0.13 s
- anemgyenge - around 1.4 s
- Dirk McQuickly, Andrea - around 1.8 s
- Rob Hruska - around 2.8 s
- Joe Nelson - around 15 s (surprise, checked several more times)
I have a hard time awarding the bounty. hakre and I made the fastest versions, following the same route, but he made a variation of mine, which was previously based on borrible's idea. So I will accept hakre's solution, because that is the quickest and nicer than mine (IMO). But I will award the bounty to anemgyenge, because I love his version and a lot of effort seems to be put into it.
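A comparison callback is invoked on the order of n log n times, so any conversion done inside it is repeated for the same numeral many times. The sketch below converts each numeral exactly once and then sorts the plain integers. It is only an illustration of that general idea, not a reproduction of any of the posted answers; roman_value and sort_roman_once are hypothetical names, and the same assumptions apply (PHP 7+, uppercase standard-form input, no validation).

```php
<?php
// Hypothetical helper: reads the numeral right to left, subtracting any
// symbol that is smaller than the symbol to its right.
function roman_value(string $roman): int
{
    static $symbols = ['I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
                       'C' => 100, 'D' => 500, 'M' => 1000];
    $total = 0;
    $previous = 0;
    for ($i = strlen($roman) - 1; $i >= 0; $i--) {
        $current = $symbols[$roman[$i]];
        $total += ($current < $previous) ? -$current : $current;
        $previous = $current;
    }
    return $total;
}

// Hypothetical wrapper: convert once, sort the integers, map back to strings.
function sort_roman_once(array $numerals): array
{
    // One conversion per numeral; array_map keeps the original keys.
    $values = array_map('roman_value', $numerals);
    // asort() sorts by value and preserves the key => value association.
    asort($values);
    // Rebuild the list of numerals in the order of their sorted values.
    $sorted = [];
    foreach (array_keys($values) as $key) {
        $sorted[] = $numerals[$key];
    }
    return $sorted;
}

$sorted_a = sort_roman_once(array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII'));
// $sorted_a is array('III', 'V', 'XIII', 'XIX', 'LII', 'MCCXCIV')
```

The trade-off is one extra array of integers in memory in exchange for n conversions instead of roughly 2 · n log n.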
This is what the map/sort/map idiom is for in Perl. E.g.: @snums = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_ => roman($_) ] } @nums; – Pious
4 is IV and not IIII. A max of 1 "digit" can be subtracted at one place, so 8 is 'VIII' and not 'IIX'. Other variations can be easily taken care of. – Heterography
IIII is just fine: it means 4. Don’t invent this on your own without a lot of research. You also need Unicode awareness: ⓵ to understand things like U+2168 Ⅸ and its lowercase map of U+2178 ⅸ, both 9; ⓶ because you can’t otherwise have numbers greater than MMMM, as with U+2181 ↁ, which means 5,000; ⓷ and for U+0304 COMBINING MACRON, since "\x{2168}\x{304}" is Ⅸ̄, which is 9,000. I would also count U+0305 COMBINING OVERLINE: "\x{2179}\x{305}" => ⅹ̅ => 10,000, just like U+2182 ↂ. ʜ̅ᴛ̅ʜ̅ᴀ̅ʜ̅ᴀ̅ɴ̅ᴅ̅ – Pious