How to sort an array of Roman numerals?
Asked Answered
H

10

31

I have an array containing Roman numerals (as strings of course). Like this:

 $a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');

I'd like to sort them according to the numeric values of these numerals, so the results should be something like:

 $sorted_a = array('III', 'V', 'XIII', 'XIX', 'LII', 'MCCXCIV');

So my question is: what is the best way to sort an array of Roman numerals? I know how to use the array sorting functions of PHP, I'm interested in the logic that goes on inside the comparison function.

EDIT: For simplicity, I'm only looking for a way that deals with strings constructed of the basic numerals in a standard way (no CCCC for example):

I, V, X, L, C, D, M

TEST RESULTS

I took the time to extensively test all the code examples that were posted. Two tests were taken, one with a random array of 20 Roman numerals, and a second with an array containing 4000 of those. Same machine, lot of iterations, an average time taken, and all this run several times. Of course this is nothing offical, just my own tests.

TEST WITH 20 NUMERALS:

  1. hakre, bazmegakapa - around 0.0005 s
  2. anemgyenge, Andrea, Dirk McQuickly - around 0.0010 s
  3. Joe Nelson - around 0.0050 s
  4. Rob Hruska - around 0.0100 s

TEST WITH 4000 NUMERALS:

  1. hakre, bazmegakapa - around 0.13 s
  2. anemgyenge - around 1.4 s
  3. Dirk McQuickly, Andrea - around 1.8 s
  4. Rob Hruska - around 2.8 s
  5. Joe Nelson - around 15 s (surprise, checked several more times)

I have a hard time awarding the bounty. hakre and I made the fastest versions, following the same route, but he made a variation of mine, which was previously based on borrible's idea. So I will accept hakre's solution, because that is the quickest and nicer than mine (IMO). But I will award the bounty to anemgyenge, because I love his version and a lot of effort seems to be put into it.

Heterography answered 28/6, 2011 at 13:52 Comment(11)
The easiest way would probably be to have the comparison function translate them to decimal first. There's probably a function out there somewhere to help with that. Edit: You seem to have already acquired it :) - #6266096Jarrodjarrow
You obviously need a function that maps Roman numerals to their corresponding values. Are you asking whether such a function already exists, or do you want to know the mechanics behind sorting an array according to the computed values that derive from running such a function on each element?Pious
@Rob Yes, I already have the logic to translate Roman numerals to their integer values. One way is certainly to use that logic in the comparison function. I am brainstorming though because there might be an easier and quicker way (no need to transform every Roman numeral on every comparison, maybe only a part of them can be enough => not the whole Roman numeral).Heterography
Understood - I guess it might be useful to update your question indicating that you're looking for alternatives to simply converting the number in the comparison function (since it's the more obvious approach).Jarrodjarrow
@Rob Asking the question this way was intentional. On SO I have found no questions about sorting an array of Roman numerals, so I thought we should have one. If there is no optimized solution found, I will accept an answer with the obvious approach.Heterography
@bazmegakapa - Gotcha. FWIW, I like the question, and would also like to see some alternatives to simply converting the value.Jarrodjarrow
It seems like this might be a difficult, if not impossible, problem to solve, given the ambiguous nature of Roman numerals. Having read the last paragraphs here, which notations do you intend to support? You do mention that you're only looking for numerals constructed "in a standard way"; what are you planning on using for the "standard"?Jarrodjarrow
Ok, so this is nothing more than a "how do I sort an array of computed values" question, possibly with a quasi-interesting compute function. This is the kind of thing we use the map/sort/map idiom for in Perl. EG: @snums = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_ => roman($_) ] } @nums;Pious
@Rob Most of the time I see Roman numerals used, the same "digit" can be used max 3 times, so 4 is IV and not IIII. A max of 1 "digit" can be subtracted at one place, so 8 is 'VIII' and not 'IIX'. Other variations can be easily taken care of.Heterography
No, sorry, IIII is just fine: it means 4. Don’t invent this on your own without a lot of research. You also need Unicode awareness: ⓵ to understand things like U+2168 and its lowercase map of U+2178 , both 9; ⓶ because you can’t otherwise have numbers greater than than MMMM for like U+2128 which means 5,000; ⓷ and for U+0304 COMBINING MACRON ABOVE, since "\x{2168}\x{304}" is Ⅸ̄ which is 9,000. I would also count U+0305 COMBINING OVERLINE: "\x{2179}\x{305}" => ⅹ̅ => 10,000, just like U+2182 . ʜ̅ᴛ̅ʜ̅ᴀ̅ʜ̅ᴀ̅ɴ̅ᴅ̅Pious
@Pious I appreciate your approach. I have made a lot of research on Roman numerals as well. I did not try to be an authority on Roman numerals, I just stated that for this question I will accept an answer that satisfies the basic rules I outlined. I don't need a fully working Roman numeral library, I'm interested in the most optimal logic of sorting the basic array (if we have that, anyone can extend it for supporting further rules).Heterography
N
26

Picking your class to convert roman numbers to integers, a user-defined sort callback can handle this to sort the array:

$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');

$bool = usort($a, function($a, $b) {
    return RomanNumber::Roman2Int($a) - RomanNumber::Roman2Int($b);
});    
var_dump($a);

So here you find the logic inside the comparison function: if both values are of the same weight, return 0. If the first is lower than the second, return < 0 (e.g. -1), otherwise the second is larger than the first so return > 0 (e.g. 1).

Naturally any other type of function that returns the decimal value for a roman number would work as well.

Edit:

As you commented, you do not want to run the conversion for each pair. That's fine, with a help of an additional array which contains all converted values, you can run the sort on the decimal values and use that sorting on the roman numbers as well (Demo):

$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
$b = array_map('RomanNumber::Roman2Int', $a);
array_multisort($b, $a);
var_dump($a);

array_multisort PHP Manual does most of the magic here.

Nearby answered 28/6, 2011 at 14:4 Comment(8)
I'd put the ($a == $b) before the conversion calls (maybe duplicate it afterwards as well, if two different strings might have the same value), because if the strings are equal, you can spare 2 conversions.Heterography
My only problem with this solution is that conversion will run too many times... I mean unnecessarily too many times.Heterography
@bazmegakapa: See my edited answer, I've added a code example that makes use of array_multisort to use an array that contains the decimal numbers as the sort order. The conversion will run only once per one Roman Numeral then. Does spare the callback to the user function as well, as to "optimize" for ==.Nearby
Just need to use array_map() at the end to convert back to roman numerals.Livvi
@David Harkness: But then you're converting two times.Nearby
@Nearby - Oh sorry, I completely missed that you were calling multi_sort(). I hadn't needed to do that in PHP yet and didn't know it even existed. Thanks! :)Livvi
@David: Your idea is not bad either, better than with usort at least. With 6 elements and usort there are 11 pairs = 22 conversions. With conversion + re-conversion it's 12. Had not thought about that before your comment. So thanks for your comment ;)Nearby
And by multi_sort() I meant array_multisort().Livvi
T
10
function sortRomanNum($a, $b) {
    if($a == $b) return 0;

    $str = "0IVXLCDM";
    $len = 0;

    if(strlen($a) >= strlen($b)) {
        $len = strlen($a);
        $b .= str_repeat("0", $len - strlen($b));
    }
    else {
        $len = strlen($b);
        $a .= str_repeat("0", $len - strlen($a));
    }

    for($i = 0; $i < $len - 1; $i++) {
        $a1 = $a[$i]; $b1 = $b[$i]; $a2 = $a[$i+1]; $b2 = $b[$i+1];

        if( strpos($str, $a1.$b1.$a2) !== false ) return 1;
        if( strpos($str, $b1.$a1.$b2) !== false ) return -1;

        if($a1 != $b1) return strpos($str, $a1) > strpos($str, $b1) ? 1 : -1;
    }

    if($a[$i] != $b[$i]) return strpos($str, $a[$i]) > strpos($str, $b[$i]) ? 1 : -1;
}

Given two numbers (roman strings), $a and $b. If there are no substractions in the numbers (IV, IX, XC etc), then the solution would be trivial:

for all $i in $a and $b
    if $a[$i] > $b[$i] then return 1; //($a is greater then $b)
    if $a[$i] < $b[$i] then return 1; //($a is lower then $b)
return 0 //equality

Since there can be these special parts, the calculation is more complex. But the solution is to find the patterns:

a: IX | XC | CM
b: V  | L  | D

These are the only patterns which can mess up the trivial solution. If you find any of these, then $a will be greater then $b.

Note, that roman numbers don't include zeros, like the arabic ones. Therefore now we will use them (and basically put zeros where they are missing).

So here comes the function:

if $a == $b then return 0; //equality
create a string for ordering the roman numerals (strpos will give the right index)
define the length of the loop (take the longer string), and add zeros to the end of the shorter number
run the loop, and check:
    1. if the patterns above are found, return the comparision accordingly (1 or -1)
    2. otherwise do the trivial check (compare each numeral)
check the last numerals too.
Trivium answered 15/7, 2011 at 13:2 Comment(2)
Keep in mind that in roman numerals, there is no such definition that 9 must be written as IX. In fact, it is written as well as VIIII. Compare to 4: IV and IIII. - what I like with this answer is, that it's not using the array map for the operation and not using recursion to resolve it.Nearby
I played even more with this answers code. This one has far too less upvotes ;). It's really a smart solution and it's pretty stable. As for the 0 padding in front of $str and at the end of $a and $b I suggest there is no "need" for strict strpos checking and if $a and $b get one+ 0 at their end the end-check can be handled inside the for already. Just for playing.Nearby
R
4

Some people have suggested converting Roman numerals to integers, sorting, and mapping back. There is an easier way. All that we really need to do is compare any two arbitrary Roman numerals and let usort do the rest. Here is the code, and I will explain its design below.

$base = array( 'I' => 0, 'V' => 1, 'X' => 2, 'L' => 3,
               'C' => 4, 'D' => 5, 'M' => 6 ); 
function single($a) { global $base; return $base[$a]; }

function compare($a, $b) {
    global $base;
    if(strlen($a) == 0) { return true; }
    if(strlen($b) == 0) { return false; }
    $maxa = max(array_map('single', str_split($a)));
    $maxb = max(array_map('single', str_split($b)));
    if($maxa != $maxb) {
        return $maxa < $maxb;
    }
    if($base[$a[0]] != $base[$b[0]]) {
        return $base[$a[0]] < $base[$b[0]];
    }
    return compare(substr($a, 1), substr($b, 1));
}

$a = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
usort($a, compare);
print_r($a);

First we create a lookup array to assign a "magnitude" to single digit Roman numerals. Notice this isn't their decimal value, just numbers assigned in such a way that bigger numerals get bigger values. Then we create a helper function single used by some PHP functions to to retrieve the magnitudes.

OK, now to the meat of the algorithm. It is the compare function which sometimes has to call itself recursively when it needs to break a tie. For this reason, we start with some tests to see if it has reached terminal states in the recursion. Disregard that for now and look at the first interesting test. It checks to see if either numeral being compared has a digit in it that dwarfs any digits of the other. For instance, if one of them has X in it, and the other only has I and V, then the one with X wins. This relies on the convention that certain Roman numerals are not valid, like VV or VIIIII or IIIIIIIII. At least I have never seen them written that way, so I count them as invalid.

To make this check, we map the digits to magnitudes and compare maximums. Well, this test may not decide the issue. In that case it is safe to compare the first digits of each number, since we won't have to deal with confusing issues like V < IX where the first digits don't suggest the truth. These confusing situations were taken care of by comparing largest digits.

Finally, if the first digits are equal, strip them off and repeat. At some point one of the numerals will be reduced to an empty string, and those initial tests we were temporarily disregarding will take care of that.

This method has passed all the tests I threw at it, but let me know if you find a bug or optimizations.

Rottenstone answered 20/7, 2011 at 4:27 Comment(2)
+1 from me, I like the idea a lot. As the bounty is expiring, I will do several tests on all of the solutions posted very soon.Heterography
Some reading fun: articles.baltimoresun.com/1998-12-27/news/…Nearby
F
2

There would seem to be three approaches, namely:

  • Convert the numbers, sort using a standard integer sort, and convert back. (Or keep the converted versions with the roman numerals and sort the structures, to avoid the double conversion.)
  • Write a sort function that takes the strings, at that point calls a conversion function and does the appropriate comparison.
  • Write a sort function that can compare Roman numerals directly, without necessary involving a full conversion. Since Roman numerals have their higher components first, (Ms then D/Cs. then L/Xs, then I/Vs) such a function might be able to short circuit early.

The first will obviously involve additional overhead for storage. The second will involve additional conversion overhead (since the same number may be converted many times). The third might involve some unnecessary conversion overhead (again, the same number may be converted several times) but save some work on the short circuiting. If storage overheads are not an issue, the first is likely to be the best.

Footwork answered 28/6, 2011 at 14:6 Comment(1)
+1 Nice overview. If you have implementation ideas as well, you could share them.Heterography
H
2

I got quite interested in @borrible's 1st approach, so I decided I will give it a try:

function sortRomanArray($array) {
     $combined=array_combine($array, array_map('roman2int', $array));
     asort($combined);
     return array_keys($combined);
}

This basically converts all the Roman numerals in the array into integers using array_map() and a function called roman2int() (which can be any implementation). Then it creates an array where the keys are the Roman numerals and values are the integers. Then this array is sorted with asort() that preserves key associations, and the keys are returned as an array. This array will contain the sorted Roman numerals.

I like this method because it runs the conversion function only as much times as the size of the array (6 with my example array) and there is no need to convert back.

The conversion would run certainly much more if we put it in the comparison function (2 times for every comparison).

Heterography answered 28/6, 2011 at 15:39 Comment(2)
I do often forget it as well, but you might want to accept one of those answers instead of adding your own innit?Azotobacter
@Andresch Hm, I did not forget anything. I added this answer because it did not exist yet. I also offered a bounty if anyone could find a better solution.Heterography
I
1

I think you'll have to either:

  1. Wrap the strings into a RomanNumeral class, that has a sorting method OR
  2. Write a method to calculate the value of each element in the array, and sort on that
  3. See if someone has already written a RomanNumeral class/library that does this - something like this

Either way, you'll need custom sorting code that calculates the value somewhere. Since prefixing characters in Roman Numerals can sometimes mean "subtract this value" as opposed to "add this value". This is fine, because as you've pointed out, what you're really doing is sorting by numeric value, so you'll have to tell the computer how to interpret the value.

Ironstone answered 28/6, 2011 at 13:56 Comment(1)
You have to be careful about this, because only certain strings of letters are allowed. For example, DIV and MIX are ok, but MILD, CIVIC, and LIVID are all bogus. Also, if you are limited to ASCII, then only numbers below 4000 can be represented; you need Unicode for all the rest.Pious
J
1
  1. Convert the numeral to a decimal using this
  2. Compare the decimals

    function roman2dec($roman) {
        // see link above
    }
    
    function compare($a, $b) {
        return roman2dec($a) < $roman2dec($b) ? -1 : 1;
    }
    
Jarrodjarrow answered 28/6, 2011 at 13:57 Comment(3)
Do you realize that the question you linked to is from the same user?Reunion
@bazmegakapa - Thanks - I had just written it off-the-cuff. I've been writing too much groovy lately. :)Jarrodjarrow
-1: compare() should return 0 if the two values are equal. Unless you're worried about overflow due to very large positive and negative values in the same array, you can return roman2dec($a) - roman2dec($b); In any case, this converts the values multiple times which is wasteful unless memory is a huge concern.Livvi
E
0

The simplest solution is probably to first convert each numeral into a regular integer (in a new array), and then sort both arrays based on the integer array. Not sure if PHP contains a function for that, though. Alternatively, you can define a comparison function that converts two Roman numerals to integers and compares them. Writing a function that directly compares two Roman numerals without converting them to integers first will likely be cumbersome.

Eliezer answered 28/6, 2011 at 13:56 Comment(0)
L
0

Let's say you make this "alphabet": I, IV, V, IX, X, XL, L, XC, C, CD, D, CM, M. Then you could sort the Roman numbers according to this 'alphabet'.

Maybe this will give someone new inspiration.

EDIT: got a working example. Not really fast, sorts 1000 Roman numbers in 1.3 secs

EDIT 2: added a check to avoid the 'notices', also optimized the code a little, runs a little faster, and about twice as fast than with a conversion to integer and than sort that (used PEAR Number_Roman package)

function sortromans($a, $b){
    $alphabet = array('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I');
    $pos = 0;
    if ($a == $b) {
        return 0;
    }

    //compare the strings, position by position, as long as they are equal
    while(isset($a[$pos]) && isset($b[$pos]) && $a[$pos] === $b[$pos]){
        $pos++;
    }

    //if string is shorter than $pos, return value
    if(!isset($a[$pos])){
        return -1;
    } else if(!isset($b[$pos])){
        return 1;
    } else {

      //check the ´character´ at position $pos, and pass the array index to a variable
      foreach($alphabet as $i=>$ch){
            if(isset($a_index) && isset($b_index)){
         break;
        }
        $length = strlen($ch);
        if(!isset($a_index) && substr($a, $pos, $length) === $ch){
            $a_index = $i;
        }
        if(!isset($b_index) && substr($b, $pos, $length) === $ch){
            $b_index = $i;
        }
      }

    }

    return ($a_index > $b_index) ? -1 : 1;
}

$romans = array('III', 'IX', 'I', 'CM', 'LXII','IV');

usort($romans, "sortromans");

echo "<pre>";
print_r($romans);
echo "</pre>";
Laticialaticiferous answered 14/7, 2011 at 18:12 Comment(2)
Sorry, fixed that. Also made the function a little faster.Laticialaticiferous
+1 to the solution of @hakre. Great solution! Could not comment beneath his solution (too little privileges) so therefore I placed this here.Laticialaticiferous
I
0

I think the best (see my comment) first solution is to use the standard usort PHP function with the help of a special roman compare function.

The following roman_compare function is very intuitive and do not use any kind of conversion. To keep it simple, it uses tail recursion.

function roman_start( $a )
{
    static $romans = array(
        'I'  => 1,    'V'  => 5,
        'X'  => 10,   'L'  => 50,
        'C'  => 100,  'D'  => 500,
        'M'  => 1000,
    );
    return $a[0] . ($romans[$a[0]] < $romans[$a[1]] ? $a[1] : '');
}

function roman_compare( $a, $b )
{
    static $romans = array(
        'I'  => 1,    'IV' => 4,   'V'  => 5,   'IX' => 9,
        'X'  => 10,   'XL' => 40,  'L'  => 50,  'XC' => 90,
        'C'  => 100,  'CD' => 400, 'D'  => 500, 'CM' => 900,
        'M'  => 1000,
    );
    $blockA = roman_start($a);
    $blockB = roman_start($b);
    if ($blockA != $blockB)
    {
        return $romans[$blockA] - $romans[$blockB];    
    }
    $compared = strlen($blockA);
    if (strlen($a) == $compared) //string ended
    {
        return 0;
    }
    return roman_compare(substr($a, $compared), substr($b, $compared));
}

Using the above functions, we can write

function array_equal( $a, $b )
{
    return count(array_diff_assoc($a, $b)) == 0 && count(array_diff_assoc($b, $a)) == 0;
}

$a        = array('XIX', 'LII', 'V', 'MCCXCIV', 'III', 'XIII');
$sorted_a = array('III', 'V', 'XIII', 'XIX', 'LII', 'MCCXCIV');

var_dump(array_equal($sorted_a, $a));
usort($a, 'roman_compare');
var_dump(array_equal($sorted_a, $a));

Running all the above code we get

bool(false)
bool(true)
Intermediary answered 17/7, 2011 at 18:8 Comment(1)
Actually, this does not scale well. With an array of 1000 random roman numbers (3000 to 3999), I get a sorting time of around 6 seconds, while with hakre's solution based on array_multisort, I get a sorting time of around 0.5 seconds.Intermediary

© 2022 - 2024 — McMap. All rights reserved.