Best way to remove case-insensitive duplicate values from an array [duplicate]

I found a few solutions, but I can't decide which one to use. What is the most compact and effective way to apply PHP's array_unique() function case-insensitively to an array?

Example:

$input = array('green', 'Green', 'blue', 'yellow', 'blue');
$result = array_unique($input);
print_r($result);

Result:

Array ( [0] => green [1] => Green [2] => blue [3] => yellow )

How do we remove the duplicate green? As for which one to remove, assume that the duplicate with uppercase characters is the correct one.

For example, keep PHP and remove php,

or keep PHP and remove Php, since PHP has more uppercase characters.

So the result should be:

Array ( [0] => Green [1] => blue [2] => yellow )

Notice that the Green with uppercase has been preserved.

Colossian asked 5/6, 2011 at 1:53

Would this work?

$r = array_intersect_key($input, array_unique(array_map('strtolower', $input)));

It doesn't control which case variant is kept, but it does the job. To keep the capitalized values instead, you can call asort($input); before the intersect (demo at IDEOne.com).

Clinquant answered 5/6, 2011 at 2:03
Comments:
It does indeed work; very clean solution. I would be tempted to trim() the values as well, but that's up to the OP's definition of a duplicate. - Gigigigli
It doesn't keep the strings with the most uppercase characters. - Mudlark
@piotrm: I mentioned that in my answer... - Clinquant
That was a reply to Wesley Murch's comment: you can't say it does the job. It may be a very clean solution, but to some other problem. - Mudlark
@piotrm: The OP never asked for a solution that keeps the strings with the most uppercase letters; that's just your assumption. - Clinquant
I actually think @Alix's answer is better than mine. It preserves the original input array's index positions (although mine would too if you replaced the sort() with asort()), and it is much clearer to read. Functionally, though, I think both are equivalent once you make the sort->asort change. - Kunstlied
Thanks Alix, your code is indeed the most compact, and it does the job. - Colossian
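The one-liner from the accepted answer, combined with its asort() tip, can be wrapped in a small function. This is only a sketch: the name array_iunique() is hypothetical, SORT_STRING is passed so the sort is a plain byte comparison (uppercase ASCII letters sort before lowercase ones), and the final ksort() is an added choice that puts the surviving values back in their original input order.

```php
<?php
// array_iunique() is a hypothetical name, not a built-in PHP function.
function array_iunique(array $input): array
{
    // Byte-order sort that keeps keys: 'Green' sorts before 'green', 'PHP' before 'Php'.
    asort($input, SORT_STRING);
    // Lowercased copy with the same keys; array_unique() keeps the first
    // key it meets in each case-insensitive group.
    $kept = array_unique(array_map('strtolower', $input));
    // Keep the original (not lowercased) values whose keys survived.
    $result = array_intersect_key($input, $kept);
    // Assumption: the caller wants the original input order back.
    ksort($result);
    return $result;
}

print_r(array_iunique(array('green', 'Green', 'blue', 'yellow', 'blue')));
```

For the question's input this keeps Green, blue and yellow. If leading or trailing spaces should also be ignored, as one comment suggests, the 'strtolower' callback could be replaced by a closure that returns strtolower(trim($s)).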

If you can use PHP 5.3.0 or later (which added anonymous functions), here's a function that does what you're looking for:

<?php
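function array_unique_case($array) {
    // Uppercase letters sort before lowercase ones, so 'Green' precedes 'green'
    sort($array);
    $tmp = array(); // lowercase forms of the values kept so far
    $callback = function ($a) use (&$tmp) {
        // Drop the value if its lowercase form has already been kept
        if (in_array(strtolower($a), $tmp))
            return false;
        $tmp[] = strtolower($a);
        return true;
    };
    return array_filter($array, $callback);
}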

$input = array(
    'green', 'Green', 
    'php', 'Php', 'PHP', 
    'blue', 'yellow', 'blue'
);
print_r(array_unique_case($input));
?>

Output:

Array
(
    [0] => Green
    [1] => PHP
    [3] => blue
    [7] => yellow
)
Kunstlied answered 5/6, 2011 at 3:06
Comments:
Nice, but it will fail on 'Green', 'gREEN': it returns 'Green', but 'gREEN' has more uppercase characters. - Mudlark
@piotrm, yes, it fails to return the word with the most uppercase characters if the first character isn't uppercase. Incidentally, I think that's good. Acronyms are all uppercase, which is why I want to return the uppercase duplicate if it exists. Otherwise, if the first character is lowercase and others are uppercase, it's usually considered incorrect. - Colossian
@Colossian: yeah, I kind of assumed the rule was not necessarily to choose the word with the most uppercase characters, but to weight words that start with an uppercase letter more heavily than those that don't. - Kunstlied
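The rule described in the last two comments (prefer a variant that starts with an uppercase letter, then the one with the most uppercase letters) can be expressed as a score, with the sort-then-array_unique() idea from the answers above doing the rest. A sketch under stated assumptions: case_score() and array_unique_preferred() are hypothetical names, only ASCII A-Z counts as uppercase, and exact ties keep whichever variant the sort leaves first.

```php
<?php
// Hypothetical scoring helper: [starts with A-Z ? 1 : 0, number of A-Z letters].
// Only ASCII uppercase letters are counted.
function case_score(string $s): array
{
    return array(preg_match('/^[A-Z]/', $s), preg_match_all('/[A-Z]/', $s));
}

// Hypothetical helper: keep one value per case-insensitive group, preferring
// the highest case_score(); original keys are preserved.
function array_unique_preferred(array $input): array
{
    // Highest score first; PHP compares equal-length arrays element by element.
    uasort($input, function ($a, $b) {
        return case_score($b) <=> case_score($a);
    });
    // array_unique() keeps the first key it sees for each lowercased value.
    $kept = array_unique(array_map('strtolower', $input));
    $result = array_intersect_key($input, $kept);
    ksort($result); // back to the original input order
    return $result;
}

print_r(array_unique_preferred(array('green', 'Green', 'gREEN', 'php', 'Php', 'PHP', 'blue')));
```

With this scoring, 'Green' beats 'gREEN' (it starts with an uppercase letter) and 'PHP' beats 'Php' (more uppercase letters), which matches the behaviour the asker described.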
function count_uc($str) {
  preg_match_all('/[A-Z]/', $str, $matches);
  return count($matches[0]);
}

$input = array(
    'green', 'Green', 'yelLOW', 
    'php', 'Php', 'PHP', 'gREEN', 
    'blue', 'yellow', 'bLue', 'GREen'
);

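// Drop exact duplicates so array_flip() keeps every value and all arrays
// passed to array_multisort() have the same size
$input = array_unique($input);
$keys = array_flip($input); // value => original index
// Sort by lowercase value, then by uppercase count (ascending); $keys follows
array_multisort(array_map("strtolower", $input), array_map("count_uc", $input), $keys);
// Lowercasing the keys merges each group; the later entry (most uppercase) wins
$keys = array_flip(array_change_key_case($keys));
$output = array_intersect_key($input, $keys);
print_r($output);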

This returns:

Array
(
    [2] => yelLOW
    [5] => PHP
    [6] => gREEN
    [9] => bLue
)
Mudlark answered 5/6, 2011 at 3:52
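The same "most uppercase letters wins" rule can also be applied in a single pass, remembering the best key seen so far for each lowercased value instead of combining array_flip() with array_multisort(). A sketch under stated assumptions: array_unique_most_upper() is a hypothetical name, only ASCII A-Z is counted, and on a tie the earlier value is kept.

```php
<?php
// array_unique_most_upper() is a hypothetical name for this sketch.
function array_unique_most_upper(array $input): array
{
    $best = array(); // lowercased value => key of the best variant so far
    foreach ($input as $key => $value) {
        $lower = strtolower($value);
        // Replace the stored variant only if this one has strictly more A-Z
        // letters, so on a tie the earlier value is kept.
        if (!isset($best[$lower])
            || preg_match_all('/[A-Z]/', $value) > preg_match_all('/[A-Z]/', $input[$best[$lower]])) {
            $best[$lower] = $key;
        }
    }
    // Keep the winning values under their original keys, in input order.
    return array_intersect_key($input, array_flip($best));
}

print_r(array_unique_most_upper(array(
    'green', 'Green', 'yelLOW',
    'php', 'Php', 'PHP', 'gREEN',
    'blue', 'yellow', 'bLue', 'GREen'
)));
```

For the input used in the answer above, this keeps the same four values under the same keys: yelLOW, PHP, gREEN and bLue.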

You should first make all values lowercase, then run array_unique(), and you're done.

Nock answered 5/6, 2011 at 1:56
Comments:
I edited my question for clarification. - Colossian

Normalize your data first by sending it through strtoupper() or strtolower() to make the case consistent. Then use your array_unique().

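// array_map() takes the callback first, then the array
$normalized = array_map('strtolower', $input);
$result = array_unique($normalized);
// ucwords() uppercases the first letter of each word
$result = array_map('ucwords', $result);
print_r($result);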
Karbala answered 5/6, 2011 at 1:56
Comments:
Wouldn't that make the result all lowercase or all uppercase? I want to preserve the original value with an uppercase character. - Colossian
It would, but I added an example that takes care of that: calling ucwords() uppercases the first letter. - Karbala
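With the array_map() arguments in the right order, the normalize-then-ucwords() approach can be checked against input that also contains an all-caps value. Since ucwords() only uppercases the first letter of each word, the casing in the result is rebuilt rather than taken from the original values: PHP comes back as Php, and blue, which was never capitalized, comes back as Blue.

```php
<?php
$input = array('green', 'Green', 'blue', 'yellow', 'blue', 'PHP', 'php');

// Lowercase, deduplicate, then uppercase the first letter of each word.
// Keys from the original array are preserved at every step.
$result = array_map('ucwords', array_unique(array_map('strtolower', $input)));
print_r($result);
```

This prints Green, Blue, Yellow and Php under keys 0, 2, 3 and 5.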
