Group rows of data, maintain a subarray of ids within the group, and only present the lowest id in each group as the first level key
I need to merge an array of rows into groups and use the lowest id in each group as the first level key. Within each group, all encountered ids (excluding the lowest) should be gathered in a subarray called mergedWith.

Sample input:

[
    1649 => ["firstName" => "jack", "lastName" => "straw"],
    1650 => ["firstName" => "jack", "lastName" => "straw"],
    1651 => ["firstName" => "jack", "lastName" => "straw"],
    1652 => ["firstName" => "jack", "lastName" => "straw"],
]

My desired result:

[
    1649 => [
        "firstName" => "jack",
        "lastName" => "straw",
        "mergedWith" => [1650, 1651, 1652],
    ],
]

I have a loop running that can pull out duplicates and find the lowest id in each group, but I'm not sure of the right way to collapse them into one.

I've shown the desired result of a search that has identified ids with duplicate entries in those particular fields. I don't want to delete the duplicates; I want to add a field at the end of each group that says ["mergedWith" => [1650, 1651, 1652]].

Kokaras answered 20/9, 2018 at 20:27 Comment(7)
Show the code that you have and explain how the result differs from what you want. - Peugia
Does the array always have the same values but with different keys? - Grappling
Create an associative array that uses the firstname/lastname as the keys. Loop through this array. Add the element to the associative array if the key doesn't exist, otherwise update the mergedWith column with this element's index. - Sorcha
This might help you: #308174 - Consumer
You should really update your question to include the information you added in your latest comment. It's a totally different ask. - Extensive
Added to question, removed from comments. - Kokaras
"id" =>'1650' "id" =>'1651' "id" =>'1652' is an impossibility. Please provide a realistic minimal reproducible example. - Sylvanite

One way to do it is to group by first name and last name, and then reverse the grouping to get a single id. krsort the input beforehand to make sure you get the lowest id.

krsort($input);

//group
foreach ($input as $id => $person) {
    // overwrite the id each time, but since the input is sorted by id in descending order,
    // the last one will be the lowest id
    $names[$person['lastName']][$person['firstName']] = $id;
}

// ungroup to get the result
foreach ($names as $lastName => $firstNames) {
    foreach ($firstNames as $firstName => $id) {
        $result[$id] = ['firstName' => $firstName, 'lastName' => $lastName];
    }
}

Edit: not too much different based on your updated question. Just keep all the ids instead of a single one.

krsort($input);

foreach ($input as $id => $person) {
    //                   append instead of overwrite ↓ 
    $names[$person['lastName']][$person['firstName']][] = $id;
}
foreach ($names as $lastName => $firstNames) {
    foreach ($firstNames as $firstName => $ids) {
        // $ids is already in descending order based on the initial krsort
        $id = array_pop($ids);  // removes the last (lowest) id and returns it
        $result[$id] = [
            'firstName' => $firstName,
            'lastName' => $lastName,
            'mergedWith' => array_reverse($ids)  // remaining ids as an array, in ascending order
        ];
    }
}
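
For reference, here is a minimal self-contained sketch of the same group-then-ungroup idea, run against the sample rows from the question in shuffled order. The function name mergeDuplicateNames is hypothetical, and the sketch assumes every row has firstName and lastName keys. It sorts ascending with ksort() instead of descending, so the first id collected per name is the lowest and the remaining ids are already in ascending order.

```php
<?php
// Hypothetical helper (not part of the answer above): groups rows by name
// and keys each group by its lowest id.
function mergeDuplicateNames(array $input): array
{
    ksort($input); // ascending ids, so the first id seen per name is the lowest

    // group: collect every id under lastName -> firstName
    $names = [];
    foreach ($input as $id => $person) {
        $names[$person['lastName']][$person['firstName']][] = $id;
    }

    // ungroup: one row per name, keyed by the lowest id
    $result = [];
    foreach ($names as $lastName => $firstNames) {
        foreach ($firstNames as $firstName => $ids) {
            $lowest = array_shift($ids); // removes and returns the lowest id
            $result[$lowest] = [
                'firstName'  => $firstName,
                'lastName'   => $lastName,
                'mergedWith' => $ids, // the remaining ids, ascending
            ];
        }
    }
    return $result;
}

// Sample rows from the question, deliberately out of order.
$input = [
    1651 => ['firstName' => 'jack', 'lastName' => 'straw'],
    1649 => ['firstName' => 'jack', 'lastName' => 'straw'],
    1652 => ['firstName' => 'jack', 'lastName' => 'straw'],
    1650 => ['firstName' => 'jack', 'lastName' => 'straw'],
];

var_export(mergeDuplicateNames($input));
```

Wrapping the loops in a function keeps $names and $result local, so repeated calls cannot leak groups from a previous run.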
Nabokov answered 20/9, 2018 at 20:43 Comment(5)
Sorry for last minute change, as I started trying some things out I realized I needed notations on the merged accounts, not just to hide or delete them. - Kokaras
@Kokaras It's okay as far as my answer is concerned, with this approach the code isn't much different either way. But be careful about modifying your question too much after asking. You really should avoid invalidating existing answers. - Cuman
Agreed, I have an issue where the tab key submits before I'm done, then I scramble to both try the answers and update all at once... - Kokaras
@Kokaras if you accidentally submit a question before you're ready, you can always quickly delete, make your edits, then undelete. - Cuman
I am working on my stack overflow etiquette, and appreciate the understanding. - Kokaras
$resArr = array(
    1650 => array(
        "firstName" => "jack",
        "lastName" => "straw"
    ),
    1649 => array(
        "firstName" => "jack",
        "lastName" => "straw"
    ),
    1651 => array(
        "firstName" => "jack",
        "lastName" => "straw"
    ),
    1652 => array(
        "firstName" => "jack",
        "lastName" => "straw"
    ),
    1653 => array(
        "firstName" => "jack1",
        "lastName" => "straw"
    ),
    1654 => array(
        "firstName" => "jack1",
        "lastName" => "straw"
    )
);

// sort ascending by id so that array_unique() keeps the lowest id of each group
ksort($resArr);
$tempArr = array_unique($resArr, SORT_REGULAR);
foreach ($tempArr as $key => $value) {
    foreach ($resArr as $key1 => $value2) {
        // collect every other id that shares the same first and last name
        if ($key !== $key1 && $value['firstName'] == $value2['firstName'] && $value['lastName'] == $value2['lastName']) {
            $tempArr[$key]["mergedWith"][] = $key1;
        }
    }
}
print_r($tempArr);

Output
Array
(
    [1649] => Array
        (
            [firstName] => jack
            [lastName] => straw
            [mergedWith] => Array
                (
                    [0] => 1650
                    [1] => 1651
                    [2] => 1652
                )

        )

    [1653] => Array
        (
            [firstName] => jack1
            [lastName] => straw
            [mergedWith] => Array
                (
                    [0] => 1654
                )

        )

)
Hyatt answered 20/9, 2018 at 21:6 Comment(0)

@Don'tPanic's answer is using a preliminary loop to create a lookup array, then nested loops to form the desired result.

I recommend a simpler approach without nested loops. In the first loop, overpopulate the mergedWith element in each group -- this will be quite fast because there are no function calls and no conditions (aside from the null coalescing assignment operator, ??=). Then use a second loop to pull the first element from the mergedWith subarray -- this will apply the lowest id as the first level key and ensure that the first level key no longer exists in the group's subarray.

Code:

ksort($array);
$temp = [];
foreach ($array as $key => $row) {
    $compositeKey = $row['firstName'] . '-' . $row['lastName'];
    $temp[$compositeKey] ??= $row;
    $temp[$compositeKey]['mergedWith'][] = $key;
}

$result = [];
foreach ($temp as $row) {
    $result[array_shift($row['mergedWith'])] = $row;
}
var_export($result);
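
One thing to watch with a delimiter-joined composite key: if a name can itself contain the delimiter, two different people may produce the same key. The sketch below is one possible way around that, not part of the approach above as posted. It assumes PHP 7.4+ (for ??=), uses a hypothetical function name groupWithSafeKey, and builds the key with json_encode() so the boundary between the two fields is unambiguous.

```php
<?php
// Hypothetical variant: same two-loop technique, but with a composite key
// that cannot collide when names contain the delimiter character.
function groupWithSafeKey(array $array): array
{
    ksort($array); // ascending ids, so the first id pushed per group is the lowest

    $temp = [];
    foreach ($array as $key => $row) {
        // '["mary-ann","smith"]' and '["mary","ann-smith"]' stay distinct,
        // whereas 'mary-ann-smith' would be produced for both
        $compositeKey = json_encode([$row['firstName'], $row['lastName']]);
        $temp[$compositeKey] ??= $row;
        $temp[$compositeKey]['mergedWith'][] = $key;
    }

    $result = [];
    foreach ($temp as $row) {
        $lowest = array_shift($row['mergedWith']); // lowest id becomes the key
        $result[$lowest] = $row;
    }
    return $result;
}

// Made-up rows for illustration only.
$rows = [
    1 => ['firstName' => 'mary-ann', 'lastName' => 'smith'],
    2 => ['firstName' => 'mary', 'lastName' => 'ann-smith'],
    3 => ['firstName' => 'mary-ann', 'lastName' => 'smith'],
];

var_export(groupWithSafeKey($rows));
```

With these rows, ids 1 and 3 end up in one group and id 2 stays separate, with an empty mergedWith subarray.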
Sylvanite answered 28/7, 2022 at 8:7 Comment(0)

Assuming your first level keys are always in ascending order like in your sample array (otherwise just call ksort() to apply ascending sorting based on the first level), use a single loop with a reference variable. If the identifying values are encountered a second time, push the key into the reference and remove the current row from the original array.

Code:

foreach ($array as $key => &$row) {
    $compositeKey = $row['firstName'] . '-' . $row['lastName'];
    if (!isset($ref[$compositeKey])) {
        $ref[$compositeKey] = &$row;
    } else {
        $ref[$compositeKey]['mergedWith'][] = $key;
        unset($array[$key]);
    }
}
var_export($array);
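
A general caution with foreach by reference: $row stays bound to the last iterated element after the loop ends, so later code that assigns to $row would silently modify the array. The sketch below is a self-contained run of the single-loop approach on the question's sample rows, with the references released afterwards. It assumes PHP 7 or later, and the explicit ksort(), $ref = [] and unset() lines are additions for illustration.

```php
<?php
// Sample rows from the question, deliberately out of order.
$array = [
    1650 => ['firstName' => 'jack', 'lastName' => 'straw'],
    1649 => ['firstName' => 'jack', 'lastName' => 'straw'],
    1652 => ['firstName' => 'jack', 'lastName' => 'straw'],
    1651 => ['firstName' => 'jack', 'lastName' => 'straw'],
];

ksort($array); // make sure the lowest id of each group is encountered first

$ref = []; // composite key => reference to the first (lowest-id) row of the group
foreach ($array as $key => &$row) {
    $compositeKey = $row['firstName'] . '-' . $row['lastName'];
    if (!isset($ref[$compositeKey])) {
        // first occurrence: remember a reference to this row
        $ref[$compositeKey] = &$row;
    } else {
        // repeat occurrence: record its id on the first row, then drop it
        $ref[$compositeKey]['mergedWith'][] = $key;
        unset($array[$key]);
    }
}
// release the references so later writes to $row or $ref cannot touch $array
unset($row, $ref);

var_export($array);
```

Because the loop edits $array in place, no second result array is built; the trade-off is that the original input is not preserved.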
Sylvanite answered 15/5, 2023 at 21:49 Comment(0)
