While grouping data into a 2d array, how does my code create indexed keys paired with the keys of the original array?

Asked 26/11, 2012 at 23:58 Answered 15/12, 2023 at 9:56

php arrays multidimensional-array filtering grouping

This actually relates to a previous question I asked where someone kindly offered some code I could work with as a solution to a problem I posed.

Now, I understand most of the code that was posted, until I get to where array_filter() has been used to retrieve duplicate rows.

For reference, this is the previous question I posed; PHP: Searching through a CSV file the OOP way

This is where I'm having difficulty fully understanding what's going on:

public function getRowsWithDuplicates($columnIndex) {
    $values = array();
    for ($i = 0; $i < $this->_dataSet->getRowCount(); ++$i) {
        $values[$this->_dataSet->getValueAt($i, $columnIndex)][] = $i;
    }

    return array_filter($values, function($row) { return count($row) > 1; });
}

Firstly; looking at that code, it looks to me like $values is an associative array. If so, then how does array_filter() behave to return rows with duplicate values?

I don't think I fully understand the process of what's happening with array_filter() to understand this piece of code; I know that array_filter() passes each value from the array into the function provided in its parameters, and then returns a value from that function, but in this particular example I don't think I understand what's happening exactly inside array_filter().

I'd appreciate if someone could explain it to me at a level where I can understand the process step by step, so I can gain a better understanding of what's happening in the above code so I can go beyond just replicating the results.

~~If $values is an associative array, then how is the code taking advantage of the properties of an associative array to return duplicate rows?~~

I guess it's not really the function itself I'm having difficulty understanding, but rather how it's been used in this particular case.

return count($row) > 1;

I don't understand how the comparison operator behaves when returning a value; is it saying only return TRUE when there is more than one row, and then array_filter() is evaluating that TRUE result and returning the value associated with it?

What's being passed into $row in function($row)?

As you can see, I have a lot more questions than I do answers, and I would rather it were explained to me than I spend too long speculating and come to incorrect conclusions.

Is $row just the parameter that array_filter() uses to pass in the array value it's iterating through?

Edit: This is my revised question, which is more specific and to the point of what I am looking for an answer on;

I understand what array_filter() is doing, but I do not understand how it is doing it. The original poster of the code, Jon, wrote; "This code will return an array where the keys are values in your CSV data and the values are arrays with the zero-based indexes of the rows where each value appears."

This is what I do not understand; How is it getting to the point that $values has values that are arrays with zero-based indexes of the rows where each value appears. How is the code finding the indexes of each row where every value appears, and putting that into arrays stored within $values?

So my question relates to this part of the code:

for ($i = 0; $i < $this->_dataSet->getRowCount(); ++$i) {
    $values[$this->_dataSet->getValueAt($i, $columnIndex)][] = $i;

How does the code find all the indexes for matching values and put them in an array whose key is the value that the indexes relate to?

It's like a whole search for duplicate values is going on that I can't see.

Transmute answered 26/11, 2012 at 23:58 Comment(0)

The callback in array_filter should return either true or false. If it returns true, then that means the current value should be kept. If it returns false, then the current value should be discarded.

Each value of the array is passed to the callback function as its first argument ($row in this case), then it's up to the callback to determine whether or not to keep the value.
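
As a rough illustration of that keep-or-discard behaviour, here is a minimal sketch. The $values data is hypothetical sample input shaped like the array in the question (not taken from the asker's CSV), and the foreach loop is only an approximate longhand equivalent of what array_filter() does with this callback.

```php
<?php
// Hypothetical sample data shaped like $values in the question:
// each key is a cell value, each value is a list of row indexes.
$values = [
    'a' => [0, 2, 5],
    'b' => [1, 4],
    'd' => [3],
];

// array_filter() calls the callback once per element and keeps the
// element (under its original key) only when the callback returns true.
$filtered = array_filter($values, function ($row) {
    return count($row) > 1;
});

// Roughly the same thing written as a plain loop.
$manual = [];
foreach ($values as $key => $row) {
    if (count($row) > 1) {
        $manual[$key] = $row;
    }
}

var_export($filtered === $manual); // true
```

Note that 'd' is dropped because its subarray holds only one row index, while 'a' and 'b' keep their original keys.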

Helpmeet answered 27/11, 2012 at 0:1 Comment(5)

This much I understand of the process. It's more to do with how the function is being used in the example above to retrieve rows with duplicate values; I'm not entirely sure how it's doing it in the example I posted. – Transmute 27/11, 2012 at 0:8

Well, since the function is intended to find the rows that have duplicates, then it needs to filter to find the ones that have more than one in the count. – Helpmeet 27/11, 2012 at 1:7

I can see that it's doing that, but I have no idea how it's doing it. It's the process I'm trying to understand. How is it checking for the duplicate values? From looking at the code I'm not seeing the procedural process from array -> to search -> to find duplicate values. How is array_filter searching the array? It can't just be iterating through it like a loop. – Transmute 27/11, 2012 at 1:32

That's exactly what array_filter does. It iterates through your source array, and builds a new array based on what the callback returns. – Helpmeet 27/11, 2012 at 1:46

I understand what array_filter is doing, but I do not understand how it is doing it. The original poster of the code, Jon, wrote; "This code will return an array where the keys are values in your CSV data and the values are arrays with the zero-based indexes of the rows where each value appears." This is what I do not understand; How is it getting to the point that $values has values that are arrays with zero-based indexes of the rows where each value appears. How is the code finding the indexes of each row where every value appears, and putting that into arrays stored within $values? – Transmute 27/11, 2012 at 11:52

Allow me to offer a simpler demonstration by removing all of the spreadsheet methods and declaring a simple 2d array to access.

Input:

$lookup = [
    ['a', 'ape'],
    ['b', 'bee'],
    ['a', 'ant'],
    ['d', 'dog'],
    ['b', 'bat'],
    ['a', 'asp'],
];

function getValueAt($lookup, $rowIndex, $columnIndex) {
    return $lookup[$rowIndex][$columnIndex];
}

$columnIndex = 0;

The for loop assumes that the input data you intend to iterate over is indexed (a gapless structure whose numeric keys start from 0, so a counter can safely be used to access each row).

$values is the newly declared array where the new data will be collected.
The [getValueAt($lookup, $i, $columnIndex)] part dictates that the first level keys of the new array will be determined by the function's return value. This technique will serve to identify and group related data.
The [] that follows is syntactic sugar which "pushes" data as a child of the accessed element. If $values[getValueAt($lookup, $i, $columnIndex)] hasn't been encountered before, the new element is pushed into the first position of a new subarray: $values[getValueAt($lookup, $i, $columnIndex)][0]. The next time that same return value from getValueAt() is received, the generated key will be [1] in the subarray.
The $i which is declared in the for() loop signature is used as the value pushed into the result array. This means that grouped subarrays will not lose track of where the data originally came from.
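
To make the points above concrete, here is a longhand sketch of what the single assignment does on each iteration. The $column array is a hypothetical stand-in for the values that getValueAt() would return for one column, and the explicit isset() check is only there for illustration; PHP creates the subarray automatically when you push with [].

```php
<?php
// Hypothetical column of cell values, keyed by row index.
$column = ['a', 'b', 'a', 'd', 'b', 'a'];

$values = [];
foreach ($column as $i => $key) {
    // First time this value is seen: start an empty group for it.
    if (!isset($values[$key])) {
        $values[$key] = [];
    }
    // Append the row index to that value's group.
    // Together with the check above, this is what
    // $values[$key][] = $i; does in one step.
    $values[$key][] = $i;
}

var_export($values);
```

No searching happens anywhere: rows that share a value simply land in the same subarray because they produce the same key.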

Processing code:

$values = [];
for ($i = 0; $i < count($lookup); ++$i) {
    $values[getValueAt($lookup, $i, $columnIndex)][] = $i;
}

So var_export($values); will output:

array (
  'a' => 
  array (
    0 => 0, // ape
    1 => 2, // ant
    2 => 5, // asp
  ),
  'b' => 
  array (
    0 => 1, // bee
    1 => 4, // bat
  ),
  'd' => 
  array (
    0 => 3, // dog
  ),
)

And var_export(array_filter($values, fn($row) => count($row) > 1)); will output:

array (
  'a' => 
  array (
    0 => 0, // ape
    1 => 2, // ant
    2 => 5, // asp
  ),
  'b' => 
  array (
    0 => 1, // bee
    1 => 4, // bat
  ),
)

Honestly though, if you wanted to group the data, preserve the original index, AND store all relevant data in the subsets, it could have been: $values[getValueAt($lookup, $i, $columnIndex)][$i] = getAllTheData(); because there will never be a data collision on $i.
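
A minimal sketch of that alternative, assuming the same sample $lookup array as above. Since getAllTheData() is only a placeholder name, the whole row is stored in its place here; the $duplicates variable is likewise just an illustrative name.

```php
<?php
$lookup = [
    ['a', 'ape'],
    ['b', 'bee'],
    ['a', 'ant'],
    ['d', 'dog'],
    ['b', 'bat'],
    ['a', 'asp'],
];
$columnIndex = 0;

$values = [];
foreach ($lookup as $i => $row) {
    // The second-level key is the original row index, which is unique,
    // so no row can overwrite another within a group.
    $values[$row[$columnIndex]][$i] = $row;
}

// Keep only the groups that contain more than one row.
$duplicates = array_filter($values, fn($group) => count($group) > 1);

var_export($duplicates);
```

With this shape, each group carries both the original row index (as the key) and the row's data (as the value), so nothing needs to be looked up again afterwards.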

Conquian answered 15/12, 2023 at 9:56 Comment(0)
