Filter array to retain rows with smallest element count and unique first and last elements

Asked 21/2, 2012 at 20:15 Answered 22/1, 2022 at 6:42

Solved php arrays comparison filtering grouping

I want to remove rows from my array so that my result is an array that contains rows with unique first and last elements. If two (or more) rows have the same first and last value, I want to preserve the row with the lowest element count.

Say I have the following array:

$var = [
    [1, 2, 3],
    [1, 3],
    [1, 2, 4, 3],
    [1, 3, 4]
];

What I want is to remove all arrays from $var that have the first and last element the same as another array from $var but have more elements.

Because the first three rows all start with 1 and end with 3, only the second row containing [1, 3] should be kept.

The fourth row ([1, 3, 4]) uniquely starts with 1 and ends with 4, so it should also be kept.

The output should be:

[
    [1, 3],
    [1, 3, 4]
]

I am looking for the most efficient way of doing this, in terms of both memory and time. $var may contain up to 100 arrays, and each individual array may contain up to 10 elements. I thought of comparing every pair of rows with nested loops (for(i=0;...) for(j=i+1;...) complexCompareFunction();), but I believe this isn't very efficient.

Atahualpa answered 21/2, 2012 at 20:15 Comment(6)

Do you have a use case for what you're trying to do? There may be a better way to implement it... – Weissmann 21/2, 2012 at 20:23

I am generating all combinations of public transportation lines which link two user-picked locations. From the array of generated line combinations ($var in this case), I would like to delete those that use an extra line. So, if one can reach the second point with lines (1, 3), why should (1, 2, 3) be displayed also? Here, line 2 is extra, one can reach the destination without it. – Atahualpa 21/2, 2012 at 20:27

Maybe traveling 1,2,3 will take less time (because you go by train) than 1,3 (because you can only go by bus directly). We could restate the problem: build a graph with stations as nodes and connections as edges. For every two nodes n, m you then search for the shortest path that starts at n and ends at m. There are tons of references out there on this topic. – Ablate 21/2, 2012 at 20:41

Well, if the only way to reach the destination is by line 3, why should you bother taking line 2 since you anyway have to wait for line 3? You only change lines and pay for an extra ticket to reach the same destination. Or am I missing something? This task is just a small part of the whole algorithm, which cuts out any generated route that is similar to another one or has some extra lines. I didn't use shortest paths because I will "measure" my paths by travel times that are dependent on when a user queries my app. So a path shorter in distance than another one isn't necessarily better. – Atahualpa 21/2, 2012 at 20:50

Using Dijkstra, Best First Path, A*, or similar algorithms. – Arrester 21/2, 2012 at 20:52

I misunderstood you then. The numbers are actually your lines (edges). I thought of them as stations. – Ablate 21/2, 2012 at 20:54

In general, yes, you are too worried about efficiency (as you wondered in another comment). Though PHP is not the most blisteringly fast language, I would suggest building the most straightforward solution first, and only optimizing or streamlining it if there is a noticeable issue with the end result.

Here is what I would do, off the top of my head. It is based off of ajreal's answer but hopefully will be easier to follow, and catch some edge cases which that answer missed:

// Assume $var is the array specified in your question

function removeRedundantRoutes( $var ){

    // This line sorts $var by the length of each route
    usort( $var, function( $x, $y ){ return count( $x ) - count( $y ); } );

    // Create an empty array to store the result in
    $results = array();

    // Check each member of $var
    foreach( $var as $route ){
        $first = $route[0];
        $last = $route[ count( $route ) - 1 ];
        if( !array_key_exists( "$first-$last", $results ) ){
            // If we have not seen a route with this pair of endpoints already,
            // it must be the shortest such route, so place it in the results array
            $results[ "$first-$last" ] = $route;
        }
    }

    // Strictly speaking this call to array_values is unnecessary, but
    // it would eliminate the unusual indexes from the result array
    return array_values( $results );
}
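
A rough way to check the efficiency concern at the sizes mentioned in the question (up to 100 rows of up to 10 elements) is to time the sort-then-group approach on generated data. The sketch below is only an illustration under assumptions: the helper name shortestPerEndpointPair, the random line numbers 1 to 20, and the fixed seed are all made up for the test, and hrtime() needs PHP 7.3 or later. It repeats the same logic as the function above so that it runs on its own.

```php
// Hypothetical helper: same sort-then-group idea, repeated here so the sketch is self-contained.
function shortestPerEndpointPair(array $routes): array
{
    // Shortest routes first, so the first route seen per key is the one to keep
    usort($routes, function ($x, $y) { return count($x) <=> count($y); });

    $results = [];
    foreach ($routes as $route) {
        $key = $route[0] . '-' . $route[count($route) - 1];
        if (!isset($results[$key])) {
            $results[$key] = $route;
        }
    }
    return array_values($results);
}

// Assumed test data: 100 routes of 2 to 10 lines each, line numbers 1 to 20.
mt_srand(42);
$routes = [];
for ($i = 0; $i < 100; $i++) {
    $length = mt_rand(2, 10);
    $route = [];
    for ($j = 0; $j < $length; $j++) {
        $route[] = mt_rand(1, 20);
    }
    $routes[] = $route;
}

$start = hrtime(true);                        // monotonic clock, in nanoseconds
$filtered = shortestPerEndpointPair($routes);
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("%d of %d routes kept in %.3f ms\n", count($filtered), count($routes), $elapsedMs);
```

If the reported time is negligible next to the rest of the request, the simpler code is probably the better trade-off.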

Scarabaeus answered 21/2, 2012 at 21:22 Comment(1)

With a few tweaks, I got this working. Thank you for your help and for the details in your answer! – Atahualpa 24/2, 2012 at 18:29

Use current() and end() to read the first and last element of each row:

$all = array();
foreach ($var as $idx=>$arr):
  $first = current($arr);
  $last  = end($arr);
  $size  = count($arr);
  $key   = $first.'.'.$last;
  if (isset($all[$key])):
    if ($size > $all[$key]):
      unset($var[$idx]);
    else:
      $all[$key] = $size;
    endif;
  else:
    $all[$key] = $size;
  endif;
endforeach;

Oops ... a row stored before a shorter match turned up is not removed by this loop, so you can iterate again at the end to remove any such leftover rows from the already reduced array.
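
A possible single-pass variant, sketched here as an assumption rather than as the answerer's own code: remember the index of the shortest row seen so far for each first/last pair, and unset the previously kept row when a shorter one turns up. The $best array and its [size, index] layout are illustrative names that do not appear in the original.

```php
$var = [
    [1, 2, 3],
    [1, 3],
    [1, 2, 4, 3],
    [1, 3, 4],
];

// $best maps "first.last" => [size, index] of the shortest row seen so far
$best = [];
foreach ($var as $idx => $arr) {
    $key  = reset($arr) . '.' . end($arr);
    $size = count($arr);
    if (!isset($best[$key])) {
        // First row with this pair of endpoints
        $best[$key] = [$size, $idx];
    } elseif ($size < $best[$key][0]) {
        // Shorter than the row kept so far: drop the old one, remember this one
        unset($var[$best[$key][1]]);
        $best[$key] = [$size, $idx];
    } else {
        // Same length or longer: drop the current row
        unset($var[$idx]);
    }
}

// Reindex so the surviving rows are numbered from 0
$var = array_values($var);
var_export($var);
```

With the sample input this leaves [1, 3] and [1, 3, 4], and it does not depend on the rows being sorted by length first. When two rows with the same endpoints have the same length, the earlier one is kept.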

Stefanistefania answered 21/2, 2012 at 20:26 Comment(4)

In this example, (1, 2, 3) still exists in the array. – Chemarin 21/2, 2012 at 20:29

The code works great, but only if arrays inside $var are sorted by number of elements. I believe that it wouldn't be very efficient to first sort $var by arrays' number of elements, and then execute the code you posted. Or am I too worried about efficiency? – Atahualpa 21/2, 2012 at 20:43

Store $idx for smallest size in $all or another array and add unset for this $idx inside innermost else. – Headlock 21/2, 2012 at 20:49

I've managed to get this working as I expected, thank you for your answer! Unfortunately, it's very hard to decide which answer to mark as best, especially under these circumstances. My choice was based on the level of detail in the answer, not on the answer's quality. – Atahualpa 24/2, 2012 at 18:32

Here is how I would group by a temporary key (formed by creating a delimited string from the first and last value in a given row) and conditionally push qualifying data into a result array. When the loop finishes, extract the second column from the result array to produce an indexed array containing only the smallest of qualifying rows. No pre-sorting required.

Code:

$result = [];
foreach ($array as $row) {
    $cache = [count($row), $row];
    array_splice($row, 1, -1);
    $key = implode('-', $row);
    if (!isset($result[$key]) || $cache[0] < $result[$key][0]) {
        $result[$key] = $cache;
    }
}
var_export(array_column($result, 1));
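
As a small illustration of how the key is built in the snippet above: array_splice() with offset 1 and length -1 removes everything between the first and last element, in place, which is why the count and the original row are cached before the call. The variable names $long and $short are only for this example.

```php
$long  = [1, 2, 4, 3];
$short = [1, 3];

// Offset 1, length -1: remove everything after the first element
// and before the last one. The array is modified in place.
array_splice($long, 1, -1);
array_splice($short, 1, -1);   // nothing sits between first and last, so nothing is removed

var_export(implode('-', $long));    // '1-3'
echo "\n";
var_export(implode('-', $short));   // '1-3'
```

Both rows therefore land in the same group, and only the one with the lower cached count survives.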

Alternative Code:

$result = [];
foreach ($array as $row) {
    $count = count($row);
    $key = $row[0] . '-' . $row[array_key_last($row)];  // PHP 7.3+; end($row) also works
    if (!isset($result[$key]) || $count < $result[$key][0]) {
        $result[$key] = [$count, $row];
    }
}
var_export(array_column($result, 1));

Output:

array (
  0 => 
  array (
    0 => 1,
    1 => 3,
  ),
  1 => 
  array (
    0 => 1,
    1 => 3,
    2 => 4,
  ),
)

Cuffs answered 22/1, 2022 at 6:42 Comment(0)
