MySQL Recursive query to find the shortest path
Asked Answered
R

1

6

I have an issue I just can't get my head around...

I have a table called country_neighbour looking like this.

Country_name Country_id Neighbour_name Neighbour_id
Italy 1 France 2
Italy 1 Switzerland 6
Italy 1 Austria 5
France 2 Spain 3
France 2 Italy 1
France 2 Switzerland 6
Spain 3 France 2
Spain 3 Portugal 4
Portugal 4 Spain 3

What i want to get is the shortest path to get from one country to another lets say that I want to know how many borders is needed to cross to get to Portugal From Italy.

(Italy -> France) 1

(Italy -> France -> Spain -> Portugal) 3

I have been looking for ideas and find WITH cte a good approach to my problem but its not supported by MySQL

Is there anyone who can point me in the right direction. I appreciate all help I can get. Thanks.

Rollicking answered 11/7, 2015 at 18:40 Comment(3)
This is called a hierarchical query in the jargon of the trade. MySQL doesn't support hierarchical queries directly, but there are plenty of resources on the internet describing workarounds.Habitual
@olliejones it's not called recursion?Sissy
Yeah, it's called recursion too. But this OQ needed some hints on where else to search, eh?Habitual
E
0

This problem can be solved by first detecting the longest path, such that no same node occurs in the same path. After you get this numeric value, you can apply a sequence of several SELF LEFT JOIN operations by imposing two conditions:

  • country-neighbour match: the neighbour name of the table t1 is equal to the country name of the table t2
  • already seen node: the neighbour name of table t2 shall not be one of the previous country

Calculating the longest path will make you know how many LEFT JOIN you need to carry out to accomplish to this operation. Following there's a snippet given that the longest path is 4.

SELECT 
    r1.country_name         AS r1_start,
    r1.neighbour_name       AS r1_neigh,
    r2.neighbour_name       AS r2_neigh,
    r3.neighbour_name       AS r3_neigh,
    r4.neighbour_name       AS r4_neigh
FROM      routes r1
LEFT JOIN routes r2 
       ON     r2.country_name = r1.neighbour_name 
      AND     r2.neighbour_name NOT IN 
                (r1.country_name, r1.neighbour_name)
LEFT JOIN routes r3 
       ON     r3.country_name = r2.neighbour_name 
      AND     r3.neighbour_name NOT IN 
                (r1.country_name, r1.neighbour_name, r2.neighbour_name)
LEFT JOIN routes r4 
       ON     r4.country_name = r3.neighbour_name 
      AND     r4.neighbour_name NOT IN 
                (r1.country_name, r1.neighbour_name, r2.neighbour_name, r3.neighbour_name)

Once I run the query, I would store the result set inside a table (not recommended a view as this query can be quite expensive on big datasets, even though the conditions should release some weight from these joins): CREATE TABLE country_paths AS SELECT ...

Then, if I want to compute the paths for a specific country, I would query the generated table imposing a filter: SELECT * FROM country_paths WHERE r1.start = 'Italy'.

Here you can find a fiddle too: https://www.db-fiddle.com/f/tCkSHBH82RcFp1UMdXcfMW/0.

Side note: if the longest path is a really big number, then it would be advisable to use a prepared statement in combo with a cycle, like is done here.

Erk answered 12/4, 2022 at 19:8 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.