PHP - returning closest match from database
Asked Answered
U

5

6

I'm returning mobile phone data based on the user agent. But in an instance where the useragent is not stored (newer version of phone or software) I want to be able to return the closest match, a bit like how Google displays the "did you mean this". i.e.

if I have a stored useragent of

Mozilla/5.0 (Linux; U; Android 2.1-update1; en-nl; Desire_A8181 Build/ERE27) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17

and the user agent in use is

Mozilla/5.0 (Linux; U; Android 2.1-update1; en-nl; Desire_G45H Build/ERE27) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17

I want to be able to return the one stored to add or adapt my data accordingly.

Does anyone know a way of achieving this?

Universality answered 8/7, 2011 at 9:17 Comment(2)
If you want to develop your own system that fine by me, but.... Did you take a look at docs.php.net/function.get_browser and the usually faster-than-light updated browscap.ini at browsers.garykeith.com/downloads.asp ?Singlehandedly
If you only want to take care of this on database level, you should remove the php tag.Roadbed
P
4

use full text searching with most relevant data...

SELECT * MATCH(browser) AGAINST ('your browser') AS score order by score DESC
Purposeless answered 8/7, 2011 at 9:24 Comment(3)
interesting, would have to use a limit though right as it would still return all results just in order? Will defiantly look int this.Universality
Be aware, however, that the MySQL fulltext index is a beast with some problems - for example, minimum word length is 4 chars: in "530.17", it would automatically always ignore the ".17" part. Also, the closest match criteria is a little flawed. It works alright for natural texts, but maybe not for user agents.Cud
you can change the minimum word length and disable stopwords, which are an even bigger pain. see dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.htmlRissole
C
2

The usual approach for fuzzy string matching are things like calculating the levenshtein distance or implementing it as an n-gram search index. But for matching user agents, this is overkill.

Rather reduce the string you search for to certain important criterias, then do something like

SELECT * FROM agents WHERE agent LIKE "Mozilla/5.0 (Linux; U; Android%) AppleWebKit/5% Version/4.0 Mobile Safari/5%"

So, you strip out certain too detailed parts and replace them by % in your LIKE statement. You should, however, reconsider the architecture - I would only save the important parts and leave out the exact build number etc. Also consider using an external library that already contains user agents and does the matching for you, no need to reinvent the wheel.

EDIT: just as VolkerK pointed out above, the "external library" should be PHPs getbrowser. Just added for compeleteness of the answer ;-)

Cud answered 8/7, 2011 at 9:23 Comment(1)
I have a database of 6,549 mobile devices all with multiple user agents against them, I'm not re-inventing the wheel, just smoothing off the edges. Thanks for the above method, will give it a try. RegardsUniversality
R
0

Take a look at one of the scripts on Highlight the difference between two strings in PHP, with some modifications you should be able to get a percentage of the difference.

Roadbed answered 8/7, 2011 at 9:21 Comment(1)
That's two strings, I could write my own function for that or use levenshtein() but this is a matter of sql efficiency and was wondering if there was a sql equivalentUniversality
T
0

Use full-text search; otherwise these functions may help:

Thornberry answered 8/7, 2011 at 9:28 Comment(1)
again, I don't want to be selecting every user agent string from my db and comparing, this is not efficient.Universality
P
0

To get the best result, you can extract the whole records and do a cycle to guess who's the most similar word

check these functions

SIMILAR_TEXT

SOUNDEX

LEVENSHTEIN

I.E.

$most_similar = "";
$highestPercentage = 0;
foreach ($ua in $all_user_agents) {
    $i = similar_text($current_user_agent, $ua, &$p);
    if ($p > $highestPercentage) {
        $highestPercentage = $p;
        $most_similar = $ua;
    }
}

echo "most similar = $most_similar"

Anyway, you can use the mySQL Fulltext-search following some guidelines like this one

Peppers answered 8/7, 2011 at 9:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.