The following solutions need a table containing sequence numbers from 1 to (at least) the length of your word_column
. Assuming the word_column
is VARCHAR(190)
you need a table with numbers from 1 to 190. If you use MariaDB with the sequence plugin, you can use the table seq_1_to_190
. If you don't have it, there are many ways to create it. One simple way is to use the information_schema.columns
table:
create table if not exists seq_1_to_190 (seq tinyint unsigned auto_increment primary key)
select null as seq from information_schema.columns limit 190;
You can also create it on-the-fly in a subquery, but that would complicate your queries.
I will use the session variable @word
to store the search string.
set @word = 'StackExch_bla_bla_bla';
But you can replace all its occurrences with the constant search string.
Now we can use the sequence table to create all prefix substrings with
select seq as l, left(@word, seq) as substr
from seq_1_to_190 s
where s.seq <= char_length(@word)
http://rextester.com/BWU18001
and use it for the LIKE
condition when you join it with your words
table:
select w.word_column
from (
select seq as l, left(@word, seq) as substr
from seq_1_to_190 s
where s.seq <= char_length(@word)
) s
join words w on w.word_column like concat(replace(s.substr, '_', '\_'), '%')
order by s.l desc
limit 1
http://rextester.com/STQP82942
Note that _
is a placeholder and you need to escape it in your search string with \_
. You also need to do that for %
if your string can contain it, but I will skip this part in my answer.
The query can also be written without the subquery:
select w.word_column
from seq_1_to_190 s
join words w on w.word_column like concat(replace(left(@word, seq), '_', '\_'), '%')
where s.seq <= char_length(@word)
order by s.seq desc
limit 1
http://rextester.com/QVZI59071
These queries do the job and in theorie they should also be fast. But MySQL (In my case MariaDB 10.0.19) creates a bad execution plan and doesn't use the index for the ORDER BY
clause. Both queries run in about 1.8 seconds on a 100K rows data set.
Best I could do to improve the performance with a single query is
select (
select word_column
from words w
where w.word_column like concat(replace(left(@word, s.seq), '_', '\_'), '%')
limit 1
) as word_column
from seq_1_to_190 s
where s.seq <= char_length(@word)
having word_column is not null
order by s.seq desc
limit 1
http://rextester.com/APZHA8471
This one is faster, but still needs like 670 msec. Note that Gordons CASE query runs in 125 msec, though it needs a full table/index scan and filesort.
However I managed to force the engine to use the index for the ORDER BY
clause with an indexed temporary table:
drop temporary table if exists tmp;
create temporary table tmp(
id tinyint unsigned auto_increment primary key,
pattern varchar(190)
) engine=memory
select null as id, left(@word, seq) as pattern
from seq_1_to_190 s
where s.seq <= char_length(@word)
order by s.seq desc;
select w.word_column
from tmp force index for order by (primary)
join words w
on w.word_column >= tmp.pattern
and w.word_column < concat(tmp.pattern, char(127))
order by tmp.id asc
limit 1
http://rextester.com/OOE82089
This query is "instant" (less than 1 msec) on my 100K rows test table. If I remove FORCE INDEX
or use a LIKE
condition, it will be slow again.
Note that char(127)
seems to work for ASCII strings. You might need to find another character according to your character set.
After all that, I must say that my first thought was to use a UNION ALL
query, which was also suggested by Gordon Linoff. However - here is a SQL only solution:
set @subquery = '(
select word_column
from words
where word_column like {pattern}
limit 1
)';
set session group_concat_max_len = 1000000;
set @sql = (
select group_concat(
replace(
@subquery,
'{pattern}',
replace(quote(concat(left(@word, seq), '%')), '_', '\_')
)
order by s.seq desc
separator ' union all '
)
from seq_1_to_190 s
where s.seq <= char_length(@word)
);
set @sql = concat(@sql, ' limit 1');
prepare stmt from @sql;
execute stmt;
http://rextester.com/OPTJ37873
It is also "instant".
If you like strored procedures/functions - Here's a function:
create function get_with_similar_begin(search_str text) returns text
begin
declare l integer;
declare res text;
declare pattern text;
set l = char_length(search_str);
while l > 0 and res is null do
set pattern = left(search_str, l);
set pattern = replace(pattern, '_', '\_');
set pattern = replace(pattern, '%', '\%');
set pattern = concat(pattern, '%');
set res = (select word_column from words where word_column like pattern);
set l = l - 1;
end while;
return res;
end
Use it as
select get_with_similar_begin('StackExch_bla_bla_bla');
select get_with_similar_begin('StackO_bla_bla_bla');
http://rextester.com/CJTU4629
It is probably the fastest way. Though for long strings a kind of divide and conquer algorinthm might decrease the average number of lookups. But might also be just overkill.
If you want to test your queries on a big table - I used the following code to create my test table (for MariaDB with sequence plugin):
drop table if exists words;
create table words(
id mediumint auto_increment primary key,
word_column varchar(190),
index(word_column)
);
insert into words(word_column)
select concat('Stack', rand(1)) as word_column
from seq_1_to_100000;
insert into words(word_column)values('StackOferflow'),('StackExchange'),('MetaStackExchange');
like
andregexp
are not suitable because I want to get a word that begins the same way as the input word. – Lorenalorene