How to use fulltext matching for emails in MySQL?
Asked Answered
N

2

-1

I'm adding a "search by email" functionality on my contacts database table (MySQL, InnoDB engine) for our CRM system. I added the FULLTEXT index and partial searches like this

WHERE MATCH(cc.email_c) AGAINST('Joe' IN NATURAL LANGUAGE MODE)

are working fine. However, if I try to match against a full email address like this

WHERE MATCH(cc.email_c) AGAINST('[email protected]' IN NATURAL LANGUAGE MODE)

all email addresses with Gmail get returned. I've already discarded the possibility of using BOOLEAN MODE, from what I understand it doesn't support the "@" symbol in a query. Is there any other way of using a MATCH AGAINST while still being able to search by the exact full address?

I can always use an SQL with LIKE and some wildcards, but I would still prefer to have a full-text search.

Nittygritty answered 24/3, 2020 at 12:57 Comment(2)
Long story short: combine full text search with LIKE, using OR. If you want to know why read the documentation how fulltext search works in general.Otway
@emix - Not OR.Durante
L
0

For emails like [email protected] the character @ , is a reserved operator for FullText Search. Instead of the NATURAL LANGUAGE mode, you should try to stick to the BOOLEAN MODE as it supports wildcards(*) which infact is more "searchy".

You could skip all the punctuation marks in the email address and construct a query like this,

For searching the [email protected]:

SELECT *, 
       MATCH(email) AGAINST ('+reymysterio* +619* +gmail* +com*' IN BOOLEAN MODE) AS relevance,
       email LIKE '[email protected]%' AS exact_match_relevance
FROM wrestlers
WHERE MATCH(email) AGAINST ('+reymysterio* +619* +gmail* +com*' IN BOOLEAN MODE) 
ORDER BY exact_match_relevance DESC, relevance DESC;

In that way you not only retrieve a subset of results but also order by relevance using LIKE.

Things to keep in mind:

  1. "com" is a default stopword(InnoDB), you need to build the FULLTEXT index by modifying the Stopword list.

  2. Edge cases: Emails like "[email protected]" are valid. By Default FULLTEXT uses innodb_ft_min_token_size=3 (InnoDB) this is also same for the MyISAM . You should modify the token size to 1. Other wise when the parser splits the email over punctuations a b xy com for tokenization, the tokens a , b ,xy are Lost.

Lentic answered 23/4 at 17:32 Comment(0)
D
-1

Convert it to AGAINST('+Joe +gmail +com' IN BOOLEAN MODE)

But, beware; short names and "stop words" should not have a '+' on front of them.

Then, because it won't be precise enough, combine with a LIKE:

WHERE MATCH(email) AGAINST('+Joe +gmail +com' IN BOOLEAN MODE)
  AND email LIKE '%[email protected]%'

The first part gives you speed; the second part will test only the ones that pass the first test.

Durante answered 1/4, 2020 at 18:43 Comment(1)
Is there a flaw that led to the downvote? Maybe I can fix it.Durante

© 2022 - 2024 — McMap. All rights reserved.