MySQL FULLTEXT not working
Asked Answered
F

2

26

I'm attempting to add searching support for my PHP web app using MySQL's FULLTEXT indexes.

I created a test table (using the MyISAM type, with a single text field a) and entered some sample data. Now if I'm right the following query should return both those rows:

SELECT * FROM test WHERE MATCH(a) AGAINST('databases')

However it returns none. I've done a bit of research and I'm doing everything right as far as I can tell - the table is a MyISAM table, the FULLTEXT indexes are set. I've tried running the query from the prompt and from phpMyAdmin, with no luck. Am I missing something crucial?


UPDATE: Ok, while Cody's solution worked in my test case it doesn't seem to work on my actual table:

CREATE TABLE IF NOT EXISTS `uploads` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` text NOT NULL,
  `size` int(11) NOT NULL,
  `type` text NOT NULL,
  `alias` text NOT NULL,
  `md5sum` text NOT NULL,
  `uploaded` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;

And the data I'm using:

INSERT INTO `uploads` (`id`, `name`, `size`, `type`, `alias`, `md5sum`, `uploaded`) VALUES
(1, '04 Sickman.mp3', 5261182, 'audio/mp3', '1', 'df2eb6a360fbfa8e0c9893aadc2289de', '2009-07-14 16:08:02'),
(2, '07 Dirt.mp3', 5056435, 'audio/mp3', '2', 'edcb873a75c94b5d0368681e4bd9ca41', '2009-07-14 16:08:08'),
(3, 'header_bg2.png', 16765, 'image/png', '3', '5bc5cb5c45c7fa329dc881a8476a2af6', '2009-07-14 16:08:30'),
(4, 'page_top_right2.png', 5299, 'image/png', '4', '53ea39f826b7c7aeba11060c0d8f4e81', '2009-07-14 16:08:37'),
(5, 'todo.txt', 392, 'text/plain', '5', '7ee46db77d1b98b145c9a95444d8dc67', '2009-07-14 16:08:46');

The query I'm now running is:

SELECT * FROM `uploads` WHERE MATCH(name) AGAINST ('header' IN BOOLEAN MODE)

Which should return row 3, header_bg2.png. Instead I get another empty result set. My options for boolean searching are below:

mysql> show variables like 'ft_%';
+--------------------------+----------------+
| Variable_name            | Value          |
+--------------------------+----------------+
| ft_boolean_syntax        | + -><()~*:""&| |
| ft_max_word_len          | 84             |
| ft_min_word_len          | 4              |
| ft_query_expansion_limit | 20             |
| ft_stopword_file         | (built-in)     |
+--------------------------+----------------+
5 rows in set (0.02 sec)

"header" is within the word length restrictions and I doubt it's a stop word (I'm not sure how to get the list). Any ideas?

Fenske answered 14/7, 2009 at 14:16 Comment(1)
You can see stopwords list here dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html, for innodb dev.mysql.com/doc/refman/5.7/en/… or run SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;Ladew
F
51

Add more data. By default MySQL will ignore any word that is in 50% or more of the rows in the table as it considers it would be a 'noise' word.

With very few rows in a table, it is common to hit this 50% limit often (ie. if you have two rows, every word is in at least 50% of the rows!).

Flog answered 14/7, 2009 at 14:29 Comment(1)
Unless he uses boolean mode (MySQL4.1+) which ignores the 50 rule.Toponymy
H
25

There are two modes for MySQL Fulltext searching: natural language mode and Boolean mode. A restriction of natural language mode is " ... words that are present in 50% or more of the rows are considered common and do not match. Full-text searches are natural language searches if no modifier is given." And natural language is the default mode. This is documented in the Fulltext docs:

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

If you switch your query to using Boolean mode:

SELECT * FROM test WHERE MATCH(a) AGAINST('databases' IN BOOLEAN MODE)

Then the two rows are returned.

Boolean mode has its own restrictions, one common one being that it does not return its rows in order of relevance. Overall, it does offer more features and flexibility than natural language mode, so you'll probably end up using it.

If your application is going to rely heavily on fulltext searching you might want to consider more full-featured packages such as Lucene/Solr or Sphinx

However answered 14/7, 2009 at 14:28 Comment(2)
Thanks, I didn't realise about the 50% rule :)Fenske
This is a much better answer if you ask me.Toponymy

© 2022 - 2024 — McMap. All rights reserved.