The MySQL 5.7 documentation states:
The
filtered
column indicates an estimated percentage of table rows that will be filtered by the table condition. That is,rows
shows the estimated number of rows examined androws × filtered / 100
shows the number of rows that will be joined with previous tables.
To attempt to understand this better, I tried it out on a query using the MySQL Sakila Sample Database. The table in question has the following structure:
mysql> SHOW CREATE TABLE film \G
*************************** 1. row ***************************
Table: film
Create Table: CREATE TABLE `film` (
`film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`description` text,
`release_year` year(4) DEFAULT NULL,
`language_id` tinyint(3) unsigned NOT NULL,
`original_language_id` tinyint(3) unsigned DEFAULT NULL,
`rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3',
`rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99',
`length` smallint(5) unsigned DEFAULT NULL,
`replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99',
`rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G',
`special_features` set('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`film_id`),
KEY `idx_title` (`title`),
KEY `idx_fk_language_id` (`language_id`),
KEY `idx_fk_original_language_id` (`original_language_id`),
CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE,
CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8
And this is the EXPLAIN
plan for the query:
mysql> EXPLAIN SELECT * FROM film WHERE release_year=2006 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1000
filtered: 10.00
Extra: Using where
This table's sample dataset has 1,000 total rows, and all of them have release_year
set to 2006. Using the formula in the MySQL documentation:
rows x filtered / 100
= "number of rows that will be joined with previous tables
So,
1,000 x 10 / 100 = 100
= "100 rows will be joined with previous tables"
Huh? What "previous table"? There is no JOIN
going on here.
What about the first portion of the quote from the documentation? "Estimated percentage of table rows that will be filtered by the table condition." Well, the table condition is release_year = 2006
, and all records have that value, so shouldn't filtered
be either 0.00
or 100.00
(depending on what they mean by "filtered")?
Maybe it's behaving strangely because there's no index on release_year
? So I created one:
mysql> CREATE INDEX test ON film(release_year);
The filtered
column now shows 100.00
. So, shouldn't it have shown 0.00
before I added the index? Hm. What if I make half the table have release_year
be 2006, and the other half not?
mysql> UPDATE film SET release_year=2017 ORDER BY RAND() LIMIT 500;
Query OK, 500 rows affected (0.03 sec)
Rows matched: 500 Changed: 500 Warnings: 0
Now the EXPLAIN
looks like this:
mysql> EXPLAIN SELECT * FROM film WHERE release_year=2006 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
partitions: NULL
type: ref
possible_keys: test
key: test
key_len: 2
ref: const
rows: 500
filtered: 100.00
Extra: Using index condition
And, since I decided to confuse myself even further:
mysql> EXPLAIN SELECT * FROM film WHERE release_year!=2006 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
partitions: NULL
type: ALL
possible_keys: test
key: NULL
key_len: NULL
ref: NULL
rows: 1000
filtered: 50.10
Extra: Using where
So, an estimate of 501 rows will be filtered by the table condition and "joined with previous tables"?
I simply do not understand.
I realize it's an "estimate", but on what is this estimate based? If an index being present moves the estimate to 100.00
, shouldn't its absence be 0.00
, not 10.00
? And what's with that 50.10
result in the last query?
Is filtered
at all useful in determining if a query can be optimized further, or how to optimize it further, or is it generally just "noise" that can be ignored?
previous
and almost all posts I've seen used this to describefiltered
which obviously has a lot of ambiguity. – Gracegraceful