Why does the cardinality of an index in MySQL remain unchanged when I add a new index?
Asked Answered
T

2

10

I have added a FULLTEXT index to one of my MySQL database tables as follows:

ALTER TABLE members ADD FULLTEXT(about,fname,lname,job_title);

The problem is that using phpmyadmin I can see the cardinality of my new index is only 1. Does this mean the index will never be used?

I have run a analyze table command but it didn't seem to do anything.

analyze table members

The respective types of the index fields are varchar(100), varchar(100), text, varchar(200) and the engine used is MyISAM and the table has about 30,000 rows, all unique. My MySQL version is 5.0.45.

Am I doing something wrong?

Tungus answered 16/4, 2009 at 10:43 Comment(0)
J
15

If you only have 1 row in the table, the cardinality for the index should be 1, of course. It's just counting the number of unique values.

If you think of an index as a lookup-table based on buckets (like a hash), then the cardinality is the number of buckets.

Here's how it works: When you build an index over a set of columns (a,b,c,d), then the database goes over all the rows in the table, looking at the ordered quadruplets of those 4 columns, for each row. Let's say your table looks like this:

a  b  c  d  e   
-- -- -- -- --  
1  1  1  1  200 
1  1  1  1  300
1  2  1  1  200
1  3  1  1  200

So what the database looks at is just the 4 columns (a,b,c,d):

a  b  c  d  
-- -- -- --
1  1  1  1 
1  2  1  1 
1  3  1  1 

See that there are only 3 unique rows left? Those will become our buckets, but we'll get back to that. In reality, there's also a record id, or row identifier for each row in the table. So our original table looks like this:

(row id) a  b  c  d  e   
-------- -- -- -- -- --  
00000001 1  1  1  1  200 
00000002 1  1  1  1  300
00000003 1  2  1  1  200
00000004 1  3  1  1  200

So when we look at only the 4 columns of (a,b,c,d), we're really looking also at the row id:

(row id) a  b  c  d 
-------- -- -- -- --
00000001 1  1  1  1
00000002 1  1  1  1
00000003 1  2  1  1
00000004 1  3  1  1

But we want to do lookup by (a,b,c,d) and not by row id, so we produce something like this:

(a,b,c,d) (row id)
--------- --------
1,1,1,1   00000001
1,1,1,1   00000002
1,2,1,1   00000003
1,3,1,1   00000004

And finally, we group all the row ids of rows that have identicle (a,b,c,d) values together:

(a,b,c,d) (row id)
--------- ---------------------
1,1,1,1   00000001 and 00000002
1,2,1,1   00000003
1,3,1,1   00000004

See that? The values of (a,b,c,d), which are (1,1,1,1) (1,2,1,1) and (1,3,1,1) have become keys for our lookup table into the rows of the original table.

Actually, none of this really happens, but it should give you a good idea on how a "naive" (i.e. straight-forward) implementation of an index might be done.

But the bottom line is this: cardinality just measures how many unique rows there are in an index. And in our example that was the number of keys in our lookup table, which was 3.

Hope that helps!

Jaban answered 16/4, 2009 at 10:43 Comment(5)
Thanks for the index info. Very well explained. The cardinality of my index should be more than 1 given that there are 30000 rows and almost every member has a different name?Tungus
Thanks for the explanation about indexes, it was very nice, but your explanation didn't answered the above asked question .Tallula
You're right, I didn't explicitly say the final conclusion: I just showed that 4 rows fall into 3 buckets. I'm sure you could invent another row that could be added that would fall into one of the existing 3 buckets of the index. That would leave the number of buckets unchanged, which would also mean the cardinality of the index was unchanged. Sorry about that.Butylene
Nice explanation. But 'Actually, none of this really happens' !? So what does really happen...? I feel a little bit confused.Troublemaker
@PaulLo Well, a database engine implemented in a naive way may actually do all of the detailed steps. But I expect that most implementations do this differently - making more efficient use of RAM and disk resource, to provide lower latency.Butylene
M
12

I cannot answer definitely why MySQL does not compute the cardinality, but I can guess. The MySQL manual states:

Cardinality: An estimate of the number of unique values in the index. This is updated by running ANALYZE TABLE or myisamchk -a. Cardinality is counted based on statistics stored as integers, so the value is not necessarily exact even for small tables. The higher the cardinality, the greater the chance that MySQL uses the index when doing joins.

FULLTEXT indices are only used in MATCH ... AGAINST (...) queries, which forces the index to be used. The MATCH ... AGAINST syntax does not work if there is not a FULLTEXT index on those fields.

My guess is that the cardinality isn't computed because it really isn't necessary.

Note that searches against the index work even though the cardinality isn't set.

For the record, an ANALYZE TABLE foobar statement seems to set the cardinality correctly.

Murraymurre answered 16/4, 2009 at 10:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.