SQL filtered indexes: should I always put a filter on an index for optional columns?
Asked Answered
B

2

10

For 'large' tables, is there any reason not to put a filter on indexes for optional columns?

So for an index on column AAA (because people can search on AAA),
I can set the filter to ([AAA] IS NOT NULL).
This saves storage, so it saves money.

Some more advantages from technet:

  • Improved query performance and plan quality
  • Reduced index maintenance costs
  • Reduced index storage costs

People say that it's good to put a filter on an index for columns that are mostly empty. But why wouldn't I put a filter on indexes for columns that are empty for like 1%? Is there any reason not to do it if it only has advantages?

Bank answered 11/5, 2012 at 8:58 Comment(0)
D
5

This is usually a good idea with two gotchas:

  1. The table designer has a bug (only pre Denali!). When it rebuilds a table it deletes all filters.
  2. Be sure that the optimizer can tell statically that your predicate will never allow null rows to be returned. Usually, this is the case because of SQL NULL semantics (semmingly the only case where they help instead of hinder). Example: select distinct col from T will not use the index because a null value might be found. Use this: select distinct col from T where col is not null.

Filtered indexes are vastly underused. They can even be used to make a nullable column unique.

My practical recommendation: Just try it for a few month and learn for yourself if there are additional unforseen problems.

If you are into advanced SQL Server query techniques, also look ad indexed views. THey are a super set of filtered indexes (at least on Enterprise).

Dorty answered 11/5, 2012 at 10:0 Comment(0)
K
0

All indexes have advantages and disadvantages: Disadvantages:

  1. they take up disk space
  2. they need to be maintained (the balance of the index tree needs to be reorgansised periodically to ensure any query optimisation is not using bum data distribution) which may mean they need to be taken off line -- bad news if they are busy
  3. they need time to update on the fly if there are frequent inserts

Advantages:

  1. Properly designed, they can eliminate expensive table scans
  2. Properly designed, (a covering index) they can elimiate any table read.

So as ususual it depends.

  1. Too many indexes can dramatically slow write performanace
  2. Too many indexes can dramatically increase dispace usage
  3. Not the right index can dramatically decrease read performance

Some people make a very good living out of really knowing their stuff about indexes: There is immensely good stuff here http://www.insidesqlserver.com/

So it depends how often users are returning data referenced by the index vs how often they are updating the data contained via the index.

Indexes for sparse columns are no different, however where the column is (largely) empty then filtered indexes are more efficient. Once the sparesness decreases (eg 50/50) then the distribution of the data can become very important when the optimiser decides on the best plan to return the data. A filtered index will not know the distribution for data outside the filter - bit obvious but needs to be said.

Konya answered 11/5, 2012 at 9:49 Comment(4)
I think you missed the point of the question. Its not about indexes in general, its about filters on indexes. It takes away your disadvantage of taking up disk space etc..Bank
Sorry, I was trying to make the point that when considering generally to have an index, the overall considerations start at the same place. Data density, reads vs writes etc. The result should be index / no index and if index then index type. You can also use the missing / unused index stored procs to tune performance as time goes on and data distribution changes. A filtered index takes up more space than no index, less space than a non filtered index. Not trying to start a war!Konya
So now for my question: You say "where the column is largely empty then filtered indexes are more efficient"--> Why wouldn't I put a filter on an index for a column that is only empty for 5%, or even like 1%? (it still could be like 500000 rows, so it does save storage.)Bank
@ErikDekker - I think the last paragraph in the answer covers one reason. The statistics created along with the index use the same filter, so you can lose some of the density information for the table, which means the optimizer could make non-optimal decisions in some circumstances. However, I imagine you could use a CREATE STATISTICS statement to get the best of both worlds...Zephaniah

© 2022 - 2024 — McMap. All rights reserved.