I'm seeing some strange performance issues on SQL Server 2008 with a nullable geography column with a spatial index. Each null value is stored as a root node within the spatial index.
E.g. A table with 5 000 000 addresses where 4 000 000 has a coordinate stored.
Every time I query the index I have to scan through every root node, meaning I have to scan through 1 000 001 level 0 nodes. (1 root node for all the valid coordinates + 1M nulls)
I cannot find this mentioned in the documentation, and I cannot see why SQL allows this column to be nullable if the indexing is unable to handle it.
For now I have bypassed this by storing only the existing coordinates in a separate table, but I would like to know what is the best practice here?
EDIT: (case closed)
I got some help on the sql spatial msdn forum, and there is a blog post about this issue:
http://www.sqlskills.com/BLOGS/BOBB/post/Be-careful-with-EMPTYNULL-values-and-spatial-indexes.aspx
Also the MSDN documentation does infact mention this, but in a very sneaky manner.
NULL and empty instances are counted at level 0 but will not impact performance. Level 0 will have as many cells as NULL and empty instances at the base table. For geography indexes, level 0 will have as many cells as NULL and empty instances +1 cell, because the query sample is counted as 1
Nowhere in the text is it promised that nulls does not affect performance for geography. Only geometry is supposed to be unaffected.