Are there any Sql Server Full-Text Search (FTS) performance improvements since version 2008 R2?
Asked Answered
S

2

6

we're using SQL Server 2008 R2 Full-Text Search over a table with 2.6 million records. The search performance often is poor, it follows the commonly reported pattern: cold system/first run ~10+ sec, subsequent runs ~1-2 sec. This is inline with results reported in the following article dated of Feb, 2013:

So You Think You Can Search – Comparing Microsoft SQL Server FTS and Apache Lucene

The article shows the following speed comparison results using Wikipedia dump data:

Indexing speed, size and single query execution time using:

                        Lucene      MS SQL FTS
Indexing Speed          3 MB/sec    1 MB/sec
Index Size              10-25%      25-30%
Simple query            < 20 ms     < 20 ms
Query With Custom Score < 4 sec     > 20 sec
Parallel Query Executions (10 threads, average execution time per query in ms):

                                     MS SQL FTS  Lucene (File System)   Lucene (RAM)
Cold System:         Simple Query    56          643                    21
                     Boost Query     19669*      859                    27
Second executions:   Simple Query    14          8                      < 5
                     Boost Query     465         17                     9

*average time, the very first query could be executed up to 2 min(!)

My questions are:

  1. Since there were several major SQL Server releases since the article was published on Feb 8, 2013, can someone report any FTS performance improvements over same data (preferably of 1+ million records) when they migrated to more recent SQL Server versions (2012, 2014 and 2016)?

  2. Do more recent SQL Server versions support FTS catalogs/indexes placed in RAM just as solr/lucene do?

UPDATE: in our scenario we seldom insert new data into FT catalog linked table, but run read only searches very often. So, I don't think SQL constantly rebuilding FTS index is the issue.

Stranger answered 23/9, 2016 at 13:17 Comment(0)
O
2

Fulltext Search Improvements in SQL Server 2012:

We looked at the entire code base from how queries block while waiting an ongoing index update to release a shared schema lock, from how much memory is allocated during index fragment population, to how we could reorganize the query code base as a streaming Table Value Function to optimize for TOP N search queries, how we could maintain key distribution histograms to execute search on parallel threads, all the way to how we could take better advantage of the processor compute instructions (scoring ranks for example)… End result is that we are able to significantly boost performance (10X in many cases when it comes to concurrent index updates with large query workloads) and scale without having to change any storage structures or existing API surface. All our customers going from SQL 2008 / R2 to Denali will benefit with this improvement.

Oculist answered 23/9, 2016 at 13:37 Comment(3)
Thanks for the comment, very valuable info. However, I was looking for a real-world experiences. In addition to MSFT claims, can someone report on the real FTS performance increase when they have migrated off SQL Server 2008 R2 to more recent versions? So far I've found many people complain on FTS slowness even in more recent sql server versions (2014 for example). It seems SQL Server 2005 was the fastest release in terms of FTS.Stranger
developer editions are free and have same features as enterprise.You can use them as testing groundsOculist
we have ms subscription. It's not a problem to get a new sql instance. Just collecting data on what version to upgrade to. If the search time stays like it is now, we'll move away from FTS to solr.Stranger
K
0

I'd recommend you to dig a bit into SQL Server FTS internals. This will give you an idea how your query is executed and if this works for you, or not. I suggest to start from here: https://technet.microsoft.com/en-us/library/ms142505(v=sql.105).aspx and here: https://msdn.microsoft.com/ru-ru/library/cc721269.aspx. Internally FTS uses tables and indexes. With all their benefits and drawbacks. So, like any other table, if data of that internal table is not in Buffer Pool, SQL Server will read from disk to RAM. Once the data in the RAM, it will be read from the RAM.

Karlee answered 23/9, 2016 at 13:46 Comment(4)
Denis, thanks for the links. But see the article I've linked in my question. The article states that Solr/Lucene specifically support Index location in RAM and notice the performance increase even with cold queries when Solr index is in RAM, while SQL Server is said to not support this. I wanted to know if SQL Server FTS has this specific feature in recent releases, not counting the normal index caching which it does by default.Stranger
@andrews, yes, that is what I wanted to stress, this statement: "SQL Server use disk, Lucene use RAM", is incorrect. If you have 32 GB of ram, but your index is 64 GB, anyway, you can't hold it in RAM completely, neither using SQL Server, nor Lucene.Karlee
@Stranger SQL Server FTS is just a set of tables which are joined with user tables on FTS query. Like any other table FTS data can be read only from the RAM, so SQL Server should take the needed data to the RAM and send it to the client. If it's enough RAM, all that data will stay in RAM. This is some kind of reuse of existing relational mechanisms which are quite efficient.Karlee
@Stranger The question here if this mechanism is efficient for your case. In general moving to external FTS service is not a bad idea. Stack Overflow also have moved from SQL Server FTS to the Elastic Search when they found it is more efficient for them.Karlee

© 2022 - 2024 — McMap. All rights reserved.