We're developing a search as a part of a bigger system.
We have Microsoft SQL Server 2014 - 12.0.2000.8 (X64) Standard Edition (64-bit)
with this setup:
CREATE TABLE NewCompanies(
[Id] [uniqueidentifier] NOT NULL,
[Name] [nvarchar](400) NOT NULL,
[Phone] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL,
[Contacts1] [nvarchar](max) NULL,
[Contacts2] [nvarchar](max) NULL,
[Contacts3] [nvarchar](max) NULL,
[Contacts4] [nvarchar](max) NULL,
[Address] [nvarchar](max) NULL,
CONSTRAINT PK_Id PRIMARY KEY (Id)
);
Phone
is a structured comma separated digits string like"77777777777, 88888888888"
Email
is structured emails string with commas like"[email protected], [email protected]"
(or without commas at all like"[email protected]"
)Contacts1, Contacts2, Contacts3, Contacts4
are text fields where users can specify contact details in free form. Like"John Smith +1 202 555 0156"
or"Bob, +1-999-888-0156, [email protected]"
. These fields can contain emails and phones we want to search further.
Here we create full-text stuff
-- FULL TEXT SEARCH
CREATE FULLTEXT CATALOG NewCompanySearch AS DEFAULT;
CREATE FULLTEXT INDEX ON NewCompanies(Name, Phone, Email, Contacts1, Contacts2, Contacts3, Contacts4, Address)
KEY INDEX PK_Id
Here is a data sample
INSERT INTO NewCompanies(Id, Name, Phone, Email, Contacts1, Contacts2, Contacts3, Contacts4)
VALUES ('7BA05F18-1337-4AFB-80D9-00001A777E4F', 'PJSC Azimuth', '79001002030, 78005005044', '[email protected], [email protected]', 'John Smith', 'Call only at weekends +7-999-666-22-11', NULL, NULL)
Actually we have about 100 thousands of such records.
We expect users can specify a part of email like "@gmail.com" and this should return all the rows with Gmail email addresses in any of Email, Contacts1, Contacts2, Contacts3, Contacts4
fields.
The same for phone numbers. Users can search for a pattern like "70283" and a query should return phones with these digits in them. It's even for free form Contacts1, Contacts2, Contacts3, Contacts4
fields where we probably should remove all but digits and space characters firstly before searching.
We used to use LIKE
for the search when we had about 1500 records and it worked fine but now we have a lot of records and the LIKE
search takes infinite to get results.
This is how we try to get data from there:
SELECT * FROM NewCompanies WHERE CONTAINS((Email, Contacts1, Contacts2, Contacts3, Contacts4), '"[email protected]*"') -- this doesn't get the row
SELECT * FROM NewCompanies WHERE CONTAINS((Phone, Contacts1, Contacts2, Contacts3, Contacts4), '"6662211*"') -- doesn't get anything
SELECT * FROM NewCompanies WHERE CONTAINS(Name, '"zimuth*"') -- doesn't get anything
nvarchar(MAX)
here? I have never heard of, or met anyone who's name is 1 Billion~ characters long. And, according to this answer, an email address can't be longer 254 characters; so you also have 1 Billion~ wasted characters there. – Leal@gmail.com
as a search term because the@
character is a word breaker. In other words, depending the version of SQL Server you have, words in the index for[email protected]
will be either (A)user
,gmail
andcom
or (B)user
,[email protected]
,gmail
andcom
. REF: Behavior Changes to Full-Text Search – StreamlineCONTAINS(Email,'"[email protected]*"')
returns nothing; characters like.
and@
are "stop characters" or "word breakers". At "worst" the above string('[email protected]'
) would be seen as 5 separate words:s
,m
,s
,gmail
, andcom
. – Leal'[email protected]*'
, you'll be able to simply doWHERE Email = '[email protected]'
. Fix the design, fix the problem. – LealPhone
andContacts
fields we can't to search in. – Psidemail
column. – LealContacts
where a user is allowed to type whatever they want and we have to find stuff in this mess. – PsidLIKE
, or ensure that you're searching for words (not email addresses). – LealLIKE
doesn't search appropriate time. It takes minutes for it on 100k records. – Psid.
and an@
are word breakers. – Leal.
as a word breaker would be a really bad idea. View or Change Registered Filters and Word Breakers. This would mean that a search of'America'
would not work in the sentence'Last year, I went to America.'
– LealContacts1, Contacts2, Contacts3, Contacts4
). Do you think this will work out? Can we disable word breakers only when we indexContacts
fields as well asPhone
andEmail
? – Psid.
. – LealLIKE
clause – PsidSELECT * FROM NewCompanies WHERE Id IN (SELECT ID from .... where MyOuterApply.EmailCol1 LIKE '%'+@SearchString+'%') OR Id IN (SELECT ID from .... where MyOuterApply.EmailCol2 LIKE '%'+@SearchString+'%')
. Create about five individual indexes on each of the fields and include the primary key. – Hullo