SQL Server indexing - varchar(100) vs varbinary(100)? [convert data]
Asked Answered
F

2

0

Is it better to set an index (primary or secondary) on a varchar(x) or varbinary(x) column?

I wasn't sure it even mattered. However what I see online (Googled - varchar vs varbinary) is that varchar is almost dead or being pushed to the way side. So is this better to index or something? Could it be the type of index?

Excellent scenario: Indexing email addresses ([edit] encrypted byte array {varbinary} or string equivalent {varchar})

Thanks

Answer?

It seems that indexes on varbinary is the worst thing ever. Am I reading this right?

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/35b61bb0-1fa8-4a2f-a9fb-729a1874dcf8/clustered-index-on-a-varbinary-column

Flophouse answered 29/5, 2015 at 23:58 Comment(6)
Just something to keep in mind if you're deciding between text and binary for your column - binary will result in case-sensitive queries, and text (by default at least) would not. If that's important for your requirements, then it would be more important than deciding based on index performance.Barbarity
And what are you reading that says that varchar is dead? I think you must be misunderstanding whatever it is.Barbarity
Interesting. Case-sensitive, so does that mean it's faster like searching an exact object? Hypothetically would it be better encrypt the email and then put it in? ...excluding the obvious security benefits, only looking at structureFlophouse
Those are TWO TOTALLY different things! No way to compare them directly. varbinary is for binary data - like a PDF file, a video stream, a MP3 file etc. while varchar(x) is for text data - strings, paragraphs, entire book sections. Those have nothing to do with one another..... and varchar(x) is most definitely nowhere near "dead" .....Odeliaodelinda
Understood, but while encrypting it will be a byte array.... so now I have the email (encrypted) as a byte array -- should that be converted to varchar?Flophouse
The original discussion you linked is about avoiding using varchar or varbinary as primary key. You can use an identity primary key and create an unique index on your email (or encrypted email data). Using long varchar or varbinary as primary key has some issues, especially the email value itself provides no clustering benefits.Kippie
H
1

It is better to create an index on varchar than varbinary. Varbinary is suitable for blobs but you can store strings in varbinary also. Such blobs are complementary to your actual data. Your own research lead to that conclusion also.

An email address can be entered by user in variety of formats - [email protected] or [email protected] etc. It is easier to store/extract such information in/from varchar field. Joe Enos is absolutely right that binary comparisons will be case-sensitive (comparing binary info) whereas varchar will be case-insensitive assuming that's how you have set up your DB and column collation. With varbinary, you'll also have to be careful about padding.

Varchar is alive and healthy. When you index varchar(100), try to use a non-clustered index. My general preference is to use a surrogate key in most situations as clustered index.

Hassi answered 30/5, 2015 at 1:38 Comment(4)
Well that explains the case issue, thanks. So expanding on this, if while encrypting an email address I have a byte array. Would it be faster performance if I stored that byte array as varbinary or converted to string and stored it in varchar? ...excluding the conversion time; imagine a db with 300k users and this is a key look up fieldFlophouse
I don't believe there will be any significant difference. However, converting binary data to text format usually leads to bigger data, which makes index a little bit slower.Kippie
It's a good, but not an easy question to answer. I'd salt+sha-256 hash the email address and store the result in a char field. Index that field to do comparisons. Recognize that there are chances of collision, although rare. Encrypt and store email in another field - varbinary is good for that. Review question 70450 alsoHassi
@Kippie Milliseconds matter to me. Tiny things make a big difference (that's what she said). I will take that as my final answer. ...so arguably the varchar would be slower.Flophouse
P
1

The correct datatype to use is dictated by the contents of the column.

while encrypting an email address I have a byte array

This is binary data and should be stored as varbinary (except if the encryption routine outputs a fixed length array then use binary).

casting the varbinary to varchar would be of no benefit at all when it comes to storage (everything is stored as binary anyway) and would be likely to cause incorrect results.

The following all return "yes" in my default collation

SELECT 
       CASE
         WHEN cast(0xE6 AS VARCHAR(20))  = cast(0x6165 AS VARCHAR(20))
           THEN 'yes' else 'no'
       END,
       CASE
         WHEN cast(0xcc0000 AS VARCHAR(20))  = cast(0xcc2020 AS VARCHAR(20))
           THEN 'yes' else 'no'
       END,
       CASE
         WHEN cast(0x202020 AS VARCHAR(20)) = cast(0x AS VARCHAR(20))
           THEN 'yes' else 'no'
       END

Adding a COLLATE LATIN1_GENERAL_BIN alters the result of the first two but not the third.

I would expect looking up a varbinary(n) column to be faster than looking up a varchar(n) column generally as the comparison routines are simpler though perhaps not much in it for a varchar column with a binary collation.

Pneumectomy answered 30/5, 2015 at 17:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.