Why not use varchar(max)?
Asked Answered
V

9

93

I'm a bit old school when it comes to database design, so I'm totally for using the correct data sizes in columns. However, when reviewing a database for a friend, I noticed he used varchar(max) a lot. Now, my immediate thought was to throw it back to him and tell him to change it. But then I thought about it and couldn't come up with a good reason for him not to use it (he'd used a case type tool to generate the db, if you're wondering).

I've been researching the topic of varchar(max) usage and I can't really come up with any good reason for him not to use it.

He doesn't use the columns for indexes, the application that sits on the db has limitations on the input, so it won't allow massive entries in the fields.

Any help would be appreciated to help me make him see the light :).

Valladolid answered 21/8, 2011 at 21:50 Comment(4)
See this answer for another reason #2010194Genevieve
Thats interesting! pity i didn't find this earlier. thanks!Valladolid
For one: you cannot put an index on a VARCHAR(MAX) column... that alone makes me use it only when absolutely needed....Fleurette
I wil point out that expecting the application to always control input is stupid. The data will almost alawys outlast the application and the next version of the application may not have the correct limits.Abran
B
41

My answer to this, isn't about the usage of Max, as much as it is about the reason for VARCHAR(max) vs TEXT.

In my book; first of all, Unless you can be absolutely certain that you'll never encode anything but english text and people won't refer to names of foreign locations, then you should use NVARCHAR or NTEXT.

Secondly, it's what the fields allow you to do.

TEXT is hard to update in comparison to VARCHAR, but you get the advantage of Full Text Indexing and lots of clever things.

On the other hand, VARCHAR(MAX) has some ambiguity, if the size of the cell is < 8000 chars, it will be treated as Row data. If it's greater, it will be treated as a LOB for storage purposes. Because you can't know this without querying RBAR, this may have optimization strategies for places where you need to be sure about your data and how many reads it costs.

Otherwise, if your usage is relatively mundane and you don't expect to have problems with the size of data (IE you're using .Net and therefore don't have to be concerned about the size of your string/char* objects) then using VARCHAR(max) is fine.

Branca answered 21/8, 2011 at 21:58 Comment(8)
Now this is what i expected (well your last paragraph anyway). This was the conclusion i was coming to. I was interested to see if I was reading things wrong or just not understanding things properly, hence why i thought i'd ask.Valladolid
Nope, I think you're right about it - I prefer to constrain my text fields too. I just like to mention the NVARCHAR/NTEXT part where I see people aren't using it, imho its sometimes overlooked.Branca
one thing i noticed in my reading of articles about it, was when you query a table, it has to buffer the maximum size of a record in memory. so that potentially could be a problem, but in todays environments with multiple gigs (if not terabytes) and with ram being cheap and easy to upgrade, it pretty much reduces this problem to nothing.Valladolid
This answer seems to imply that full text indexing can only be used with text datatype. This is not the case. text is a deprecated datatype and has no advantage over varchar(max) AFAIK.Genevieve
that's not quite what I meant, and it (TEXT) does have the specific advantage of affording the optimizer exact knowledge of where the data is stored.Branca
Another reason to not use TEXT/NTEXT is that they're deprecated.Orestes
Can you cite that ? I was only aware that TEXT IN ROW was deprecated.Branca
MSSQL Books Online: ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.Sordid
T
16

There is a blog post about why not to use varchar max here

Edit

The basic difference is where the data is stored. A SQL Data row has a max size of 8000 bytes (or was it 8K). Then a 2GB varchar(max) cannot be stored in the data row. SQL Server stores it "Out of row".

Therefore you could get a performance hit since the data will not be in the same place on disk, see: http://msdn.microsoft.com/en-us/library/ms189087.aspx

Triadelphous answered 21/8, 2011 at 22:1 Comment(5)
Yeah i read that, but it still left me with questions. If the code is coded correctly, so that the data is limited in the application, then it shouldn't have any performance degradation. Indexing was a non issue, and the last point was about design, which my friend wasn't concerned about.Valladolid
the varchar(max) will not be stored out of row unless the data stored in the row exceeds the rows limitations(which, yeah, is about 8k). Ie if you have the text "hello world" stored in a varchar max in a table with 3 columns, chances are its not going to get stored out of row.Valladolid
The reason not to use them is that they cannot be indexed. It is a poor practice to use nvarchar(max) or varchar(max) unles you expect to have data that needs it.Abran
Its not always a performance hit. Table scans will speed up if the row size decreases. If the varchar(max) in question is rarely used in queries, moving it out of row will be a performance gain.Misbeliever
The performance hit is worth the operational correctness when you cannot anticipate the size of a string you might need to insert into a row.Pratincole
R
3

If you are working in an OLTP environment, you are all about the performance. From overhead and tuning concerns to indexing limitations and query bottlenecks. Using a varcahr(max) or any other LOB type will most likely contravene most design best practices, so unless there is a specific business need that cannot be handled through the use of some other typing mechanism and only a varchar(max) will fit the bill then why subject your system and applications to the kind of overhead and performance issues inherent in one of the LOB datatypes?

If on the other hand you are working in an OLAP environment or in a Star Schema DW environment with Dimension tables with descriptors fields that naturally need to be verbose then a varchar(max), as long as you are not adding that to an index, may be useful. Still I would recommend even then to use a char(x) varchar(x) As it is always a best practice to only use those resources you absolutely must have to get the job done.

Rasure answered 3/7, 2015 at 4:24 Comment(0)
C
2

Redgate wrote a great article about this.
https://www.red-gate.com/simple-talk/sql/database-administration/whats-the-point-of-using-varcharn-anymore/

Conclusions

  • Where appropriate, use VARCHAR(n) over VARCHAR(MAX) for reasons of good design if not performance benefits, and because VARCHAR(MAX) data does not compress
  • Storing large strings takes longer than storing small strings.
  • Updating an in-row VARCHAR(MAX) value from below 8,000 to over 8,000 will be relatively slow, but the difference for a single transaction will likely not be measurable.
  • Updating an in-row VARCHAR(MAX) value from over 8,000 to below 8,000 will be faster than if the table is set to store data out-of-row.
  • Using the out-of-row option for VARCHAR(MAX) will cause slower writes until the strings are very long.
Cavesson answered 30/8, 2018 at 16:7 Comment(0)
A
1

They should NOT be used unless you expect large amounts of data and here is the reason why (directly from Books Online):

Columns that are of the large object (LOB) data types ntext, text, varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be specified as key columns for an index.

If you want to cripple performance, use nvarchar for everything.

Abran answered 11/4, 2012 at 18:5 Comment(3)
But what if you’re never going to key on that column in the first place? If you’re storing a blob of text, you’re probably looking it up by a different key column. What’s this big concern of needing to put indexes on columns that are likely going to be storing freeform text and never appearing in WHERE except maybe to check IS NULL?Pratincole
@binki, the advice is not to use varchar(max) for everything because there will be fields you need to index. It is only intended for when you need large amounts of data.Abran
Just cause it's from an online book doesn't mean it's right :) Another example of sometimes correct use of (MAX) is when you're storing strings from a source (e.g. blobs, NOSQL fields) that is unsized, so you don't know how big it may be. And I agree with @binki, there are often columns that you know you'll never index.Trinia
M
1

Why not? There are reasons to not use varchar(max):

  1. Just like a good old BLOB, SQL Server cannot index a varchar(max) column.
  2. It's just wasteful and lazy to overprovision, specially varchar(max) since it allocates AT LEAST 8-bytes per row. I've seen developers assigning "max" to single-byte binary (True / False) variables, only to find out later that the system is slow as molasses when discriminating data using those values.
  3. You cannot infer what data type is getting saved there. The obvious use case exception is saving actual large text chunks of up to 8K.
Mariko answered 15/12, 2021 at 18:7 Comment(0)
C
0

I don't know how sql server handles large (declared) varchar fields from a performance, memory and storage perspective.. but assuming it does so as efficiently as smaller declared varchar fields, there's still the benefit of integrity constraints.

The application sitting on the db is supposed to have limits on the input, but the database can properly report an error if the application has a bug in this respect.

Claritaclarity answered 21/8, 2011 at 21:58 Comment(6)
This is a good point, especially if you're not using a managed language to read your strings.Branca
It is a good point. I'll mention this, its being developed in c# 3.5 or 4 (i believe, i should ask him).Valladolid
BTW: Your assumption is not correctGenevieve
@RussClarke Are variable-length strings really a managed-only thing? Sounds like you’ve never encountered heap allocation in unmanaged environments :-p.Pratincole
@MartinSmith Those are very minor performance improvements. Also, the blogger is using VARCHAR(MAX) with comparisons and in the WHERE clause. The use case I’d expect for VARCHAR(MAX) is something that you wouldn’t compare or WHERE in SQL—e.g., a blog post content blob, a freeform text description of something. If you want to compare or WHERE, then of course constraining will help. His example of 'abc' could even be translated into a comparison of two integers. That’s not the sort of data one puts in a VARCHAR(MAX)Pratincole
@Pratincole I get your point and mostly agree, but don't underestimate other's use-cases. I've been working for months now pulling data from cloud blobs and other NOSQL sources that don't size the strings, and have several examples of fields that have values that are useful in where clauses but with some maverick values that are really long. Arguably an application error I guess, but I can't fix the app and I have to store the data.Trinia
I
0

The diff is in next:
VARCHAR(X) can be indexed
VARCHAR(MAX) can't be indexed

Inkberry answered 31/10, 2013 at 15:9 Comment(1)
Where exactly do you think MS SQL is storing LOB data, if not in the .mdf files?Victorious
F
0

   It is somewhat old-fashioned to believe that the application will only pass short strings to the database, and that will make it okay.

   In modern times, you HAVE to anticipate that the database will be accessed primarily by the current application, but there may be a future version of the application, (will the developer of that version know to keep strings below a certain length?)

   You MUST anticipate that web services, ETL processes, LYNC to SQL, and any other number of already existing, and/or not-yet-existing technologies will be used to access your database.

   Generally speaking I try not to go over varchar(4000), because it's four-thousand characters, after all. If I exceed that, then I look to other datatypes to store whatever it is I am trying to store. Brent Ozar has written some pretty great stuff on this.

   All that said, it is important to evaluate the current design's approach to your current requirements when you are working on a project. Have an idea of how the various parts work, understand the trade-offs of various approaches and solve the problem at hand. Exercising some great axiom can lead to blind adherence which might turn you into a lemming.

Foreordination answered 2/5, 2018 at 15:39 Comment(1)
For each 4000 varchar, you're forcing the SQL optimizer to pre-allocate a 2000 chunk of memory for indexing purposes. There are no free lunches outside of varchar(max) either. 2000 length sounds like it is an unindexable free-style data column. I'm 99% sure that is not the case, overprovisioning for the future or not.Mariko

© 2022 - 2024 — McMap. All rights reserved.