What size to pick for a (n)varchar column? [closed]
Asked Answered
R

10

20

In a slightly heated discussion on TDWTF a question arose about the size of varchar columns in a DB.

For example, take a field that contains the name of a person (just name, no surname). It's quite easy to see that it will not be very long. Most people have names with less than 10 characters, and few are those above 20. If you would make your column, say, varchar(50), it would definately hold all the names you would ever encounter.

However for most DBMS it makes no difference in size or speed whether you make a varchar(50) or a varchar(255).

So why do people try to make their columns as small as possible? I understand that in some case you might indeed want to place a limit on the length of the string, but mostly that's not so. And a wider margin will only be beneficial if there is a rare case of a person with an extremely long name.


Added: People want references to the statement about "no difference in size or speed". OK. Here they are:

For MSSQL: http://msdn.microsoft.com/en-us/library/ms176089.aspx

The storage size is the actual length of data entered + 2 bytes.

For MySQL: http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html

L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes

I cannot find documentation for Oracle and I have not worked with other DBMS. But I have no reason to believe it is any different there.

Ricciardi answered 11/8, 2009 at 16:18 Comment(5)
I will love a reference to "However for most DBMS it makes no difference in size or speed whether you make a varchar(50) or a varchar(255)"Boffin
See https://mcmap.net/q/88795/-size-of-varchar-columnsSlr
It's not duplicate - this is a more general question, not only about MSSQL.Ricciardi
Falsehoods about names An acquaintance of mine can't book flights online because his name is too long for the system. He's from Thailand, where names are frequently more than 10 or even 20 chars.Deice
In case if someone wants a reference for Oracle, here is a quote from Database Concepts: For each row, Oracle Database stores each value in the column as a variable-length field... For example, assume you declare a column VARCHAR2 with a maximum size of 50 characters. In a single-byte character set, if only 10 characters are given for the VARCHAR2 column value in a particular row, the column in the row's row piece stores only the 10 characters (10 bytes), not 50. Dingy
H
24

I can only speak for Oracle. A VARCHAR2(50) and a VARCHAR2(255) take up exactly the same amount of space and perform identically, if you enter the value 'SMITH'.

However, the reason why it is generally not a good idea to go around declaring all your textual columns as VARCHAR2(4000) is that column length is, effectively, another constraint. And constraints are database implementation of business rules, so they are definitely something that should be defined on the database side of things.

As a for-example. You define a CHECK constraint on a column so that the values it can accept are only 'Y' and 'N'. That saves your application from having to deal with 'y' and 'n' or even '1' and '0'. The check constraint ensures your data conforms to expected standards. Your application code can then make valid assumptions about the nature of the data it has to deal with.

Column length definition is in the same boat. You declare something to be a VARCHAR2(10) because you don't want it accepting an entry of 'ABC123ZYX456' (for whatever reason!)

In Australia, I define STATE columns to be a varchar2(3) because I don't want people typing in 'New South Wales' or 'South Australia'. The column definition pretty much forces them to be entered as 'NSW' and 'SA'. In that sense, a VARCHAR2(3) is almost as much a check constraint as actually specifying a CHECK IN ('NSW','SA','VIC' etc) constraint.

In short, proper column lengths are a way of encoding business rules. They're another form of constraint. They bring all the advantages of constraints (and suffer from many of the same drawbacks). And they ensure, to a small extent, a degree of 'data cleanliness' that "proper" constraints help with, too.

I don't buy the argument, either, that it's best to stick these sorts of things in the client app because it's easier to change there. You have 20,000 people using an app, that's 20,000 updates. You have one database, that's one update. The 'easier to change the client app' argument, if true, would potentially mean the database just gets treated as a giant bit bucket with all the clever logic being handled in client code. It's a big discussion to have, but since all RDBMSes let you define constraints and so on in the database itself, it's pretty clear that there's at least a worthwhile case to be made that such fundamental logic belongs in the backend.

Heptode answered 14/8, 2009 at 1:5 Comment(5)
It's a good answer, but it just leads to another question: so, then, what is a reasonable length constraint for a name? and is there even one?Dunkle
Upvoted for a good, detailed discussion! :) Still, there is nothing that I didn't say in my OP. Yes, there are cases, when it makes sense to put a limit on a string. Like the hash of a password or the "three letter code of a state". But in vast majority of cases varchar fields are for names and descriptions, where there is no obvious limit.Ricciardi
While I agree that it's another constraint, it sure doesn't replace other constraints. E.g. accepting 42! for a STATE column is [likely] invalid, even if it's only 3 characters long.Betide
@PavelMinaev See Falsehoods about namesDeice
In era of application servers, all the validation etc is done out there. Thus constraint check on database turns out to be a CPU waste.Villon
L
6

I have heard the query optimizer does take varchar length into consideration, though I can't find a reference.

Defining a varchar length helps communicate intent. The more contraints defined, the more reliable the data.

Lupien answered 11/8, 2009 at 16:58 Comment(3)
I agree about the intent, but I do not see how it would make the data more reliable.Ricciardi
I should have said 'predictable' rather than reliable. The fewer values a variable can have, the fewer chances it has to do something I don't expect.Lupien
Unless you are programming in C/C++ or another language where you have to worry about buffer overflows, one string will be just the same as another in your code.Ricciardi
M
3

So why do people try to make their columns as small as possible? I don't believe in making them as small as possible, but sizing them appropriately. Some reasons for making (n)varchars smaller rather than larger:

1) With a larger field, all clients that use the database must be able to handle the full size. For example, take a system that holds a United States address with 255 characters per each field: (Similar to TDWTF that you reference, I believe.)

  • First Name
  • Last Name
  • Address Line 1
  • Address Line 2
  • City
  • State
  • ZIP Code

Now your data entry screens will need to allow and show 255 characters per field. Not hard, but unlikely to look nice with larger fields Printing invoices, you will need line breaking logic to handle the large fields. Depending on tool, not that hard.

But I would not want the problem of formatting the address for an envelope that could have 255 characters for each of those fields or just any one of those fields. Are you going to truncate if the field is too long to fit? Great someone has Address Line 1 of "House Number Streat Number ... blah blah blah ... Appartment number 111." And you'll lop off the important apartment number. Are you going to wrap? How much? What if you just can't fit it in the little box of space on the envelop? Raise an exception and have someone hand letter it?

2) While 10 characters of data held in a varchar(50) versus varchar(255) does not impact size or speed, allowing 255 characters allows for more space to be taken. And if all fields are that large you could hit size limits in SQL Server 2000. (I haven't read up on 2005 & 2008 to see if they can handle rows greater than one page.) And with Oracle you the larger sizes allows row chaining to happen if someone actually uses all the available characters.

3) Indexes have stricter size limits then leaf pages. You may preclude indexes, especially composite indexes, if you create your varchars too big.


On the other hand, I have a long line 1 for my address, and have been frustrated by web sites that don't allow the full thing to be typed.

Merilee answered 11/8, 2009 at 16:45 Comment(4)
1) If there really is a size limit (like the available space on an envelope), then it does make sense to put a limit there, that I already noted in my question. But I would rather prefer to make the limit in my client app than in the DB. Because if the limit ever changed, I would then only have to change the client app - in most cases a task far easier than doing a DB schema update.Ricciardi
2) Row size is a point, however since most real data will not reach these limits, it is quite unlikely that a row so big will be met. The large sizes are meant more for fail-safe than actual use.Ricciardi
I fail to see how the excessive size on VARCHARs is a failsafe, since the open the possibility that some one will try stuffing in a record that meets all the data type constraints, but FAILS because the overall size is too large for the leaf node or index.Merilee
It is a failsafe for the poor person who indeed has a long name or address. And I fail to grasp why someone would try to fill up all the fields to their limit. A hacker maybe? Well then what do you care if the app throws an exception at him for it? Better for you, I'd say.Ricciardi
S
3

One important distinction is between specifying an arbitrarily large limit [e.g. VARCHAR(2000)], and using a datatype that does not require a limit [e.g. VARCHAR(MAX) or TEXT].

PostgreSQL bases all its fixed-length VARCHARs on its unlimitted TEXT type, and dynamically decides per value how to store the value, including storing it out-of-page. The length specifier in this case really is just a constraint, and its use is actually discouraged. (ref)

Other DBMSs require the user to select if they require "unlimitted", out-of-page, storage, usually with an associated cost in convenience and/or performance.

If there is an advantage in using VARCHAR(<n>) over VARCHAR(MAX) or TEXT, it follows that you must select a value for <n> when designing your tables. Assuming there is some maximum width of a table row, or index entry, the following constraints must apply:

  1. <n> must be less than or equal to <max width>
  2. if <n> = <max width>, the table/index can have only 1 column
  3. in general, the table/index can only have <x> columns where (on average) <n> = <max width> / <x>

It is therefore not the case that the value of <n> acts only as a constraint, and the choice of <n> must be part of the design. (Even if there is no hard limit in your DBMS, there may well be performance reasons to keep the width within a certain limit.)

You could use the above rules to assign a maximum value of <n>, based on the expected architecture of your table (taking into account the impact of future changes). However, it makes more sense to define the minimum value of <n>, based on the expected data in each column. Most likely, you will expand to the nearest "round number" - e.g. you will always use either VARCHAR(10), VARCHAR(50), VARCHAR(200), or VARCHAR(1000), whichever is the best fit.

Starks answered 17/8, 2009 at 19:12 Comment(1)
+1 This is true of SQL Server (at least 2008). There is only about ~8k of "reserved data" allowed per record.Betide
P
2

Simple answer to this in my opinion is the fact that you cannot use that column as an index key, if you require any indexing you are basically forced to use fulltext... this is with regards to using a varchar(max) column. In any case 'right-sizing' columns makes a lot of sense whenever you [may] want to apply any indexing; updating variable length columns may be a costly maneuver as these are not done in place and can/will cause some amount of fragmentation.

All with regard to MS SQ-Server.

Pollitt answered 18/8, 2009 at 19:17 Comment(1)
Hmm... looks like you're right. I had missed the part about indexes on varchar(max).Ricciardi
L
1

I'll answer your question with a question: If there is no difference to the DBMS between a varchar(50) and a varchar(255), why would the DBMS let you make a distinction? Why wouldn't a DBMS simply say "use varchar for up to xxx characters, and text/clob/etc. for anything over that." Sure, perhaps Microsoft/Oracle/IBM might keep the length definition for historical reasons, but what about DBMS' like MySQL which has multiple storage backends- why does every one implement definable character column lengths?

Lure answered 12/8, 2009 at 17:35 Comment(4)
Because sometimes you do need that length constraint? Although it would indeed then be better as a /constraint/ not a column datatype parameter. OK, I don't know. :)Ricciardi
Right. So aside from backwards compatibility there is no "on the surface" reason for the existance of such a feature. The suggests there being an deeply buried reason for it. I suppose it's possible that other DBMS' would do it for a "me too" reason, but I am of the belief that developers arne't adding features for the sake of adding features. At least, I don't :)Lure
Or maybe it is a historical feature that comes from 80s, when databases had more primitive storage optimizers, and every little bit helped.Dunkle
This is certainly true of PostgreSQL, which implements VarChar(length) as a constrained version of its any-length text type largely for compatibility reasons. However, even if it was syntactically a column constraint, the question would still be valid as to how best to use it.Starks
C
1

If you are going to print labels you usually want the string to be no longer than 35 characters. This is why you want some control on the size of the Varchar that you are going to use to accept the lines that are going to be used to print labels.

Cystoscope answered 14/2, 2013 at 20:9 Comment(0)
H
0

If you allow the data length to be over 255 and someone links to the data through MS Access the data is not able to be used to join tables (comes in as a memo field). If the data is exported to excel it will be limited to 255 characters per field. Compatibility with other programs should be considered when creating data sets.
Data quality control is all about controlling the data entering your environment. What do you need to store that is over 255 characters? There are times that data needs to be over 255 characters, but they should be far and few between and should be used as supportive supplemental information for a field that can be used for analysis

Hemitrope answered 21/5, 2018 at 8:47 Comment(0)
S
0

Size do matter, and it CAN affect performance! In mssql the execution planner makes assumptions and this assumption can hurt, when planning a sort of a varchar column, the planner predict average row size to 50% of declared length, + a little for overhead, so if you declare varchar(200) the execution planner estimate row size to around 110 bytes, and if the rows is filled to 80% the execution plan will be forced to use tempdb to sort the rows instead of in memory sort. I have made proof and full example available here: https://kisunu.no/index.php?pid=20130

Serrano answered 12/3, 2021 at 18:25 Comment(0)
M
0

I didn't see this mentioned elsewhere, so I will add: there can be a performance impact when overallocating VARCHAR size.

Remember that the "variable" aspect of VARCHAR is related to storage on disk. But when the DB needs to read the data into memory it has to allocate memory based on the max possible size of a value in the column. This is because the DB doesn't know how big the data is until it is read off disk.

This effect gets amplified by the volume of data. If you have a small database/workload, you probably won't even notice. However, if you have billions of rows, across hundreds of tables with dozens of max-width VARCHAR columns and a lot of queries, you'll probably notice a strain on the DB's memory resources.

Meggie answered 22/2, 2024 at 14:27 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.