MySQL: Large VARCHAR vs. TEXT?
Asked Answered
P

9

910

I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types) I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters which means the messages would never be inserted into the db as longer than this.

Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.

Proa answered 7/1, 2010 at 20:40 Comment(8)
A bit old, but I came here because I ran into a problem that made me think about this. In my case my front-end form was limited to 2,000 characters but the encoding implicit in my storage method encoded international characters as multiple characters (which can apparently anywhere from 3 - 12 per character). So my 2,000 suddenly becomes up to 24,000. Something to think about...Laid
I have found text to be significantly faster for many concurrent inserts.Fruiterer
@JamesS: utf8mb4... >.<Laser
Here is a new thread: dba.stackexchange.com/questions/210408/…Cleaver
@Rick James - The question you have linked to is not the same question at all. Please note that this thread has been viewed 400k times and you propose to replace it with something with 35 views and your own answer as the top answer? This question is still perfectly valid and the answers here a useful record.Proa
@RickJames consider posting an updated answer, rather than close the questionNumbersnumbfish
@YvetteColomb - I added an Answer. I would mainly like to get rid of the Accepted Answer because it is out of date. I came to the Q&A because someone was quoting incorrect info, saying "754 upvotes, so it must be right". OK, I edited the Approved answer, too. (Though that feels improper.)Cleaver
MySQL :: MySQL 8.0 Reference Manual :: 15.10 InnoDB Row FormatsMcclenaghan
B
851
  • TEXT and BLOB may by stored off the table with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, columns size, row_format, and MySQL version.

  • VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, the tradeoff of which would be faster depends upon your data and your hardware, you'd want to benchmark a real-world scenario with your data.

Brassica answered 7/1, 2010 at 20:45 Comment(4)
+1: VARCHAR (stored inline) is usually faster IF the data is frequently retrieved (included by most queries). However, for a large volume of data that is not normally retrieved (that is, not referenced by any query), then it may be better to not have the data stored inline. There is an upper limit on the row size, for data stored inline.Nudibranch
@Pacerier: the exact benefit of avoiding "inline" storage is an increase in the number of rows that can be stored in a block, which means the table rows occupy fewer blocks in the InnoDB buffer cache (smaller memory footprint), and means fewer blocks to be transferred to and from disk (reduced I/O). But, this is only a performance benefit if the columns stored "off row" are largely unreferenced by queries. If those "off row" columns are referenced by most queries, that benefit largely evaporates. Inline is preferred if the columns fit in the max rowsize and are frequently referenced.Nudibranch
"VARCHAR is faster when the size is reasonable". What is a "reasonable" number of characters, 100? 1000? 100,000?Blok
This answer is not correct for InnoDB. Both VARCHAR and BLOB/TEXT are stored inline with other columns if the value on a given row fits in the page size (16KB and each page must hold at least two rows). If the string is too large for that, it overflows to additional pages. See mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb for a detailed explanation.Weidar
B
535

Can you predict how long the user input would be?

VARCHAR(X)

Max Length: variable, up to 65,535 bytes (64KB)
Case: user name, email, country, subject, password


TEXT

Max Length: 65,535 bytes (64KB)
Case: messages, emails, comments, formatted text, html, code, images, links


MEDIUMTEXT

Max Length: 16,777,215 bytes (16MB)
Case: large json bodies, short to medium length books, csv strings


LONGTEXT

Max Length: 4,294,967,29 bytes (4GB)
Case: textbooks, programs, years of logs files, harry potter and the goblet of fire, scientific research logging

There's more information on this question.

Broyles answered 1/11, 2012 at 17:56 Comment(2)
Predictability is really a side item here. It's actually maximum expected length that should be the deciding factor. The items you mention as more predictable are only that way because they are shorter than the others.Jezebel
@andrew-barber That's my point though. All the other posts explain well about the differences but not about the situations when you actually have to make a choice between the two. I was trying to point out using varchar for predictably short is a good choice and using text for arbitrarily long is a good choice.Broyles
G
227

Just to clarify the best practice:

  1. Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)

  2. String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).

I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.

What is it about the messages themselves that forces them to never be more then 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.

Genetics answered 7/1, 2010 at 21:53 Comment(1)
So what would you recommend dealing with over-sized URLs like you have frommagnet: URLs? They can be VERY much longer than just 255 characters.Wrongdoing
C
43

Short answer: No practical, performance, or storage, difference.

Long answer:

There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)

For smaller limits in VARCHAR, there are some advantages over TEXT.

  • "smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
  • INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
  • Intermediate tables created by complex SELECTs are handled in two different ways -- MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
  • Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
  • VARBINARY is like VARCHAR; BLOB is like TEXT.
  • A table with several 'large' VARCHARs could hit a limit of 64KB for the whole table definition; switching to TEXT is a simple and practical fix. (Example: (42000) Row size too large, from an Oracle dump to a MySQL dump )

Rebuttal to other answers

The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.

When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.

The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .

Cleaver answered 25/6, 2018 at 16:5 Comment(1)
@KostaKontos - Thanks for the praise and the typo fix. When I see a need for a better answer, I will add an answer, even if 8 years and 800 upvotes too late.Cleaver
I
34

Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.

I think TEXT is stored outside the mysql row, while I think VARCHAR is stored as part of the row. There is a maximum row length for mysql rows .. so you can limit how much other data you can store in a row by using the VARCHAR.

Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.

Islamite answered 7/1, 2010 at 20:47 Comment(10)
The row length limit is 65,535 bytes [ dev.mysql.com/doc/refman/5.0/en/column-count-limit.html ]. If your column is utf8-encoded, that means a 3000-character varchar column can take up to 9000 bytes.Telegu
UTF-8 characters can be up to 4 bytes, so I think you meant 12,000 bytes (unless there is some MySQL thing I'm not understanding here).Torino
@Torino MySQL's UTF-8 is "fake UTF-8" in that it only supports 3 bytes per character max, so there is no way to directly store unicode characters beyond BMP plane in MySQL's UTF-8. This is fixed in MySQL 5.5.Ammoniacal
I believe that this assertion is valid for MyISAM only. I can't find a definitive source but I believe that InnoDB stores TEXT inline in the table as well.Schizont
@Schizont I found a source here explaining that storing of variable length data using InnoDB may vary (can be stored externally or inline within the row) mysqlserverteam.com/externally-stored-fields-in-innodbRetreat
Depending or row_format, row size, and other things, none, some, or all of a TEXT or VARCHAR is stored off-record. You can't make a simple statement about what is done in InnoDB.Cleaver
@AnthonyRutledge - 5.1 does not have utf8mb4; 5.5-5.7 have utf8mb4 as an option; 8.0 defaults to the full 4-byte UTF-8.Cleaver
@RickJames Yes, but it does have a three byte UTF-8.Entebbe
@AnthonyRutledge - Yes, utf8 (3-byte max) has been around since 4.1.Cleaver
@RickJames Shame about the cursors, views, triggers, and stored procedures, though.Entebbe
F
7

The preceding answers don't insist enough on the main problem: even in very simple queries like

(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id) 

a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if you have in your table say 500 000 lines with a VARCHAR(65000) field, this column alone will use 6.5*5*10^9 byte. Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.

Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/ (This refers to the handling of TEXT vs VARCHAR in "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)

Feingold answered 30/6, 2018 at 21:43 Comment(1)
InnoDB: The same applies through version 5.7. With 8.0, varchar temps are variable length.Cleaver
V
4

Varchar is for small data like email addresses, while Text is for much bigger data like news articles, Blob for binary data such as images.

The performance of Varchar is more powerful because it runs completely from memory, but this will not be the case if data is too big like varchar(4000) for example.

Text, on the other hand, does not stick to memory and is affected by disk performance, but you can avoid that by separating text data in a separate table and apply a left join query to retrieve text data.

Blob is much slower so use it only if you don't have much data like 10000 images which will cost 10000 records.

Follow these tips for maximum speed and performance:

  1. Use varchar for name, titles, emails

  2. Use Text for large data

  3. Separate text in different tables

  4. Use Left Join queries on an ID such as a phone number

  5. If you are going to use Blob apply the same tips as in Text

This will make queries cost milliseconds on tables with data >10 M and size up to 10GB guaranteed.

Vulgarity answered 16/4, 2019 at 23:17 Comment(0)
R
4

There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR type fields are stored inline while TEXT are stored offline, only pointers to TEXT data is actually stored in the records.

If you have to index your field for faster search, update or delete than go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field bacause these two data types are different in nature.

  • If you use you field only for archiving
  • you don't care about data speed retrival
  • you care about speed but you will use the operator '%LIKE%' in your search query so indexing will not help much
  • you can't predict a limit of the data length

than go for TEXT.

Relevant answered 23/6, 2019 at 7:43 Comment(1)
Partially misleading info: TEXT columns cannot be index in their entirety. When you include a TEXT column in the index you must specify the length. Also VARCHARs cannot be indexed in their in their entirety in the case of VARCHARs > 255 as there is a max length on the index size.Triangulate
W
1

Just a correction to so many answers here, even if it is a bit late to the party.

Text fields can be fully indexed by MySQL as per their documentation.

Link provided https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html

Overall Varchar fields are longer to write to then Text fields, but it only matters if you have multitude of write requests

Windblown answered 12/6, 2022 at 13:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.