Difference between text and varchar (character varying)

What's the difference between the text data type and the character varying (varchar) data types?

According to the documentation

If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

and

In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.

So what's the difference?

Dude answered 31/1, 2011 at 8:44 Comment(0)

There is no difference; under the hood it's all varlena (variable-length array).

Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/

A couple of highlights:

To sum it all up:

  • char(n) – takes too much space when dealing with values shorter than n (pads them to n), and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit
  • varchar(n) – it's problematic to change the limit in live environment (requires exclusive lock while altering table)
  • varchar – just like text
  • text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name

The article does detailed testing to show that the performance of inserts and selects for all four data types is similar. It also takes a detailed look at alternative ways of constraining the length when needed. Function-based constraints or domains provide the advantage of instantly increasing the length constraint, and on the basis that decreasing a string length constraint is rare, depesz concludes that one of them is usually the best choice for a length limit.
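
For illustration, here is a minimal sketch of the domain approach depesz describes (the domain, table, and constraint names below are invented for the example):

-- A length limit expressed as a domain over text:
CREATE DOMAIN name_80 AS text
  CONSTRAINT name_80_len CHECK (char_length(VALUE) <= 80);

CREATE TABLE people (last_name name_80);

-- Raising the limit later is just a constraint swap on the domain;
-- existing data is re-validated, but no table rewrite is required.
ALTER DOMAIN name_80 DROP CONSTRAINT name_80_len;
ALTER DOMAIN name_80 ADD CONSTRAINT name_80_len CHECK (char_length(VALUE) <= 200);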

Kotz answered 31/1, 2011 at 8:55 Comment(21)
@axiopisty It's a great article. You could just say, "Could you pull in some excerpts in case the article ever goes down?" I've tried to briefly summarize the article's content/conclusions. I hope this is enough to ease your concerns.Gravity
@axiopisty, strictly speaking, the initial answer was saying "under the hood it's all varlena", which is certainly useful information that distinguishes this answer from a link-only answer.Sextant
One thing to keep in mind with limitless strings is that they open the potential for abuse. If you allow a user to have a last name of any size, you may have someone storing LARGE amounts of info in your last name field. In an article about the development of reddit, they give the advice to "Put a limit on everything".Meaganmeager
@MarkHildreth Good point, though generally constraints like that are enforced further out in an application these days—so that the rules (and attempted violations/retries) can be handled smoothly by the UI. If someone does still want to do this sort of thing in the database they could use constraints. See blog.jonanin.com/2013/11/20/postgresql-char-varchar which includes "an example of using TEXT and constraints to create fields with more flexibility than VARCHAR".Incorporated
@NickBarnes increasing the size doesn't, still interesting point.Kilter
@Incorporated blog.jonanin.com/2013/11/20/postgresql-char-varchar -> This is down, but found here archive.is/6xhA5 .Benedictine
@NickBarnes you should amend / delete the comment if possible as it doesn't apply to reductions - increasing the size doesn't trigger a table rewrite.Benedictine
Since they "text" and "varchar" are very similar performance wise and implementation wise in postgres - and since text is not ansi sql and "varchar" is ansi sql - then "varchar" should be a strong option if not the winner. Comes from simple check on both Postgress Embedded and H2 memory dbs where only the former allowed adding index on "text". I do see that @Incorporated mentioned "text" has more constraint flexibility than "varchar". Or did I miss anything else significant? So it sounds like a tradeoff each one has to make.Volar
It is really alarming that this comment has so many votes. text should never, ever be considered "a winner over varchar" out of the box just because it allows me to input strings of any length, but exactly the opposite: you should really think about what kind of data you want to store before allowing your users to input strings of any length. And NO, "let the Frontend handle it" is definitely not acceptable and a very bad development practice. Really surprising to see a lot of devs doing this nowadays.Fernandez
Good point @JoséL.Patiño, however he does state the following: "Function based constraints or domains provide the advantage of instant increase of the length constraint". It sounds to me like he's saying: "Use text, and if you need a length limit, use a check constraint or domain."Wideopen
@JoséL.Patiño I recently learned that as of v9.2 increasing the limit of a varchar column no longer requires a table rewrite, making it virtually instantaneous. I now prefer varchar(n) over text because it allows me to increase the length without the forced rechecking or hackish solutions (e.g. function constraints) provided in OP's article.Wideopen
@JoséL.Patiño I think it really depends on the use case. I don’t know what your definition of “frontend” is, but in my opinion the business layer, which includes the backend server, can be considered the frontend of the database. I think it's more flexible and easier to manage if we keep size constraints in the backend server while storing the data as the text type in the database. In my experience, I find it more challenging to roll out database migrations than to ship application code, and in many cases the text size could change very often based on business needs.Adversaria
The answer is so wrong. Why would VARCHAR(n) exist in the first place if TEXT could solve everything? Using an unlimited char type for a column whose length you already know seems so wrong. Take my downvote.Amata
@Amata You can disagree, but then come up with a complete answer or article about why some data type is better than the other. Just like Depesz did. That some data type was invented in the past doesn't mean that it was a good idea. Humankind has made so many stupid mistakes in the past...Kotz
@JoséL.Patiño, It sounds reasonable, but in my experience it is of very little practical use. The values stored (or even handled) by an application need to be verified to determine whether they represent a desired domain: think of handling a url, a sha1 hash, etc. That poses requirements for a lot more than just the maximal length: minimal length, character class, more complex grammar, etc. Enforcing one of these at the DB level and the rest elsewhere is messy, and enforcing length in 2 places is cumbersome to change and -- in the case of postgres text and varchar -- provides no perf benefit either.Jacinthe
In my 20 years of working with computers I have experienced several bugs caused by valid data that failed to be stored in a fixed size text column (especially in MySQL which just truncates the data 😱). But I have never come across a bug caused by the lack of a fixed size.Tericaterina
@JoséL.Patiño "let the frontend handle it" absolutely no one made that claim, you just made that up. These sorts of things are/ought to be handled by the API layer (so, the backend, not the frontend, which could easily be bypassed by the end-user). Your comment makes no sense and it's really alarming that it has so many upvotes.Horotelic
@AradAlvand I find it way more alarming that you're still saying things like "these sorts of things are to be handled by the API". The API (which BTW I can take as "the front-end" for an RDBMS) needs to care about user input validation (which also BTW is the duty of the UI if it exists), period. Data integrity and typing are the duty of the DB schema, and the DB should refuse to accept anything that does not match it. The DB schema is your last defense barrier against bad data and you should NEVER trust that any "API" or "frontend" or you-name-it is going to do that job downstream for ya.Fernandez
@aderchox I did provide an explanation. If what you're building is a web app backend, then all outside requests to your database go through your backend application, without exception, so your backend is effectively a fully-covering proxy to your database; it would be no less secure to implement the validation there — and it would in fact be easier and more flexible, as opposed to doing so on the database side of things. What's not clear about that?Horotelic
@JoséL.Patiño You can absolutely trust your backend (which, yes, is technically a "front-end" to your database, but that's a completely misleading thing to say here, what is meant by "the front-end" in this context is your client-side app), if you know that all requests to your database always go through your backend — which is true in the case of your typical website backend, for example. The "last defense barrier" argument makes no sense either, if someone somehow finds a way to bypass your backend and obtain direct access to your database, data validation is the least of your concerns.Horotelic
@AradAlvand you are mixing completely different concepts like data integrity and systems security and trying so bad to diminish my arguments with a display of what has historically been considered bad practices. This conversation is over, do whatever you think is best for you. Good luck, and have a good day.Fernandez

As "Character Types" in the documentation points out, varchar(n), char(n), and text are all stored the same way. The only difference is extra cycles are needed to check the length, if one is given, and the extra space and time required if padding is needed for char(n).

However, when you only need to store a single character, there is a slight performance advantage to using the special type "char" (keep the double-quotes — they're part of the type name). You get faster access to the field, and there is no overhead to store the length.

I just made a table of 1,000,000 random "char" chosen from the lower-case alphabet. A query to get a frequency distribution (select count(*), field ... group by field) takes about 650 milliseconds, vs about 760 on the same data using a text field.
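
For anyone who wants to try something similar, here is a rough sketch of such a test (the table and column names are invented, and timings will of course vary):

-- One million random lower-case letters stored as "char":
CREATE TABLE char_test AS
SELECT chr(97 + (random() * 25)::int)::"char" AS c
FROM generate_series(1, 1000000);

-- Frequency distribution; compare the timing against the same data in a text column:
SELECT c, count(*) FROM char_test GROUP BY c ORDER BY c;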

Pyramidon answered 1/2, 2011 at 19:53 Comment(3)
Technically the quotes aren't part of the type name; they are needed to differentiate it from the char keyword.Maori
Technically you are correct @Jasen... Which, of course, is the best kind of correctJezabel
datatype "char" is not char?? It is valid in nowadays of PostgreSQL 11+? ... Yes: "The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.", guide/datatype-character.Tletski

(this answer is a Wiki, you can edit - please correct and improve!)

UPDATING BENCHMARKS FOR 2016 (pg9.5+)

And using "pure SQL" benchmarks (without any external script)

  1. use any string_generator with UTF8

  2. main benchmarks:

2.1. INSERT

2.2. SELECT comparing and counting


CREATE FUNCTION string_generator(int DEFAULT 20, int DEFAULT 10) RETURNS text AS $f$
  SELECT array_to_string( array_agg(
    substring(md5(random()::text),1,$1)||chr( 9824 + (random()*10)::int )
  ), ' ' ) as s
  FROM generate_series(1, $2) i(x);
$f$ LANGUAGE SQL VOLATILE;  -- VOLATILE rather than IMMUTABLE, since the body calls random()

Prepare specific test (examples)

DROP TABLE IF EXISTS test;
-- CREATE TABLE test ( f varchar(500));
-- CREATE TABLE test ( f text); 
CREATE TABLE test ( f text  CHECK(char_length(f)<=500) );

Perform a basic test:

INSERT INTO test  
   SELECT string_generator(20+(random()*(i%11))::int)
   FROM generate_series(1, 99000) t(i);

And other tests,

CREATE INDEX q on test (f);

SELECT count(*) FROM (
  SELECT substring(f,1,1) || f FROM test WHERE f<'a0' ORDER BY 1 LIMIT 80000
) t;

... And use EXPLAIN ANALYZE.

UPDATED AGAIN 2018 (pg10)

A little edit to add 2018's results and reinforce the recommendations.


Results in 2016 and 2018

My results, averaged over many machines and many tests: all the same (differences statistically smaller than the standard deviation).

Recommendation

  • Use the text datatype,
    avoid the old varchar(x) because sometimes it is not standard, e.g. in CREATE FUNCTION clauses varchar(x) ≠ varchar(y).

  • express limits (with the same performance as varchar!) with a CHECK clause in the CREATE TABLE,
    e.g. CHECK(char_length(x)<=10).
    With a negligible loss of performance in INSERT/UPDATE you can also control ranges and string structure,
    e.g. CHECK(char_length(x)>5 AND char_length(x)<=20 AND x LIKE 'Hello%') (see the sketch below).
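
For example, relaxing the limit on the test table above is just a constraint swap (assuming PostgreSQL's default name for the inline CHECK, test_f_check):

ALTER TABLE test DROP CONSTRAINT test_f_check;
-- Adding the new CHECK re-validates existing rows (a scan, not a table rewrite):
ALTER TABLE test ADD CONSTRAINT test_f_check CHECK (char_length(f) <= 1000);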

Tletski answered 31/1, 2011 at 8:44 Comment(5)
So it does not matter that I made all of my columns varchar instead of text? I did not specify the length even though some are only 4 - 5 characters and certainly not 255.Epiphysis
@Epiphysis yes, it does not matterLifeblood
cool, I redid it to be safe and I made everything text anyway. It worked well and it was super easy to add millions of historical records quickly anyways.Epiphysis
@Epiphysis and reader: the only exception is the faster datatype "char", which is not char, even in present-day PostgreSQL 11+. As the guide/datatype-character says, "The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.".Tletski
still valid with pg11 in 2019: text>varchar(n)>text_check>char(n)Carrick

From the PostgreSQL manual:

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

I usually use text

References: http://www.postgresql.org/docs/current/static/datatype-character.html

Florin answered 5/11, 2015 at 14:44 Comment(0)

In my opinion, varchar(n) has its own advantages. Yes, they all use the same underlying type and all that. But it should be pointed out that indexes in PostgreSQL have a size limit of 2712 bytes per index row.

TL;DR: If you use the text type without a constraint and have indexes on these columns, it is very possible that you hit this limit for some of your columns and get an error when you try to insert data; by using varchar(n), you can prevent it.

Some more details: The problem here is that PostgreSQL doesn't raise any exception when creating an index on a text column, or on a varchar(n) column where n is greater than 2712. However, it will raise an error when you try to insert a record whose compressed size is greater than 2712 bytes. That means you can easily insert a 100,000-character string composed of repetitive characters, because it will compress to far below 2712 bytes, but you may not be able to insert a 4,000-character string whose compressed size exceeds 2712 bytes. Using varchar(n) with n kept well under 2712, you're safe from these errors.
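
A minimal sketch to reproduce the behaviour described above (the table and index names are made up, and the exact error text and byte limit differ slightly between PostgreSQL versions):

CREATE TABLE idx_limit_demo (f text);
CREATE INDEX idx_limit_demo_f ON idx_limit_demo (f);

-- 100,000 repetitive characters compress to well under the limit, so this works:
INSERT INTO idx_limit_demo VALUES (repeat('a', 100000));

-- A few thousand characters of poorly compressible data can fail with an error like
-- "index row size ... exceeds maximum ... for index":
INSERT INTO idx_limit_demo
SELECT string_agg(md5(random()::text), '') FROM generate_series(1, 500);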

Tsarism answered 11/4, 2018 at 12:4 Comment(5)
Later Postgres versions error when trying to create an index on text; it only works for varchar (the version without the (n)). Only tested with embedded Postgres though.Volar
Referring to #39966334, which has a link to the PostgreSQL Wiki: wiki.postgresql.org/wiki/… gives the max row size as 400 GB, so from that it looks like the stated 2712-byte limit per row is wrong. Maximum size for a database? Unlimited (32 TB databases exist). Maximum size for a table? 32 TB. Maximum size for a row? 400 GB. Maximum size for a field? 1 GB. Maximum number of rows in a table? Unlimited.Bellinger
@BillWorthington The numbers you posted don't take indexes into account, though. 2712 bytes is about the btree's max limit; it's an implementation detail, so you can't find it in the documentation. However, you can easily test it yourself, or just google it by searching e.g. "postgresql index row size exceeds maximum 2712 for index".Tsarism
I am new to PostgreSQL, so I am not an expert. I am working on a project where I want to store news articles in a column in a table. Looks like the text column type is what I will use. A total row size of 2712 bytes sounds way too low for a database that is supposed to be close to the same level as Oracle. Do I understand you correctly that you are referring to indexing a large text field? Not trying to challenge or argue with you, just trying to understand the real limits. If there are no indexes involved, then would the row limit be 400GB as in the wiki?? Thanks for your fast response.Bellinger
@BillWorthington You should research about Full Text Search. Check this link e.g.Tsarism

text and varchar have different implicit type conversions. The biggest impact that I've noticed is handling of trailing spaces. For example ...

select ' '::char = ' '::varchar, ' '::char = ' '::text, ' '::varchar = ' '::text

returns true, false, true and not true, true, true as you might expect.

Object answered 11/1, 2018 at 15:56 Comment(3)
How is this possible? If a = b and a = c then b = c.Backed
Tested, and it is indeed true. Impossible, but true. Very, very strange.Copulation
It's because the = operator is not only comparing the stuff, but it also does some conversions to find a common type for the values. It's pretty common behaviour in various languages, and the used conversions also differ between languages. For example in JavaScript you can see that [0 == '0.0', 0 == '0', '0.0' == '0'] -> [true, true, false]Flavourful

The difference is between tradition and modernity.

Traditionally you were required to specify the width of each table column. If you specify too much width, expensive storage space is wasted, but if you specify too little width, some data will not fit. Then you would have to resize the column, change a lot of connected software, and fix the introduced bugs, which is all very cumbersome.

Modern systems allow for unlimited string storage with dynamic storage allocation, so the incidental large string would be stored just fine without much waste of storage of small data items.

While a lot of programming languages, like C#, JavaScript, Java, etc., have adopted a 'string' data type of unlimited size, a database like Oracle did not.

Now that PostgreSQL supports 'text', a lot of programmers are still used to VARCHAR(N), and reason like: yes, text is the same as VARCHAR, except that with VARCHAR you MAY add a limit N, so VARCHAR is more flexible.

You might as well reason:

why should we bother using the mouthful "VARCHAR WITHOUT N", now that we can simplify our life with just "TEXT"?

In my recent years with Oracle, I have used CHAR(N) or VARCHAR(N) on very few occasions. Because Oracle does (did?) not have an unlimited string type, I used VARCHAR(2000) for most string columns, where 2000 was at some time the maximum for VARCHAR and, for all practical purposes, not much different from 'infinite'.

Now that I am working with PostgreSQL, I see TEXT as real progress. No more emphasis on the VAR feature of the CHAR type. No more emphasis on let's use VARCHAR without N. Besides, typing TEXT saves 3 keystrokes compared to VARCHAR.

Younger colleagues would now grow up without even knowing that in the old days there were no unlimited strings. Just like that in most projects they don't have to know about assembly programming.

Update: Azure type String

Apparently, the modern system of Azure SQL has a generic text type named String, like PostgreSQL type Text, but with an unconfigurable limit of just 500 characters. In Azure, type String seems more commonly used than Varchar(N) which has a limit of 4000. Is this progress?

Duster answered 10/10, 2022 at 12:34 Comment(3)
very informative answer, thanks!Brita
@aderchox I guess you meant to comment: Varchar without N is kept for backwards...Duster
Edit: So the gist of it is, just use TEXT. VARCHAR without N is kept for backwards compatibility reasons. (Yeah I've edited it now, thanks :).)Scutt

A good explanation from http://www.sqlines.com/postgresql/datatypes/text:

The only difference between TEXT and VARCHAR(n) is that you can limit the maximum length of a VARCHAR column, for example, VARCHAR(255) does not allow inserting a string more than 255 characters long.

Both TEXT and VARCHAR have an upper limit of 1 GB, and there is no performance difference between them (according to the PostgreSQL documentation).
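
A quick way to see the difference in practice (a small sketch; the table name is invented):

CREATE TABLE t (v varchar(10), x text);

INSERT INTO t (v) VALUES (repeat('a', 11));  -- ERROR: value too long for type character varying(10)
INSERT INTO t (x) VALUES (repeat('a', 11));  -- fine: text has no declared limit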

Beading answered 3/2, 2020 at 5:56 Comment(0)

Somewhat OT: If you're using Rails, the standard formatting of web pages may be different. In data entry forms, text columns get scrollable boxes, but character varying (Rails string) columns get one-line boxes. Show views are as long as needed.

Jacynth answered 24/3, 2017 at 2:10 Comment(0)

If you only use the TEXT type, you can run into issues when using AWS Database Migration Service:

Large objects (LOBs) are used but target LOB columns are not nullable

Due to their unknown and sometimes large size, large objects (LOBs) require more processing and resources than standard objects. To help with tuning migrations of systems that contain LOBs, AWS DMS offers the following options

If you are only sticking to PostgreSQL for everything, you're probably fine. But if you are going to interact with your db via ODBC or external tools like DMS, you should consider not using TEXT for everything.

Thury answered 17/9, 2021 at 21:56 Comment(1)
Also for ODBC: Crystal reports considers text a "memo" and won't allow any joins on it even if it's an FK. Varchar (limited or unlimited) works fineMedium

I wasted way too much time because of using varchar instead of text for PostgreSQL arrays.

PostgreSQL array operators do not work with string columns. Refer to these links for more details: (https://github.com/rails/rails/issues/13127) and (http://adamsanderson.github.io/railsconf_2013/?full#10).

Yeaton answered 5/9, 2021 at 17:27 Comment(1)
Ran into the exact same problem...Vories

I have found another annoying difference between them, which gave me a bit of a hard time.

Although VARCHAR (without a size) and TEXT mean more or less the same thing, PostgreSQL still draws a distinction.

The string_agg() function expects either text or bytea data types, and will return the corresponding data type. That doesn’t stop you from using it with the other string data types, such as varchar and its variations.

However, when using it within a user defined function, you can get into strife. For example:

CREATE FUNCTION test(genrename VARCHAR)
RETURNS TABLE (category VARCHAR, items VARCHAR)
LANGUAGE PLPGSQL AS $$ BEGIN
    RETURN QUERY
    SELECT cat, string_agg(item, '|')
    FROM data
    GROUP BY cat;
END; $$;

This will result in an error, since the return type of string_agg is not the same as varchar. Changing the return table column to items TEXT fixes this.
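
Alternatively, a sketch of one way to keep the VARCHAR return columns (still using the hypothetical data(cat, item) table from above, and assuming data.cat is already varchar) is to cast the aggregate explicitly:

CREATE OR REPLACE FUNCTION test(genrename VARCHAR)
RETURNS TABLE (category VARCHAR, items VARCHAR)
LANGUAGE PLPGSQL AS $$ BEGIN
    RETURN QUERY
    SELECT cat, string_agg(item, '|')::VARCHAR  -- cast the text result back to varchar
    FROM data
    GROUP BY cat;
END; $$;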

In other words, even though PostgreSQL may treat them as the same under the hood, it still maintains a pedantic distinction between varchar and text.

Foeticide answered 22/4, 2023 at 7:42 Comment(0)

character varying(n), varchar(n) - (Both the same). value will be truncated to n characters without raising an error.

character(n), char(n) - (Both the same). fixed-length and will pad with blanks till the end of the length.

text - Unlimited length.

Example:

Table test:
   a character(7)
   b varchar(7)

insert "ok    " to a
insert "ok    " to b

We get the results:

a        | (a)char_length | b     | (b)char_length
----------+----------------+-------+----------------
"ok     "| 7              | "ok"  | 2
Rb answered 14/3, 2018 at 9:47 Comment(3)
While MySQL will silently truncate the data when the value exceeds the column size, PostgreSQL will not and will raise a "value too long for type character varying(n)" error.Ashkhabad
@Ashkhabad Neither will truncate. MSSQL will throw an exception (msg 8152, level 16, state 30: String or binary data would be truncated). PostgreSQL will do the same, EXCEPT if the overflow is only spaces (then, it will truncate without raising an exception)Nettie
@JCKödel gsiems was talking about MySQL, not MSSQL.Struthious
