select * vs select column

R

12

138

If I just need 2/3 columns and I query SELECT * instead of providing those columns in select query, is there any performance degradation regarding more/less I/O or memory?

The network overhead might be present if I do select * without a need.

But in a select operation, does the database engine always pull an atomic tuple from the disk, or does it pull only those columns requested in the select operation?

If it always pulls a tuple then I/O overhead is the same.

At the same time, if it pulls a whole tuple, there might be some memory overhead in stripping the requested columns out of that tuple.

So if that's the case, select someColumn will have more memory overhead than select *.

Rectangular answered 5/7, 2010 at 14:45 Comment(8)

Is there a specific RDBMS you're asking about? It's possible that how SELECT queries are executed/processed is different from database to database. – Toadflax 5/7, 2010 at 15:7

As an aside, in PostgreSQL, if you say CREATE VIEW foo_view AS SELECT * FROM foo;, then add columns to table foo later on, those columns won't automatically show up in foo_view as expected. In other words, the * in this context only expands once (at view creation time), not per SELECT. Because of complications arising from ALTER TABLE, I would say that (in practice) * is Considered Harmful. – Octans 5/7, 2010 at 15:10

@JoeyAdams - not just PostgreSQL, this is also the behaviour of Oracle. – Vivianviviana 5/7, 2010 at 15:38

possible duplicate of Best to use * when calling a lot of fields in mysql? – Phrensy 5/7, 2010 at 15:55

@OMG Ponies: I was not aware of the similar post. However, these are not really similar. @Lèse majesté: I am talking about a generic RDBMS, not about any specific vendor. @Joey Adams: Hmm, I know that * is unsafe. I just want to discuss the performance issues. – Rectangular 5/7, 2010 at 18:19

For SQL Server see sqlblog.org/blogs/aaron_bertrand/archive/2009/10/10/… – Fallfish 6/3, 2013 at 15:8

possible duplicate of Why is SELECT * considered harmful? – Fallfish 6/3, 2013 at 16:15

@Vivianviviana - not just PostgreSQL and Oracle, also in Microsoft SQL – Ferland 11/6, 2014 at 17:58

E

34

It always pulls a tuple (except in cases where the table has been vertically segmented, i.e. broken up into column pieces), so, to answer the question you asked, it doesn't matter from a performance perspective. However, for many other reasons (below), you should always select specifically those columns you want, by name.

It always pulls a tuple because (in every vendor's RDBMS I am familiar with) the underlying on-disk storage structure for everything, including table data, is based on defined I/O Pages (in SQL Server, for example, each Page is 8 kilobytes). Every I/O read or write is by Page, i.e., every write or read is a complete Page of data.

Because of this underlying structural constraint, each row of data in a database must always be on one and only one Page. It cannot span multiple Pages of data (except for special things like blobs, where the actual blob data is stored in separate Page-chunks and the table row column then holds only a pointer). But these exceptions are just that, exceptions, and generally apply only in special cases (for special types of data, or certain optimizations for special circumstances).
Even in these special cases, the actual table row itself (which contains the pointer to the actual data for the blob, or whatever) must generally be stored on a single I/O Page.
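To make the page argument concrete, here is a toy model in Python. It is only a sketch of the idea, not how any real engine lays out data: the fixed 64-byte row format, the field names and the `build_pages` / `scan` helpers are all invented for illustration, and only the 8 KB page size comes from the answer above. It shows that, in a row-oriented page layout, a scan that keeps one column reads exactly as many bytes as a scan that keeps every column.

```python
import struct

PAGE_SIZE = 8192                        # 8 KB, the SQL Server page size cited above
ROW_FORMAT = "<i20s40s"                 # hypothetical row: id, name, notes
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 4 + 20 + 40 = 64 bytes
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE   # 128 rows fit on one page


def build_pages(n_rows):
    """Pack n_rows fixed-size rows into a list of whole pages."""
    pages = []
    for start in range(0, n_rows, ROWS_PER_PAGE):
        page = bytearray(PAGE_SIZE)
        stop = min(start + ROWS_PER_PAGE, n_rows)
        for slot, row_id in enumerate(range(start, stop)):
            struct.pack_into(ROW_FORMAT, page, slot * ROW_SIZE,
                             row_id, b"name", b"notes")
        pages.append(bytes(page))
    return pages


def scan(pages, n_rows, wanted):
    """Full scan. `wanted` lists the field positions to keep.

    Returns (rows, bytes_read). Every page is fetched whole, whatever
    columns were asked for; projection happens afterwards, in memory.
    """
    rows, bytes_read, remaining = [], 0, n_rows
    for page in pages:
        bytes_read += len(page)  # the unit of I/O is the page
        for slot in range(min(ROWS_PER_PAGE, remaining)):
            fields = struct.unpack_from(ROW_FORMAT, page, slot * ROW_SIZE)
            rows.append(tuple(fields[i] for i in wanted))
        remaining -= ROWS_PER_PAGE
    return rows, bytes_read
```

With 300 rows, both `scan(pages, 300, (0, 1, 2))` and `scan(pages, 300, (0,))` touch the same three pages; the only difference is how much is copied out of each row afterwards. A column-oriented or vertically segmented layout would behave differently, which is the exception noted at the top of this answer.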

EXCEPTION. The only place where Select * is OK, is in the sub-query after an Exists or Not Exists predicate clause, as in:

   Select colA, colB
   From table1 t1
   Where Exists (Select * From Table2
                 Where column = t1.colA)
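
A quick way to convince yourself that the select list inside EXISTS does not change the result is to run both forms side by side. The sketch below uses Python's built-in sqlite3 module with made-up tables and data (table2's column is named `col` here, since `column` is a reserved word in some dialects); it checks results only and says nothing about how any particular engine plans the subquery.

```python
import sqlite3


def exists_results():
    """Run the EXISTS query with SELECT * and with SELECT 1; return both result sets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (colA INTEGER, colB TEXT);
        CREATE TABLE table2 (col INTEGER, payload TEXT);
        INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO table2 VALUES (1, 'x'), (3, 'y'), (3, 'z');
    """)
    template = """
        SELECT colA, colB
        FROM table1 t1
        WHERE EXISTS (SELECT {cols} FROM table2 WHERE col = t1.colA)
        ORDER BY colA
    """
    star = conn.execute(template.format(cols="*")).fetchall()
    one = conn.execute(template.format(cols="1")).fetchall()
    conn.close()
    return star, one
```

Both variants return the rows of table1 that have at least one match in table2, because EXISTS only tests whether the subquery yields a row.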

EDIT: To address @Mike Sherer's comment: yes, it is true, both technically (with a bit of definition for your special case) and aesthetically. First, even when the set of columns requested is a subset of those stored in some index, the query processor must fetch every column stored in that index, not just the ones requested, for the same reason: ALL I/O must be done in Pages, and index data is stored in I/O Pages just like table data. So if you define "tuple" for an index page as the set of columns stored in the index, the statement is still true.
And the statement is true aesthetically because the point is that it fetches data based on what is stored in the I/O Page, not on what you ask for, and this is true whether you are accessing the base table I/O Page or an index I/O Page.

For other reasons not to use Select *, see Why is SELECT * considered harmful?

Edgeworth answered 5/7, 2010 at 14:50 Comment(9)

"It always pulls a tuple" are you sure ? Hmm Okay So I was right. if thats the case select * will have less memory overhead than select column but same I/O overhead. so If we leave network overhead. select * if less overhead than that of select column – Rectangular 5/7, 2010 at 14:56

This is NOT true. One example off the top of my head is when you want only the value of an indexed column in MySQL (for example, just to check for row existence), and you're using MyISAM storage engine, it'll grab the data from the MYI file, which could be in memory, and not even go to disk! – Wyly 5/7, 2010 at 14:57

Yes, if the requested set of tuples is in memory there will be no I/O, but that's a special case. So what is the summary? If I select some indexed column then the entire tuple is not read, and otherwise the entire tuple is read? – Rectangular 5/7, 2010 at 15:7

I'm not exactly sure how MySQL does caching, but in SQL Server and in Oracle, even when data is in the in-memory cache, it is still accessed using the same Page structure as it would be when read from disk, meaning that it requires one memory I/O per Page of data, exactly the same as it would from disk (except that memory I/Os are much faster than disk I/Os, of course). Indeed, that's a goal of caching design: to make the access process totally independent of the location of the data. – Edgeworth 5/7, 2010 at 15:11

@Charles Bretana: So if I invoke select ColumnName, there will be a memory overhead of stripping the unrequested cells out of the tuple and transmitting only the requested columns. Agreed? – Rectangular 5/7, 2010 at 15:12

@user256007, it's not that much of a special case. I'm not sure of all the cases in which it doesn't read the entire tuple, I know enough exist to always specify. You have plenty of examples already :) – Wyly 5/7, 2010 at 15:16

@user, Not necessarily; this is so insignificant as to not be a concern to DB engine design. In fact, it's more likely that all queries have to "process" each column they deliver from the Page (even when using Select *), so the fewer columns requested, the LESS the processing. But again, this is an insignificant in-memory CPU processing load; much more important are the potential maintenance and logic errors from using Select *. DO NOT use it for performance reasons. – Edgeworth 5/7, 2010 at 15:16

In Oracle a row can span multiple blocks if it is too long to fit on a single block (known as "chaining"), or if due to an update it needs to grow more than the space available in the current block then the entire row is moved to a new block that does have sufficient space and a pointer is left in the original block to indicate where it has moved to (known as "migration", and this does not modify index entries with the new ROWID) – Cymophane 22/5, 2012 at 10:9

Can you spell out more the "for many other reasons"? Because those were not clear to me. If performance does not matter, why care about requesting column names? – Coaly 17/4, 2017 at 18:57

D

122

There are several reasons you should never (never ever) use SELECT * in production code:

since you're not giving your database any hints as to what you want, it will first need to check the table's definition in order to determine the columns on that table. That lookup will cost some time - not much in a single query - but it adds up over time
if you need only 2/3 of the columns, you're selecting 1/3 too much data which needs to be retrieved from disk and sent across the network
if you start to rely on certain aspects of the data, e.g. the order of the columns returned, you could get a nasty surprise once the table is reorganized and new columns are added (or existing ones removed)
in SQL Server (not sure about other databases), if you need a subset of columns, there's always a chance a non-clustered index might be covering that request (contain all columns needed). With a SELECT *, you're giving up on that possibility right from the get-go. In this particular case, the data would be retrieved from the index pages (if those contain all the necessary columns) and thus disk I/O and memory overhead would be much less compared to doing a SELECT *.... query.
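
The covering-index point can be checked on a small scale. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in, so it is an assumption that the same pattern carries over to SQL Server in detail; the `orders` table and the index name are invented for the example. It asks the planner, via EXPLAIN QUERY PLAN, how it would run a narrow query and a SELECT * query that share the same WHERE clause.

```python
import sqlite3


def query_plans():
    """Return the planner's description of a narrow query and of a SELECT * query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER,
            status      TEXT,
            notes       TEXT
        );
        CREATE INDEX idx_orders_customer_status
            ON orders (customer_id, status);
    """)

    def plan(sql):
        # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail text.
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return " | ".join(row[3] for row in rows)

    # Every column this query needs is already stored in the index.
    narrow = plan("SELECT customer_id, status FROM orders WHERE customer_id = 42")
    # SELECT * also needs `notes`, which lives only in the table itself.
    star = plan("SELECT * FROM orders WHERE customer_id = 42")
    conn.close()
    return narrow, star
```

In SQLite the plan for the narrow query mentions a COVERING INDEX, meaning the table rows are never visited, while the plan for SELECT * does not. The exact wording of the plan text varies between SQLite versions.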

Yes, it takes a bit more typing initially (tools like SQL Prompt for SQL Server will even help you there) - but this is really one case where there's a rule without any exception: do not ever use SELECT * in your production code. EVER.

Dashboard answered 5/7, 2010 at 14:52 Comment(11)

I am only bothered about memory and I/O overhead. I've already mentioned that select * will have more network overhead. According to your second point, you mean that a select operation doesn't pull atomic tuples; rather, it pulls only the requested columns from the disk. So there will be a memory overhead in select column to check which cells' data to pull. As far as I know, data is always stored on disk as tuples; I'm not sure how select pulls it. So select * will not require a thorough check through the data structure of the table. – Rectangular 5/7, 2010 at 15:1

Whilst agreeing with you in practice (you are certainly correct in all cases when fetching column data from the table, as this question addresses), your emphasis on EVER nevertheless drives me to point out that this rule is not general to ALL SQL queries. Specifically, in a subquery after an EXISTS predicate (as in Where Exists (Select * From ...)), the use of Select * is certainly no issue, and in some circles is considered a best practice. – Edgeworth 5/7, 2010 at 16:4

@Charles Bretana: yes, the IF EXISTS(SELECT *... is a special case - since there, no data is really retrieved, but it's just a check for existence, the SELECT * is not an issue there... – Dashboard 5/7, 2010 at 16:39

Typically if we need to consistently access specific parts of a table, we will create a view containing only the columns we need. Of course, we then do SELECT * from my_view. From a performance POV, is this just as bad as selecting all from the table? – Hervey 5/7, 2010 at 16:46

On the other hand, if you SELECT * in PostgreSQL you get a well-formed data type back you can actually do something with rather than a generic record that you can't pass to other stored procedures directly. – Aposiopesis 7/3, 2013 at 2:46

What about if I'm developing an API that makes it possible to retrieve data from one of my tables. Since I wouldn't know which data the user is interested in, I suppose SELECT * would be acceptable? – Pembrook 16/2, 2014 at 22:19

@SimonBengtsson: I would still argue against this - suppose you have some "administrative" data in specific columns in your table that you don't want to expose to the customer? I would always explicitly specify a list of columns to fetch – Dashboard 17/2, 2014 at 5:28

That's true. What about when querying a view that was specifically set up to be used with the API? – Pembrook 17/2, 2014 at 10:28

@Dashboard What about SELECT column1 FROM (SELECT * FROM table1). Is this also considered bad practice? – Desertion 8/7, 2021 at 12:24

@Steve: yes - you should always avoid SELECT * in production / professional code situations - possibly except for the IF EXISTS (SELECT * FROM ... WHERE ....) situation - there, the * doesn't hurt, since no data is really being fetched, but only the existence of a specific row matching a WHERE clause is checked – Dashboard 8/7, 2021 at 12:25

@Dashboard In my example, no column is fetched except for column1. Then why isn't it ok? – Desertion 8/7, 2021 at 22:46
