"SELECT *" vs "SELECT <one_column>" for performance
A

3

23

I have the person table below:

id  first_name  last_name  age
1   John        Smith      23
2   David       Brown      18

Which of the two SQL queries below is faster?

SELECT * FROM person; -- Select all 4 columns
SELECT first_name FROM person; -- Select only "first_name" column
Antecedent answered 2/8, 2014 at 8:32 Comment(2)
It does affect performance, but the size of the effect depends on a lot of factors. Generally, you want your DBMS server to do no more work than is required to fulfill your requirements. On the other hand, the shorter the query, the faster it is parsed by MySQL. - Theurgy
Looks like this: https://mcmap.net/q/103265/-select-vs-select-column - Pickard
M
24

The issue here isn't so much the database server as the network communication. By selecting all columns, you're telling the server to send every column back to you. Concerns over IO and the like are addressed nicely in the question and answer @Karamba linked in a comment: select * vs select column. But for most real-world applications (and I use "applications" in every sense), the main concern is network traffic and how long it takes to serialize, transmit, and then deserialize the data. Really, though, the answer is the same either way.

So pulling back all the columns is great if you intend to use them all, but otherwise it can mean a lot of extra data transfer, particularly if you store, say, lengthy strings in your columns. In many cases (not all, but a significant majority), the difference will be undetectable and is mostly a matter of principle.
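To get a feel for how much extra data SELECT * can pull back, here is a minimal sketch in Python using the standard-library sqlite3 module with an in-memory database. SQLite is only a stand-in here (the question is about MySQL), and the bio column and the payload_bytes helper are made up for illustration. It counts the bytes in the fetched values, which is only a rough proxy for what would actually travel over the network.

```python
import sqlite3


def payload_bytes(rows):
    """Rough size of the fetched values in bytes (a proxy for network payload)."""
    total = 0
    for row in rows:
        for value in row:
            total += len(str(value).encode("utf-8"))
    return total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, age INTEGER, bio TEXT)"  # bio is a hypothetical long column
)
conn.executemany(
    "INSERT INTO person (first_name, last_name, age, bio) VALUES (?, ?, ?, ?)",
    [("John", "Smith", 23, "x" * 5000), ("David", "Brown", 18, "y" * 5000)],
)

all_columns = conn.execute("SELECT * FROM person").fetchall()
one_column = conn.execute("SELECT first_name FROM person").fetchall()

print("SELECT *          :", payload_bytes(all_columns), "bytes")
print("SELECT first_name :", payload_bytes(one_column), "bytes")
```

With short columns only, the two numbers would be close; the long text column is what makes the gap large.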

It's really just a trade-off between laziness now (and trust me, we all feel that way) and how important performance really is.

That all said, if you do intend to use all the column values, you're much better off pulling them all back at once than you are firing a bunch of queries.
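As a rough illustration of "one query versus a bunch of queries", the sketch below uses Python's sqlite3 module and its set_trace_callback hook to count the statements that get executed. SQLite runs in-process, so there is no real network here; the assumption is that with a client-server database such as MySQL each of these statements would cost a round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER, first_name TEXT, last_name TEXT, age INTEGER)"
)
conn.execute("INSERT INTO person VALUES (1, 'John', 'Smith', 23)")

statements = []
conn.set_trace_callback(statements.append)  # record every statement SQLite runs

# One statement: fetch every needed column together.
first_name, last_name, age = conn.execute(
    "SELECT first_name, last_name, age FROM person WHERE id = 1"
).fetchone()
single_query_count = len(statements)

# Several statements: fetch the same values one column at a time.
statements.clear()
values = [
    conn.execute(f"SELECT {column} FROM person WHERE id = 1").fetchone()[0]
    for column in ("first_name", "last_name", "age")
]
per_column_count = len(statements)

print("all at once   :", single_query_count, "statement(s)")
print("one per column:", per_column_count, "statement(s)")
```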

Think of it like doing a web search: you do your search, you find your page, and you only need one detail. You could read the entire page and know everything about the subject, or you could just jump to the part you're looking for and be done. The latter is a lot faster if that's all you ever want, but if you're later going to need the other aspects, you'd be way better off reading them the first time than having to search again and find the page all over again.

If you aren't sure whether you'll need the other column values in the future, then that's your call to make as the developer for which case is more likely.

It all depends on what your application is, what your data is, how you're using it, and how important performance really is to you.

Maybe answered 2/8, 2014 at 9:7 Comment(0)
B
14

Selecting a single column can have a large effect on the performance of certain queries. For example, it is more efficient for the query engine to process an index rather than look up data in the original data pages. If a covering index is available -- that is, an index that contains all the columns needed for a query -- then the query will run faster. For large tables that are too big for available memory, the use of a covering index can be a big, big win. (Think orders of magnitude improvement in performance in some cases.)
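One way to see the covering-index effect is to compare query plans. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in, since its EXPLAIN QUERY PLAN output labels covering-index access explicitly; MySQL's EXPLAIN reports the same idea differently. The index name idx_person_first_name and the plan helper are made up for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, age INTEGER)"
)
# Hypothetical index that contains the only column the narrow query needs.
conn.execute("CREATE INDEX idx_person_first_name ON person (first_name)")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Only first_name is needed, so the index alone can answer the query.
plan_one = plan("SELECT first_name FROM person WHERE first_name = 'John'")
# All columns are needed, so each index hit also requires a table lookup.
plan_all = plan("SELECT * FROM person WHERE first_name = 'John'")

print(plan_one)
print(plan_all)
```

Both queries use the index to find matching rows, but only the single-column query is reported as using it as a covering index.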

Another case where a limited number of columns is beneficial is when one or more of the columns are very large, such as a BLOB or TEXT column. These can grow to tens of thousands of bytes or even megabytes. Retrieving them can put a big load on the server.

There is a danger in using * if you have prepared statements and the underlying structure of the table changes. The query itself could get out of date (I've had this problem on other databases, but not specifically on MySQL). The underlying change could be as simple as changing the name of a column. What would otherwise be caught as a compile-time error becomes a run-time error that might be much more mysterious.
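The early-versus-late failure can be sketched with Python's sqlite3 module. SQLite is again only a stand-in, and the rename of first_name to the hypothetical given_name is simulated by rebuilding the table so the example does not depend on a particular SQLite version. It does not reproduce the prepared-statement caching problem itself, only where the error surfaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to values by column name
conn.execute("CREATE TABLE person (id INTEGER, first_name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'John')")

# Simulate renaming first_name to given_name by rebuilding the table.
conn.execute("DROP TABLE person")
conn.execute("CREATE TABLE person (id INTEGER, given_name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'John')")

# Explicit column list: the stale name is rejected as soon as the
# statement is prepared, so the problem is visible at the query itself.
try:
    conn.execute("SELECT first_name FROM person")
    explicit_error = None
except sqlite3.OperationalError as exc:
    explicit_error = str(exc)

# SELECT *: the query still succeeds; the failure only shows up later,
# where the application code reads the old column name.
row = conn.execute("SELECT * FROM person").fetchone()
try:
    row["first_name"]
    star_error = None
except IndexError as exc:
    star_error = str(exc)

print("explicit column:", explicit_error)
print("SELECT *       :", star_error)
```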

In general, the reasons given for avoiding * have more to do with network performance. In many cases, it is not going to make much difference. If you are returning 20 rows from a table where each row contains, on average, 100 or 200 bytes, then the difference between selecting all the columns and a subset of the columns will be minor in most hardware environments. The vast majority of the time spent on the query will go to compiling the query, executing it in the engine, and reading the data pages. Returning 200 bytes versus 2000 bytes probably won't make a big difference.

However, there are cases (such as the ones listed above) where it can make a big difference. So, avoiding * is a good habit, but using it now and then probably isn't going to bring down your system.

Bolyard answered 2/8, 2014 at 12:12 Comment(0)
C
3

At least in PostgreSQL, selecting one column is faster than selecting all columns.

In PostgreSQL, I created a test table with 10 id_x columns and 10 million rows as shown below:

CREATE TABLE test AS SELECT generate_series(1, 10000000) AS id_1,
                            generate_series(1, 10000000) AS id_2,
                            generate_series(1, 10000000) AS id_3,
                            generate_series(1, 10000000) AS id_4,
                            generate_series(1, 10000000) AS id_5,
                            generate_series(1, 10000000) AS id_6,
                            generate_series(1, 10000000) AS id_7,
                            generate_series(1, 10000000) AS id_8,
                            generate_series(1, 10000000) AS id_9,
                            generate_series(1, 10000000) AS id_10;

Then I ran the 2 queries below alternately, 6 runs in total (3 runs per query):

SELECT * FROM test;
SELECT id_1 FROM test;

<Result>

         SELECT * FROM test;  SELECT id_1 FROM test;
1st run  13.817 seconds       2.634 seconds
2nd run  13.579 seconds       2.606 seconds
3rd run  13.341 seconds       2.611 seconds
Average  13.579 seconds       2.617 seconds

On average, SELECT * FROM test; took 13.579 seconds and SELECT id_1 FROM test; took 2.617 seconds, so selecting the single column was roughly 5 times faster in this test.
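For anyone who wants to try a comparison like this without a PostgreSQL server, here is a scaled-down sketch using Python's sqlite3 module with an in-memory database. It is not the setup used above: the engine is different, the table has 100,000 rows instead of 10 million, and the timed_fetch helper is made up for illustration, so the absolute numbers and the ratio should not be expected to match.

```python
import sqlite3
import time

ROWS = 100_000  # scaled down from the 10 million rows used above
COLUMNS = [f"id_{i}" for i in range(1, 11)]

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE test ({', '.join(c + ' INTEGER' for c in COLUMNS)})")
conn.executemany(
    f"INSERT INTO test VALUES ({', '.join('?' * len(COLUMNS))})",
    ((n,) * len(COLUMNS) for n in range(1, ROWS + 1)),
)


def timed_fetch(sql):
    """Run the query, fetch every row, and return (row_count, seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return len(rows), time.perf_counter() - start


results = {}
for run in range(3):  # alternate the two queries, three runs each
    for sql in ("SELECT * FROM test", "SELECT id_1 FROM test"):
        _, seconds = timed_fetch(sql)
        results.setdefault(sql, []).append(seconds)

for sql, timings in results.items():
    print(f"{sql}: average {sum(timings) / len(timings):.3f} seconds")
```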

Cyrillic answered 23/12, 2022 at 5:54 Comment(0)
