Why is the n+1 selects pattern slow?
I'm rather inexperienced with databases and have just read about the "n+1 selects issue". My follow-up question: Assuming the database resides on the same machine as my program, is cached in RAM and properly indexed, why is the n+1 query pattern slow?

As an example let's take the code from the accepted answer:

SELECT * FROM Cars;

/* for each car */
SELECT * FROM Wheel WHERE CarId = ?

With my mental model of the database cache, each of the SELECT * FROM Wheel WHERE CarId = ? queries should need:

  • 1 lookup to reach the "Wheel" table (one hashmap get())
  • 1 lookup to reach the list of k wheels with the specified CarId (another hashmap get())
  • k lookups to get the wheel rows for each matching wheel (k pointer dereferences)

Even if we multiply that by a small constant factor for an additional overhead because of the internal memory structure, it still should be unnoticeably fast. Is the interprocess communication the bottleneck?
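
One way to get a feel for the per-statement overhead is to measure it in the most favorable case imaginable: an embedded, in-process database, so there is no IPC at all. A quick sketch using Python's sqlite3 (table names follow the example above; the counts are arbitrary):

```python
import sqlite3
import time

# In-memory SQLite: same process, no IPC, everything in RAM --
# the best possible case for the N+1 pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (Id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Wheel (Id INTEGER PRIMARY KEY, CarId INTEGER)")
conn.execute("CREATE INDEX Wheel_CarId ON Wheel (CarId)")

n_cars = 10_000
conn.executemany("INSERT INTO Cars (Id) VALUES (?)",
                 ((i,) for i in range(n_cars)))
conn.executemany("INSERT INTO Wheel (CarId) VALUES (?)",
                 ((i,) for i in range(n_cars) for _ in range(4)))

# N+1 pattern: one indexed query per car.
start = time.perf_counter()
total_wheels = 0
for (car_id,) in conn.execute("SELECT Id FROM Cars").fetchall():
    total_wheels += len(
        conn.execute("SELECT * FROM Wheel WHERE CarId = ?", (car_id,)).fetchall())
n_plus_1_seconds = time.perf_counter() - start

# Single query: fetch all wheels in one statement.
start = time.perf_counter()
all_wheels = conn.execute("SELECT * FROM Wheel").fetchall()
single_seconds = time.perf_counter() - start

print(f"N+1: {n_plus_1_seconds:.4f}s, single query: {single_seconds:.4f}s")
```

Even here, with no network and no context switches, each of the 10,000 extra statements pays a per-call cost (statement lookup, binding, result construction), so the N+1 loop is typically several times slower than the single query.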


Edit: I just found this related article via Hacker News: Following a Select Statement Through Postgres Internals (HN discussion thread).

Edit 2: To clarify, I do assume N to be large. A non-trivial overhead will add up to a noticeable delay then, yes. I am asking why the overhead is non-trivial in the first place, for the setting described above.

Flageolet answered 7/10, 2014 at 21:14 Comment(5)
Because one statement retrieving n+1 rows is almost always faster than n+1 statements. There is an overhead involved in parsing the SQL, preparing the execution plan and then returning the data (depending on the DBMS, the parsing stage can be a substantial overhead). And besides: would you read a recipe for a cake, then go to the supermarket to buy the flour, go home, read the recipe, go back, buy the eggs, go home, read the recipe, go back, buy the butter, ...Estreat
"Because one statement retrieving n+1 rows is almost always faster than n+1 statements." - I see that, but I'm surprised it takes so much longer that people writing their programs in Ruby or Python instead of C actually care. The solutions I found sometimes look substantially more complex than the original code. Re cake recipe: But if the ingredients were just in the other room, I probably would not care too much about going twice or thrice ;).Flageolet
Re query parsing overhead: Given that we already use prepared statements everywhere, isn't that something a well developed library should solve? Or does that break the common interface and we are back to IPC overhead. (@a_horse_with_no_name)Flageolet
"something a well developed library should solve" - For example, Python caches compiled regular expressions by default.Flageolet
The interprocess communication is slow.Maricamarice

You are correct that avoiding n+1 selects is less important in the scenario you describe. If the database is on a remote machine, communication latencies of > 1 ms are common, i.e., the CPU would spend millions of clock cycles waiting for the network.

If we are on the same machine, the communication delay is several orders of magnitude smaller, but synchronous communication with another process necessarily involves a context switch, which commonly costs > 0.01 ms (source), which is tens of thousands of clock cycles.

In addition, both the ORM tool and the database will have some overhead per query.

To conclude, avoiding n+1 selects is far less important if the database is local, but still matters if n is large.

Oletta answered 14/10, 2014 at 9:8 Comment(0)

Assuming the database resides on the same machine as my program

Never assume this. Thinking about special cases like this is never a good idea. It's quite likely that your data will grow, and you will need to put your database on another server. Or you will want redundancy, which involves (you guessed it) another server. Or for security, you might not want your app server on the same box as the DB.

why is the n+1 query pattern slow?

You don't think it's slow because your mental model of performance is probably all wrong.

1) RAM is horribly slow. Your CPU wastes around 200-400 CPU cycles each time it needs to read something from RAM. CPUs have a lot of tricks to hide this (caches, pipelining, hyperthreading).

2) Reading from RAM is not "Random Access". It's like a hard drive: sequential reads are faster. See this article about how accessing RAM in the right order is 76.6% faster: http://lwn.net/Articles/255364/ (read the whole article if you want to know how horrifyingly complex RAM actually is).

CPU cache

In your "N+1 query" case, the "loop" for each N includes many megabytes of code (on client and server) swapping in and out of caches on each iteration, plus context switches (which usually dump the caches anyway).

The "1 query" case probably involves a single tight loop on the server (finding and copying each row), then a single tight loop on the client (reading each row). If those loops are small enough, they can execute 10-100x faster running from cache.

RAM sequential access

The "1 query" case will read everything from the DB to one linear buffer, send it to the client who will read it linearly. No random accesses during data transfer.

The "N+1 query" case will be allocating and de-allocating RAM N times, which (for various reasons) may not be the same physical bit of RAM.

Various other reasons

The networking subsystem only needs to read one or two TCP headers, instead of N.

Your DB only needs to parse one query instead of N.

When you throw in multi-users, the "locality/sequential access" gets even more fragmented in the N+1 case, but stays pretty good in the 1-query case.

Lots of other tricks that the CPU uses (e.g. branch prediction) work better with tight loops.

See: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/13/10533875.aspx

Rallentando answered 7/10, 2014 at 21:14 Comment(0)

Having the database on a local machine reduces the problem; however, most applications and databases will be on different machines, where each round trip takes at least a couple of milliseconds.

A database will also need a lot of locking and latching checks for each individual query. Context switches have already been mentioned by meriton. If you don't use a surrounding transaction, it also has to build implicit transactions for each query. Some query parsing overhead is still there, even with a parameterized, prepared query or one remembered by string equality (with parameters).

If the database gets filled up, query times may increase, compared to an almost empty database in the beginning.

If your database is to be used by other applications, you will likely hammer it: even if your own application works, others may slow down or even see an increasing number of failures, such as timeouts and deadlocks.

Also, consider having more than two levels of data. Imagine three levels: Blogs, Entries, Comments, with 100 blogs, each with 10 entries and 10 comments per entry (on average). That's a SELECT 1+N+(N×M) situation: 1 query for the blogs, 100 queries to retrieve the entries, and another 1000 to get all comments. Some more complex data, and you'll run into the 10000s or even 100000s.
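
The arithmetic for the three-level example above, as a quick check:

```python
# Query counts for the three-level example: 1 query for the blogs,
# one per blog for its entries, one per entry for its comments.
n_blogs, entries_per_blog = 100, 10
queries = 1 + n_blogs + n_blogs * entries_per_blog  # 1 + N + N*M
print(queries)  # 1101 round trips, versus one query per level (3) with batching
```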

Of course, bad programming may work in some cases and to some extent. If the database will always be on the same machine, nobody else uses it and the number of cars is never much more than 100, even a very sub-optimal program might be sufficient. But beware of the day any of these preconditions changes: refactoring the whole thing will take you much more time than doing it correctly in the beginning would have. And likely, you'll try some other workarounds first: a few more IF clauses, an in-memory cache and the like, which help in the beginning but mess up your code even more. In the end, you may be stuck in a "never touch a running system" position, where the system's performance becomes less and less acceptable, but refactoring is too risky and far more complex than changing correctly structured code would have been.

Also, a good ORM offers you ways around N+1: (N)Hibernate, for example, allows you to specify a batch-size (merging many SELECT * FROM Wheels WHERE CarId=? queries into one SELECT * FROM Wheels WHERE CarId IN (?, ?, ..., ?) ) or use a subselect (like: SELECT * FROM Wheels WHERE CarId IN (SELECT Id FROM Cars)).
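
The batch-size strategy can also be imitated by hand. A rough sketch in Python with sqlite3, using the schema from the question (the batch size of 25 and the helper name are arbitrary choices for illustration):

```python
import sqlite3
from itertools import islice

def fetch_wheels_batched(conn, car_ids, batch_size=25):
    """Merge per-car lookups into IN (...) queries, batch_size ids at a time."""
    wheels = []
    it = iter(car_ids)
    while batch := list(islice(it, batch_size)):
        placeholders = ", ".join("?" * len(batch))  # one '?' per bound id
        sql = f"SELECT * FROM Wheel WHERE CarId IN ({placeholders})"
        wheels.extend(conn.execute(sql, batch).fetchall())
    return wheels

# Demo: 100 cars with 4 wheels each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Wheel (Id INTEGER PRIMARY KEY, CarId INTEGER)")
conn.executemany("INSERT INTO Wheel (CarId) VALUES (?)",
                 ((c,) for c in range(100) for _ in range(4)))
rows = fetch_wheels_batched(conn, range(100))
print(len(rows))  # 400 rows fetched in 4 queries instead of 100
```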

The simplest option to avoid N+1 is a join, with the disadvantage that each car row is multiplied by the number of wheels; with multiple child/grandchild collections, the result can blow up into a huge Cartesian product.
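
For the two-level case the join is a single round trip, and the row multiplication is easy to see: each car's columns are repeated once per wheel. A sketch with the question's schema (the sample cars are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars  (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Wheel (Id INTEGER PRIMARY KEY, CarId INTEGER);
    INSERT INTO Cars VALUES (1, 'Ka'), (2, 'Up');
    INSERT INTO Wheel (CarId) VALUES (1), (1), (1), (1), (2), (2), (2), (2);
""")

# One query for everything: each car's columns are duplicated
# once per wheel, so 2 cars x 4 wheels = 8 result rows.
rows = conn.execute("""
    SELECT Cars.Id, Cars.Name, Wheel.Id
    FROM Cars JOIN Wheel ON Wheel.CarId = Cars.Id
    ORDER BY Cars.Id
""").fetchall()
print(len(rows))  # 8
```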

Flu answered 14/10, 2014 at 10:1 Comment(0)

There is still overhead, even if the database is on the same machine, cached in RAM and properly indexed. The size of this overhead will depend on what DBMS you're using, the machine it's running on, the amount of users, the configuration of the DBMS (isolation level, ...) and so on.

When retrieving N rows, you can choose to pay this cost once or N times. Even a small cost can become noticeable if N is large enough.

One day someone might want to put the database on a separate machine or use a different DBMS. This happens frequently in the business world (to be compliant with some ISO standard, to reduce costs, to change vendors, ...).

So, sometimes it's good to plan for situations where the database isn't lightning fast.

All of this depends very much on what the software is for. Avoiding the "select n+1 problem" isn't always necessary, it's just a rule of thumb, to avoid a commonly encountered pitfall.

Aminta answered 17/10, 2014 at 21:10 Comment(3)
I'm sorry, but you don't really answer my question. If there is a noticeable overhead per request, then it will add up for large n, so much is clear. I also see that for many applications you want to future-proof your code to be fast enough on non-local databases. There are cases, though, where this will never be necessary, think about SQLite DBs for example. Are you implying the n+1 select pattern isn't slow in these cases (using the right database)?Flageolet
I'm trying to explain there is no black and white answer to that question. n is always going to be slower then 1 but in some cases, n might still be fast enough. It depends. An embedded SQLite database is one of those cases where n will probably stay small enough.Aminta
See the edit to clarify the question. As a real life example consider mpd, a music player that has a database of the local music files which will contain thousands of entries. The multisecond delay when searching for songs on my raspberry pi is evidence that n is "large" and that the access pattern is inefficient. In a sense, the world is very black here. (Afaict the actual reason for the slow search with mpd is a missing index, though, but you get the point.)Flageolet
