What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?
Asked Answered
E

19

2264

The "N+1 selects problem" is generally stated as a problem in Object-Relational mapping (ORM) discussions, and I understand that it has something to do with having to make a lot of database queries for something that seems simple in the object world.

Does anybody have a more detailed explanation of the problem?

Eisele answered 18/9, 2008 at 21:30 Comment(5)
There are some helpful posts talking about this issue and the possible fix. Common Application Problems and How to Fix Them: The Select N + 1 Problem, The (Silver) Bullet for the N+1 Problem, Lazy loading - eager loadingPurebred
For everyone looking for solution to this problem, I found a post describing it. #32454489Fibster
Considering answers, shouldn't this be termed as 1+N problem? As this seems to be a terminology, I'm not, specifically, asking OP.Analysis
The most concise explanation I've come across is in Yann Briançon's post: "it is a performance anti-pattern where an application spams the database with N+1 small queries instead of 1 query fetching all the data needed". He also describes how to automatically detect the issue using his com.yannbriancon:spring-hibernate-query-utils library.Cockayne
So if I'm understanding correctly; there are basically 2 ways to solve the N+1 problem: with join, or loading everything into the memory in one query (like the wheels example below)?Leroylerwick
P
1556

Let's say you have a collection of Car objects (database rows), and each Car has a collection of Wheel objects (also rows). In other words, CarWheel is a 1-to-many relationship.

Now, let's say you need to iterate through all the cars, and for each one, print out a list of the wheels. The naive O/R implementation would do the following:

SELECT * FROM Cars;

And then for each Car:

SELECT * FROM Wheel WHERE CarId = ?

In other words, you have one select for the Cars, and then N additional selects, where N is the total number of cars.

Alternatively, one could get all wheels and perform the lookups in memory:

SELECT * FROM Wheel;

This reduces the number of round-trips to the database from N+1 to 2. Most ORM tools give you several ways to prevent N+1 selects.

Reference: Java Persistence with Hibernate, chapter 13.

Portable answered 18/9, 2008 at 21:36 Comment(30)
To clarify on the "This is bad" - you could get all the wheels with 1 select (SELECT * from Wheel;), instead of N+1. With a large N, the performance hit can be very significant.Airdrop
@Airdrop I'm surprised you got so many upvotes for being wrong. A database is very good about indexes, doing the query for a specific CarID would return very fast. But if you got all the Wheels are once, you would have to search for CarID in your application, which is not indexed, this is slower. Unless you have major latency issues reaching your database going n + 1 is actually faster - and yes, I benchmarked it with a large variety of real world code.Saga
@ariel The 'correct' way is to get all the wheels, ordered by CarId (1 select), and if more details than the CarId are required, make a second query for all cars (2 queries total). Printing things out is now optimal, and no indexes or secondary storage were required (you can iterate over results, no need to download them all). You benchmarked the wrong thing. If you are still confident of your benchmarks, would you mind posting a longer comment (or a full answer) explaining your experiment and results?Airdrop
@Airdrop I benchmarked joining wheels and cars (and de-duplicating in application code), and that is indeed slower - much slower. Doing them as two queries and sorting the second query by the key of the first is much better. The other answers on this page do not say to do that however, they talk about joining. If the first query is only getting a subset of rows you will need a more complicated query for the second one to limit how much data you get so it's not necessarily the best way, although it can be, it would need to be checked for the specific situation.Saga
@Saga You say "A database is very good about indexes, doing the query for a specific CarID would return very fast." So what I would do is getting all the wheels sorted by CarID (1 Select) and then in the Java code looping through the (unique) N CarID's I got and for each new CarID issuing a Select query for a Car with that specific CarID (very fast according to you). So basically I get the total result in N+1 Select queries but very fast.Tuberculin
@Saga (continuation...)If I get you right, what you are suggesting is the naive O/R implementation in the original question. So you first get all the Cars (1 Select), then for each Car you issue a Select query on the Wheel table, so this is not so fast (since you don't simply fetch a wheel by its WheelID, you fetch a wheel by its CarID). So you get 1+N Select queries but the N Selects are quite slow. In conclusion I would say (based on your reported benchmarks) regarding the times of execution: my N+1 Selects < naive N+1 Selects < Join (Hibernate solution of 1 query)Tuberculin
@Tuberculin except the N Selects are not slow, the can be faster than a join plus sorting in a temp table. Please see my replies to cfeduke's answer for specific details about why.Saga
i think that this problem is solved by lazy initialization, right ?Goddord
"Hibernate (I'm not familiar with the other ORM frameworks) gives you several ways to handle it." and these way are?Motorbus
@ariel - It not right to say the naive approach is "correct". It would be more accurate to say given certain scenarios the naive approach can perform better. Generally speaking 1 query is better than n. See big-o notation. While O(1) is a better algorithm than O(N), there are practical situations where a constant time approach is not the fastest.Fortuna
@Lee-Slalom It might be 1 database query - but then you have to work with the data in your application, and your application is slower than the database. So your analysis of O() is too simplistic - you are only including the work from the database, and ignoring what the application has to do.Saga
@Saga Try running your benchmarks with database and application servers on separate machines. In my experience, round trips to the database cost more in overhead than the query itself. So yes, the queries are really fast, but it's the round trips that wreak havok. I've converted "WHERE Id = const" to "WHERE Id IN (const, const, ...)" and gotten orders of magnitude increases out of it.Pandiculation
@Pandiculation I said that several times in my replies. Perhaps you skimmed and missed it. If you have latency for your queries then things change. But at least for typical web-dev work, the database is usually on the same machine.Saga
There are a lot of influential factors that decide whether or not the join (one select) is faster than two or N+1 selects. I'd argue that in a proper setup, the join should be fastest and the version with N+1 selects should be slowest. But then, YMMV.Giorgi
@Saga - Or get the Db to return all wheels sorted by CarId - now you've got a sorted list which is lightning fast to access in your application and only took a single Db rountrip. There's less data overhead, less queries and the only additional work your app does is locating the first/last instance of the carId in your list. On a different topic, I don't know how big the sites you work on are, but I very rarely end up with a Db on the same (virtual) machine as the website...Deangelis
@Deangelis Are you joking? You want to return ALL wheels??? What if you only need some of them?Saga
Then... WHERE Car.Id IN (SELECT Id FROM Cars WHERE...) ? I'm not saying you should always offload operations best left to the Db to the app. I'm saying that sometimes the tradeoff is worthwhile.Deangelis
A good trade-off strategy, is to maintain a reference to the entire result-set (e.g. 1 page of results) when selecting the Cars - then do a subsequent SELECT FROM wheels w WHERE w.car_id IN (...), using a comma-separated list of all the Car IDs in the result-set. Whether you have high or low DB latency, two roundtrips is more economical than a JOIN - even if the total response-time is slightly higher than with one JOIN query, overall CPU time will be lower, and should average faster on a production system with many concurrent users. JOIN as an optimization (to save CPU) is a myth.Cirrostratus
Select Cars.*, Wheels.* from Cars inner join Wheels on Cars.CarID=Wheels.CardID order by Cars.CardID,Wheels.WheelID IS the most efficient way to retrieve this information assuming your tables are correctly indexed. If you need to limit, put a WHERE clause in it. All modern ORMs worth using query like this. If multiple SELECTS are faster than a single JOIN then there is something wrong with your database design, period.Shakhty
@Stephen Byrne: Not necessarily. With Joins you most often end up sending redundant data (see also Cartesian Product for x-Joins). If Cars.* contains lot's of data, two single queries could be faster and save resources.Commandment
@Commandment well yes if you are using .*, but that was just for illustrative purposes. In reality the ORM would explicitly select only the mapped columns...Shakhty
Sorry, I must be wrong. But, can't you join in one single query to get all Car Wheels. Where does N+1 problem arise in this case?Pung
@Motorbus super old post but seems to be no answer. So... in Nhibernate you should create a search criteria and set it to eager fetch. session.CreateCriteria.SetFetchMode("Wheels", FetchMode.Eager)). You can also do with the mappers Map(x => x.Wheel).Not.LazyLoad();Boswall
I think the only instance I would ever advocate n+1 over intelligently written queries for specific purposes would be if you are using something like ScaleArc for load balancing, connection pool management, and packet-level caching. Honestly, in most of my experience, advocates of n+1 find positive edge cases to justify simpler and less verbose UnitOfWork/Repository patterns. This is not premature optimization. But too simple will paint you into corners. If you are lazy-loading in any semblance of a tight loop (w/o intermediary tech), you're doing it wrong. End of story.Halfpint
Why couldn't you say select * from wheel where carId in (id1, id2...)Esta
@Esta that is because when you are using lazy loading for accessing child collection, the SELECT is generated by NHibernate and NHibernate would not generate a SELECT like thatFurmark
a) why is everybody talking about this as if it's two round trip queries? It's going to end up being many round trip queries, typically while the web server sits and waits on the data so that it can return to the client, b) I would contend that lazy loading from a web server is not "always faster," because you are tying up a limited resources (network connections, memory) waiting on results from the database, c) the database is only going to be on the same machine for relatively small sites. As soon as you try to scale, the database and web servers are going to be separated.Fluidize
I can't believe this is the most voted answer. Never heard of joining tables in databases?Hypogene
Is there a reason most examples on the internet use a one to many relationship to explain this problem? The problem remains the same for one to one relationships (consider 'engine' instead of 'wheel').Ted
This is a terrible explanation. How does this even make sense?Exothermic
M
344

What is the N+1 query problem

The N+1 query problem happens when the data access framework executed N additional SQL statements to fetch the same data that could have been retrieved when executing the primary SQL query.

The larger the value of N, the more queries will be executed, the larger the performance impact. And, unlike the slow query log that can help you find slow running queries, the N+1 issue won’t be spotted because each individual additional query runs sufficiently fast to not trigger the slow query log.

The problem is executing a large number of additional queries that, overall, take sufficient time to slow down response time.

Let’s consider we have the following post and post_comments database tables which form a one-to-many table relationship:

The post and post_comments tables

We are going to create the following 4 post rows:

INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 1', 1)
 
INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 2', 2)
 
INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 3', 3)
 
INSERT INTO post (title, id)
VALUES ('High-Performance Java Persistence - Part 4', 4)

And, we will also create 4 post_comment child records:

INSERT INTO post_comment (post_id, review, id)
VALUES (1, 'Excellent book to understand Java Persistence', 1)
 
INSERT INTO post_comment (post_id, review, id)
VALUES (2, 'Must-read for Java developers', 2)
 
INSERT INTO post_comment (post_id, review, id)
VALUES (3, 'Five Stars', 3)
 
INSERT INTO post_comment (post_id, review, id)
VALUES (4, 'A great reference book', 4)

N+1 query problem with plain SQL

If you select the post_comments using this SQL query:

List<Tuple> comments = entityManager.createNativeQuery("""
    SELECT
        pc.id AS id,
        pc.review AS review,
        pc.post_id AS postId
    FROM post_comment pc
    """, Tuple.class)
.getResultList();

And, later, you decide to fetch the associated post title for each post_comment:

for (Tuple comment : comments) {
    String review = (String) comment.get("review");
    Long postId = ((Number) comment.get("postId")).longValue();
 
    String postTitle = (String) entityManager.createNativeQuery("""
        SELECT
            p.title
        FROM post p
        WHERE p.id = :postId
        """)
    .setParameter("postId", postId)
    .getSingleResult();
 
    LOGGER.info(
        "The Post '{}' got this review '{}'",
        postTitle,
        review
    );
}

You are going to trigger the N+1 query issue because, instead of one SQL query, you executed 5 (1 + 4):

SELECT
    pc.id AS id,
    pc.review AS review,
    pc.post_id AS postId
FROM post_comment pc
 
SELECT p.title FROM post p WHERE p.id = 1
-- The Post 'High-Performance Java Persistence - Part 1' got this review
-- 'Excellent book to understand Java Persistence'
    
SELECT p.title FROM post p WHERE p.id = 2
-- The Post 'High-Performance Java Persistence - Part 2' got this review
-- 'Must-read for Java developers'
     
SELECT p.title FROM post p WHERE p.id = 3
-- The Post 'High-Performance Java Persistence - Part 3' got this review
-- 'Five Stars'
     
SELECT p.title FROM post p WHERE p.id = 4
-- The Post 'High-Performance Java Persistence - Part 4' got this review
-- 'A great reference book'

Fixing the N+1 query issue is very easy. All you need to do is extract all the data you need in the original SQL query, like this:

List<Tuple> comments = entityManager.createNativeQuery("""
    SELECT
        pc.id AS id,
        pc.review AS review,
        p.title AS postTitle
    FROM post_comment pc
    JOIN post p ON pc.post_id = p.id
    """, Tuple.class)
.getResultList();
 
for (Tuple comment : comments) {
    String review = (String) comment.get("review");
    String postTitle = (String) comment.get("postTitle");
 
    LOGGER.info(
        "The Post '{}' got this review '{}'",
        postTitle,
        review
    );
}

This time, only one SQL query is executed to fetch all the data we are further interested in using.

N+1 query problem with JPA and Hibernate

When using JPA and Hibernate, there are several ways you can trigger the N+1 query issue, so it’s very important to know how you can avoid these situations.

For the next examples, consider we are mapping the post and post_comments tables to the following entities:

Post and PostComment entities

The JPA mappings look like this:

@Entity(name = "Post")
@Table(name = "post")
public class Post {
 
    @Id
    private Long id;
 
    private String title;
 
    //Getters and setters omitted for brevity
}
 
@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {
 
    @Id
    private Long id;
 
    @ManyToOne
    private Post post;
 
    private String review;
 
    //Getters and setters omitted for brevity
}

FetchType.EAGER

Using FetchType.EAGER either implicitly or explicitly for your JPA associations is a bad idea because you are going to fetch way more data that you need. More, the FetchType.EAGER strategy is also prone to N+1 query issues.

Unfortunately, the @ManyToOne and @OneToOne associations use FetchType.EAGER by default, so if your mappings look like this:

@ManyToOne
private Post post;

You are using the FetchType.EAGER strategy, and, every time you forget to use JOIN FETCH when loading some PostComment entities with a JPQL or Criteria API query:

List<PostComment> comments = entityManager
.createQuery("""
    select pc
    from PostComment pc
    """, PostComment.class)
.getResultList();

You are going to trigger the N+1 query issue:

SELECT 
    pc.id AS id1_1_, 
    pc.post_id AS post_id3_1_, 
    pc.review AS review2_1_ 
FROM 
    post_comment pc

SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 1
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 2
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 3
SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 4

Notice the additional SELECT statements that are executed because the post association has to be fetched prior to returning the List of PostComment entities.

Unlike the default fetch plan, which you are using when calling the find method of the EntityManager, a JPQL or Criteria API query defines an explicit plan that Hibernate cannot change by injecting a JOIN FETCH automatically. So, you need to do it manually.

If you didn't need the post association at all, you are out of luck when using FetchType.EAGER because there is no way to avoid fetching it. That's why it's better to use FetchType.LAZY by default.

But, if you wanted to use post association, then you can use JOIN FETCH to avoid the N+1 query problem:

List<PostComment> comments = entityManager.createQuery("""
    select pc
    from PostComment pc
    join fetch pc.post p
    """, PostComment.class)
.getResultList();

for(PostComment comment : comments) {
    LOGGER.info(
        "The Post '{}' got this review '{}'", 
        comment.getPost().getTitle(), 
        comment.getReview()
    );
}

This time, Hibernate will execute a single SQL statement:

SELECT 
    pc.id as id1_1_0_, 
    pc.post_id as post_id3_1_0_, 
    pc.review as review2_1_0_, 
    p.id as id1_0_1_, 
    p.title as title2_0_1_ 
FROM 
    post_comment pc 
INNER JOIN 
    post p ON pc.post_id = p.id
    
-- The Post 'High-Performance Java Persistence - Part 1' got this review 
-- 'Excellent book to understand Java Persistence'

-- The Post 'High-Performance Java Persistence - Part 2' got this review 
-- 'Must-read for Java developers'

-- The Post 'High-Performance Java Persistence - Part 3' got this review 
-- 'Five Stars'

-- The Post 'High-Performance Java Persistence - Part 4' got this review 
-- 'A great reference book'

FetchType.LAZY

Even if you switch to using FetchType.LAZY explicitly for all associations, you can still bump into the N+1 issue.

This time, the post association is mapped like this:

@ManyToOne(fetch = FetchType.LAZY)
private Post post;

Now, when you fetch the PostComment entities:

List<PostComment> comments = entityManager
.createQuery("""
    select pc
    from PostComment pc
    """, PostComment.class)
.getResultList();

Hibernate will execute a single SQL statement:

SELECT 
    pc.id AS id1_1_, 
    pc.post_id AS post_id3_1_, 
    pc.review AS review2_1_ 
FROM 
    post_comment pc

But, if afterward, you are going to reference the lazy-loaded post association:

for(PostComment comment : comments) {
    LOGGER.info(
        "The Post '{}' got this review '{}'", 
        comment.getPost().getTitle(), 
        comment.getReview()
    );
}

You will get the N+1 query issue:

SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 1
-- The Post 'High-Performance Java Persistence - Part 1' got this review 
-- 'Excellent book to understand Java Persistence'

SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 2
-- The Post 'High-Performance Java Persistence - Part 2' got this review 
-- 'Must-read for Java developers'

SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 3
-- The Post 'High-Performance Java Persistence - Part 3' got this review 
-- 'Five Stars'

SELECT p.id AS id1_0_0_, p.title AS title2_0_0_ FROM post p WHERE p.id = 4
-- The Post 'High-Performance Java Persistence - Part 4' got this review 
-- 'A great reference book'

Because the post association is fetched lazily, a secondary SQL statement will be executed when accessing the lazy association in order to build the log message.

Again, the fix consists in adding a JOIN FETCH clause to the JPQL query:

List<PostComment> comments = entityManager.createQuery("""
    select pc
    from PostComment pc
    join fetch pc.post p
    """, PostComment.class)
.getResultList();

for(PostComment comment : comments) {
    LOGGER.info(
        "The Post '{}' got this review '{}'", 
        comment.getPost().getTitle(), 
        comment.getReview()
    );
}

And, just like in the FetchType.EAGER example, this JPQL query will generate a single SQL statement.

Even if you are using FetchType.LAZY and don't reference the child association of a bidirectional @OneToOne JPA relationship, you can still trigger the N+1 query issue.

How to automatically detect the N+1 query issue

If you want to automatically detect N+1 query issue in your data access layer, you can use the db-util open-source project.

First, you need to add the following Maven dependency:

<dependency>
    <groupId>com.vladmihalcea</groupId>
    <artifactId>db-util</artifactId>
    <version>${db-util.version}</version>
</dependency>

Afterward, you just have to use SQLStatementCountValidator utility to assert the underlying SQL statements that get generated:

SQLStatementCountValidator.reset();

List<PostComment> comments = entityManager.createQuery("""
    select pc
    from PostComment pc
    """, PostComment.class)
.getResultList();

SQLStatementCountValidator.assertSelectCount(1);

In case you are using FetchType.EAGER and run the above test case, you will get the following test case failure:

SELECT 
    pc.id as id1_1_, 
    pc.post_id as post_id3_1_, 
    pc.review as review2_1_ 
FROM 
    post_comment pc

SELECT p.id as id1_0_0_, p.title as title2_0_0_ FROM post p WHERE p.id = 1

SELECT p.id as id1_0_0_, p.title as title2_0_0_ FROM post p WHERE p.id = 2


-- SQLStatementCountMismatchException: Expected 1 statement(s) but recorded 3 instead!
Masonry answered 26/9, 2016 at 7:13 Comment(8)
But now you have a problem with pagination. If you have 10 cars, each car with 4 wheels and you want to paginate cars with 5 cars per page. So you basically you have SELECT cars, wheels FROM cars JOIN wheels LIMIT 0, 5. But what you get is 2 cars with 5 wheels (first car with all 4 wheels and second car with only 1 wheel), because LIMIT will limit the entire resultset, not only root clause.Tillage
Thank you for article. I will read it. By fast scroll - I saw that solution is Window Function, but they are fairly new in MariaDB - so the problem persist in older versions. :)Tillage
@VladMihalcea, I pointed out either from your article or from the post every time you refer to ManyToOne case while explaining N+1 problem. But actually people mostly interested in OneToMany case relating to N+1 issue. Could you please refer and explain OneToMany case?Tallia
@VladMicalcea could Entity Graph be used instead of join fetch?Raphaelraphaela
Thanks for the answer, it cleared things. This could have been accepted answer :)Tyrelltyrian
Very detailed explanation. I use Sequelize ORM for JavaScript and this N+1 issue is handled in it. When I use include on my select queries in Sequelize, raw SQL with joins depending on how I use the include is generated.Golding
@catch23, I think this can be helpful.Muskogean
@Tillage other way to handle paggination is to fetch only ids cars (you can limit them using WHERE just like in typical query) and then fetch by car id with all joins that are needed.Papp
S
131
SELECT 
table1.*
, table2.*
INNER JOIN table2 ON table2.SomeFkId = table1.SomeId

That gets you a result set where child rows in table2 cause duplication by returning the table1 results for each child row in table2. O/R mappers should differentiate table1 instances based on a unique key field, then use all the table2 columns to populate child instances.

SELECT table1.*

SELECT table2.* WHERE SomeFkId = #

The N+1 is where the first query populates the primary object and the second query populates all the child objects for each of the unique primary objects returned.

Consider:

class House
{
    int Id { get; set; }
    string Address { get; set; }
    Person[] Inhabitants { get; set; }
}

class Person
{
    string Name { get; set; }
    int HouseId { get; set; }
}

and tables with a similar structure. A single query for the address "22 Valley St" may return:

Id Address      Name HouseId
1  22 Valley St Dave 1
1  22 Valley St John 1
1  22 Valley St Mike 1

The O/RM should fill an instance of Home with ID=1, Address="22 Valley St" and then populate the Inhabitants array with People instances for Dave, John, and Mike with just one query.

A N+1 query for the same address used above would result in:

Id Address
1  22 Valley St

with a separate query like

SELECT * FROM Person WHERE HouseId = 1

and resulting in a separate data set like

Name    HouseId
Dave    1
John    1
Mike    1

and the final result being the same as above with the single query.

The advantages to single select is that you get all the data up front which may be what you ultimately desire. The advantages to N+1 is query complexity is reduced and you can use lazy loading where the child result sets are only loaded upon first request.

Sorbose answered 18/9, 2008 at 21:43 Comment(22)
The other advantage of n + 1 is that it's faster because the database can return the results directly from an index. Doing the join and then sorting requires a temp table, which is slower. The only reason to avoid n + 1 is if you have a lot of latency talking to your database.Saga
Joining and sorting can be quite fast (because you will be joining on indexed-and-possibly-sorted fields). How big is your 'n+1'? Do you seriously believe that the n+1 problem only applies to high-latency database connections?Airdrop
@Airdrop Joining and sorting is slow because you need to sort on columns from two different tables, and MySQL at least doesn't support such an index, so it sorts it in a temporary table - this is slow. (Some database do have a composite index from two tables, that would help a lot.)Saga
@Airdrop You also end up retrieving the data from table1 over and over unnecessarily, and if you are getting a lot of data from it you waste bandwidth, and worse your temporary table is so big it has to be sorted on disk, and then doing one query is millions of times slower than n + 1. In my case I was getting so much data from table1, and table2 had many many rows, but very little data that I actually ran out of disk space on the temp table because the cartesian produce was enormous. Switching to n+1 solved the problem completely.Saga
@Airdrop In a different comment you told me to get all the data from table2 in one query. In this case table2 is extremely large, and I am only using a subset of the data, just whatever data matches what I get from table1. But for table1 I have a very large and complex where condition to get just the data I need. Replicating that where for the table2 query would be slow. I could I suppose collect all the IDs from table1 in advance, then send them using IN(), but if there are a lot of IDs I wind up going over the limit to a length of a query.Saga
@ariel - Your advice that N+1 is the "fastest" is wrong, even though your benchmarks may be correct. How is that possible? See en.wikipedia.org/wiki/Anecdotal_evidence, and also my comment in the other answer to this question.Fortuna
@Lee-Slalom It's not anecdotal - it's an analysis including everything, not just focusing on the database. I explained it all in the comments, hopefully you understood it.Saga
@Saga - I think I understood it fine :). I'm just trying to point out that your result only applies to one set of conditions. I could easily construct a counter example that showed the opposite. Does that make sense?Fortuna
@Lee-Slalom That's what I'm trying to tell you: My result applies most of the time, not just in one situation. The database is faster at sorting then your app is. If you don't do N+1 in the database, then you end up doing it in your app. Everyone always focuses on not giving the database lots of work to do, and forgets about how much work the front end app has to do. The only time you should avoid N+1 is if database queries have a lot of latency, or for any other reason where it takes a long time to do a query. Another time is if database time is precious, but you have lots of app servers.Saga
@Saga I was able to get much better performance than N+1 and SELECT * cartesian product style by using a temporary id's table https://mcmap.net/q/40553/-what-is-the-quot-n-1-selects-problem-quot-in-orm-object-relational-mapping . I agree N+1 fits most cases but some times there are better options depending on your app and the performance of your host language (Java despite what people say is very fast compared to PHP).Deodar
@AdamGent Yes, this can work. One nice thing about it is you can index the temporary table, pay the time for the index only once then have fast queries. The in clause is not itself indexed (and you can't always index the column it compares). Indexing the "other side" of the clause can sometimes be fast.Saga
@Saga I completely agree with you. The "front-end" work that your app must do to sort data. I think that you should try to limit database queries because IMO your app and your database should be clustered and could (and most of the time should) not exist on the same machine. Subsequently this means every query is a network call (TCP set up tear down etc.). So IMHO you should offload the work to the database in one call, tell it how you want it sorted. There's no black magic going on in databases, they still must do page replacements and process the data the same way your app would.Rink
There's more "black magic" going on in the database than many people tend to give it credit for. Database servers are powerful systems, designed and labored over by engineers who lose sleep over optimizing minute performance issues that you'll never have the time and probably never have the inclination to even begin to address in your middle tier app code. Much of the talk about trying to take load off the database server is silly. Of all the code stacks involved in your app, the database server is the one truly designed to perform and scale.Fluidize
@Saga In practice I have found that this advice is actually bad advice. I would strongly recommend that you take a look at the execution plans for some of these 'join queries'. Databases handle joins very well, and if you're using foreign key relationships correctly, the join will execute very quickly. Much more quickly than the sum of the individual N + 1 queries. Repeatedly queuing connections, running queries, and serializing results will be more expensive than a join, and in the case of large values of N (like 1000), we're talking orders of magnitude more expensive.Molina
I really want to emphasize that for N values of more than a handful, you're really going to hurt performance with N + 1. Even with a huge InnoDB Buffer Pool (think hundreds of gigs), repeated trips to the database time and time again prove to be huge bottlenecks in many systems. The times when N + 1 makes a lot of sense is when you've got a large number of entities/children that would otherwise be included in your default fetch (join), but really only need to fetch a couple of them. If you're going to fetch all of them, the N + 1 will kill your performance. Think of N + 1 as lazy loading.Molina
@Molina The point of all these comments, is that unlike the common dogma "Never N+1", actually sometimes N+1 is faster! Yes, sometimes it's slower. It depends mainly on how much data you are getting that is from the parent table (i.e. repeats multiple times in the returned data). You also MUST remember that the frontend has to "unpack" the data, and it has to do it N times anyway! So you have to compare doing a single database query, and a ton of frontend work, vs multiple database queries, but little frontend work. Don't only look at the database time, also check the frontend time.Saga
@Saga the front end will do the same work either way. You're not saving the front end anything by doing 601 small queries instead of 1 big query. and generally the cost for the 'front end' (you should be more specific) to serialize is going to be a fraction of the cost to continually run many queries on the database. The time when N + 1 is faster is only when you're not fetching all of the children. If you have 600 children and you're fetching 3, select fetch is the way to go. If you are fetching all 600, join fetch will beat select fetch by light yearsMolina
@Molina That simply isn't true. Say you need to show 10 fields about the parent, and then 2 subfields repeated 10 times. If you have 10 parents that's 100 rows. 900 of those fields are identical repeats (the 10 parent fields). So that's one performance loss. Then your front end need to loop over all the rows, and check "is this the same parent as before or not?" In contrast you can do 11 queries - 1 for all the parents, then 10 for each of the children. Then there is frontend sorting, all that data might end up in a hash table, which is even more work for the frontend to deal with.Saga
@Molina The duplication gets worse if you have sub-sub-children. Database sorting is an even bigger problem. When you have sub-children (i.e. a join) you can only have 1 of the columns indexed for sorting, the rest are sorted the slow way. I one time tried to write the query properly - 1 query, get everything. I ended with with a 50GB!!! temporary table as the database struggled to sort all that data. Then I changed it to N+1 and it was instant, despite issuing a ton of queries, and that's because each query was quick and could read the data directly in order using an indexed sort column.Saga
@Saga again, look at execution plans. Your database was probably missing indexes if a join took forever but many individual selects was fast. That has nothing to do with the ORM and everything to do with bad database design. Also, again, the front end serialization will be vastly cheaper than waiting for a pooled connection, going over the network, and having the db execute a query hundreds of times instead of once. And in the real world, front end servers scale horizontally and rdbms servers don't, and I can buy 20 beefy front end servers for the cost of my database.Molina
To reiterate, the SELECT N + 1 problem is, at its core: I have 600 records to retrieve. Is it faster to get all 600 of them in one query, or 1 at a time in 600 queries. Unless you're on MyISAM and/or you have a poorly normalized/poorly indexed schema (in which case the ORM isn't the problem), a properly tuned db will return the 600 rows in 2 ms, while returning the individual rows in about 1 ms each. So we often see N + 1 taking hundreds of milliseconds where a join takes only a coupleMolina
For those who are interested: classes are written in C#Cockayne
E
69

Supplier with a one-to-many relationship with Product. One Supplier has (supplies) many Products.

***** Table: Supplier *****
+-----+-------------------+
| ID  |       NAME        |
+-----+-------------------+
|  1  |  Supplier Name 1  |
|  2  |  Supplier Name 2  |
|  3  |  Supplier Name 3  |
|  4  |  Supplier Name 4  |
+-----+-------------------+

***** Table: Product *****
+-----+-----------+--------------------+-------+------------+
| ID  |   NAME    |     DESCRIPTION    | PRICE | SUPPLIERID |
+-----+-----------+--------------------+-------+------------+
|1    | Product 1 | Name for Product 1 |  2.0  |     1      |
|2    | Product 2 | Name for Product 2 | 22.0  |     1      |
|3    | Product 3 | Name for Product 3 | 30.0  |     2      |
|4    | Product 4 | Name for Product 4 |  7.0  |     3      |
+-----+-----------+--------------------+-------+------------+

Factors:

  • Lazy mode for Supplier set to “true” (default)

  • Fetch mode used for querying on Product is Select

  • Fetch mode (default): Supplier information is accessed

  • Caching does not play a role for the first time the

  • Supplier is accessed

Fetch mode is Select Fetch (default)

// It takes Select fetch mode as a default
Query query = session.createQuery( "from Product p");
List list = query.list();
// Supplier is being accessed
displayProductsListWithSupplierName(results);

select ... various field names ... from PRODUCT
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?

Result:

  • 1 select statement for Product
  • N select statements for Supplier

This is N+1 select problem!

Este answered 1/12, 2009 at 13:35 Comment(3)
Is it supposed to be 1 select for the Supplier then N selects for the Product?Pothead
@bencampbell_ Yeah, initially I felt the same. But then with his example, it is One product to many suppliers.Horripilate
So basically, it's 1 query for the child and N query for the parent. Please correct me if I am wrong!Gauffer
S
48

I can't comment directly on other answers, because I don't have enough reputation. But it's worth noting that the problem essentially only arises because, historically, a lot of dbms have been quite poor when it comes to handling joins (MySQL being a particularly noteworthy example). So n+1 has, often, been notably faster than a join. And then there are ways to improve on n+1 but still without needing a join, which is what the original problem relates to.

However, MySQL is now a lot better than it used to be when it comes to joins. When I first learned MySQL, I used joins a lot. Then I discovered how slow they are, and switched to n+1 in the code instead. But, recently, I've been moving back to joins, because MySQL is now a heck of a lot better at handling them than it was when I first started using it.

These days, a simple join on a properly indexed set of tables is rarely a problem, in performance terms. And if it does give a performance hit, then the use of index hints often solves them.

This is discussed here by one of the MySQL development team:

http://jorgenloland.blogspot.co.uk/2013/02/dbt-3-q3-6-x-performance-in-mysql-5610.html

So the summary is: If you've been avoiding joins in the past because of MySQL's abysmal performance with them, then try again on the latest versions. You'll probably be pleasantly surprised.

Shelba answered 8/1, 2014 at 12:49 Comment(6)
Calling early versions of MySQL a relational DBMS is quite a stretch... If people encountering those problems had been using a real database, they would not have encountered those kinds of problems. ;-)Fluidize
Interestingly, many of these types of problems were solved in MySQL with the introduction and subsequent optimization of the INNODB engine, but you'll still run into people trying to promote MYISAM because they think it's faster.Fluidize
FYI, one of the 3 common JOIN algorithms used in RDBMS' is called nested loops. It fundamentally is an N+1 select under the hood. The only difference is the DB made an intelligent choice to use it based off statistics and indexes, rather than client code forcing it down that path categorically.Sheela
@Sheela Yes! Much like JOIN hints and INDEX hints, forcing a certain execution path in all cases will rarely beat the database. The database is almost always very, very good at choosing the optimal approach to get the data. Maybe in the early days of dbs you needed to 'phrase' your question in a peculiar way to coax the db along, but after decades of world class engineering, you can now get the best performance by asking your database a relational question and letting it sort out how to fetch and assemble that data for you.Molina
Not only is the database utilizing indexes and statistics, all of the operations are also local I/O, much of which is often operating against highly efficient cache rather than disk. The database programmers devote an awful lot of attention to optimizing these sorts of things.Fluidize
Also N+1 at the database level does not incur all the overhead that an N+1 at the application server incurs, mainly the N additional network round-trips to the DB.Availability
O
42

A good explanation of the problem can be found in the Phabricator documentation

TL;DR

It is much faster to issue 1 query which returns 100 results than to issue 100 queries which each return 1 result.

Load all your data before iterating through it.

More in-detail

The N+1 query problem is a common performance antipattern. It looks like this:

$cats = load_cats();
foreach ($cats as $cat) {
   $cats_hats => load_hats_for_cat($cat);
   // ...
}

Assuming load_cats() has an implementation that boils down to:

SELECT * FROM cat WHERE ...

..and load_hats_for_cat($cat) has an implementation something like this:

SELECT * FROM hat WHERE catID = ...

..you will issue "N+1" queries when the code executes, where N is the number of cats:

SELECT * FROM cat WHERE ...
SELECT * FROM hat WHERE catID = 1
SELECT * FROM hat WHERE catID = 2
SELECT * FROM hat WHERE catID = 3
SELECT * FROM hat WHERE catID = 4
...
Orthodoxy answered 7/11, 2014 at 10:30 Comment(1)
Best explaining answer in short.Rivy
J
33

We moved away from the ORM in Django because of this problem. Basically, if you try and do

for p in person:
    print p.car.colour

The ORM will happily return all people (typically as instances of a Person object), but then it will need to query the car table for each Person.

A simple and very effective approach to this is something I call "fanfolding", which avoids the nonsensical idea that query results from a relational database should map back to the original tables from which the query is composed.

Step 1: Wide select

  select * from people_car_colour; # this is a view or sql function

This will return something like

  p.id | p.name | p.telno | car.id | car.type | car.colour
  -----+--------+---------+--------+----------+-----------
  2    | jones  | 2145    | 77     | ford     | red
  2    | jones  | 2145    | 1012   | toyota   | blue
  16   | ashby  | 124     | 99     | bmw      | yellow

Step 2: Objectify

Suck the results into a generic object creator with an argument to split after the third item. This means that "jones" object won't be made more than once.

Step 3: Render

for p in people:
    print p.car.colour # no more car queries

See this web page for an implementation of fanfolding for python.

Jacquerie answered 9/6, 2011 at 21:18 Comment(4)
i'm so glad i stumbled on your post, because i thought i was going crazy. when i found out about the N+1 problem, my immediate thought was- well, why don't you just create a view that contains all the information you need, and pull from that view? you have validated my position. thank you sir.Underwaist
We moved away from the ORM in Django because of this problem. Huh? Django has select_related, which is meant to solve this - in fact, its docs start with an example similar to your p.car.colour example.Prerecord
This is an old anwswer, we have select_related() and prefetch_related() in Django now.Sugary
Cool. But select_related() and friend don't seem to do any of the obviously useful extrapolations of a join such as LEFT OUTER JOIN. The problem isn't an interface problem, but an issue to do with the strange idea that objects and relational data are mappable....in my view.Jacquerie
L
25

Suppose you have COMPANY and EMPLOYEE. COMPANY has many EMPLOYEES (i.e. EMPLOYEE has a field COMPANY_ID).

In some O/R configurations, when you have a mapped Company object and go to access its Employee objects, the O/R tool will do one select for every employee, wheras if you were just doing things in straight SQL, you could select * from employees where company_id = XX. Thus N (# of employees) plus 1 (company)

This is how the initial versions of EJB Entity Beans worked. I believe things like Hibernate have done away with this, but I'm not too sure. Most tools usually include info as to their strategy for mapping.

Leporid answered 18/9, 2008 at 21:33 Comment(0)
S
23

Here's a good description of the problem

Now that you understand the problem it can typically be avoided by doing a join fetch in your query. This basically forces the fetch of the lazy loaded object so the data is retrieved in one query instead of n+1 queries. Hope this helps.

Sunsunbaked answered 18/9, 2008 at 21:43 Comment(0)
P
19

Check Ayende post on the topic: Combating the Select N + 1 Problem In NHibernate.

Basically, when using an ORM like NHibernate or EntityFramework, if you have a one-to-many (master-detail) relationship, and want to list all the details per each master record, you have to make N + 1 query calls to the database, "N" being the number of master records: 1 query to get all the master records, and N queries, one per master record, to get all the details per master record.

More database query calls → more latency time → decreased application/database performance.

However, ORMs have options to avoid this problem, mainly using JOINs.

Procaine answered 5/6, 2012 at 22:21 Comment(1)
joins are not a good solution (often), because they may result in a cartesian product, meaning the number of result rows is the number of root table results multiplied with the number of results in each child table. particularly bad over multiple herarchy levels. Selecting 20 "blogs" with 100 "posts" on each and 10 "comments" on each post will result in 20000 result rows. NHibernate has workarounds, like the "batch-size" (select children with in clause on parent ids) or "subselect".Favata
R
14

In my opinion the article written in Hibernate Pitfall: Why Relationships Should Be Lazy is exactly opposite of real N+1 issue is.

If you need correct explanation please refer Hibernate - Chapter 19: Improving Performance - Fetching Strategies

Select fetching (the default) is extremely vulnerable to N+1 selects problems, so we might want to enable join fetching

Rheims answered 21/7, 2010 at 11:55 Comment(2)
i read the hibernate page. It doesn't say what the N+1 selects problem actually is. But it says you can use joins to fix it.Sang
batch-size is required for select fetching, to select child objects for multiple parents in one select statement. Subselect could be another alternative. Joins can get really bad if you have multiple hierarchy levels and a cartesian product is created.Favata
C
11

The supplied link has a very simply example of the n + 1 problem. If you apply it to Hibernate it's basically talking about the same thing. When you query for an object, the entity is loaded but any associations (unless configured otherwise) will be lazy loaded. Hence one query for the root objects and another query to load the associations for each of these. 100 objects returned means one initial query and then 100 additional queries to get the association for each, n + 1.

http://pramatr.com/2009/02/05/sql-n-1-selects-explained/

Costrel answered 20/2, 2009 at 8:33 Comment(0)
S
10

N+1 select issue is a pain, and it makes sense to detect such cases in unit tests. I have developed a small library for verifying the number of queries executed by a given test method or just an arbitrary block of code - JDBC Sniffer

Just add a special JUnit rule to your test class and place annotation with expected number of queries on your test methods:

@Rule
public final QueryCounter queryCounter = new QueryCounter();

@Expectation(atMost = 3)
@Test
public void testInvokingDatabase() {
    // your JDBC or JPA code
}
Skylark answered 15/4, 2015 at 7:52 Comment(0)
D
7

The issue as others have stated more elegantly is that you either have a Cartesian product of the OneToMany columns or you're doing N+1 Selects. Either possible gigantic resultset or chatty with the database, respectively.

I'm surprised this isn't mentioned but this how I have gotten around this issue... I make a semi-temporary ids table. I also do this when you have the IN () clause limitation.

This doesn't work for all cases (probably not even a majority) but it works particularly well if you have a lot of child objects such that the Cartesian product will get out of hand (ie lots of OneToMany columns the number of results will be a multiplication of the columns) and its more of a batch like job.

First you insert your parent object ids as batch into an ids table. This batch_id is something we generate in our app and hold onto.

INSERT INTO temp_ids 
    (product_id, batch_id)
    (SELECT p.product_id, ? 
    FROM product p ORDER BY p.product_id
    LIMIT ? OFFSET ?);

Now for each OneToMany column you just do a SELECT on the ids table INNER JOINing the child table with a WHERE batch_id= (or vice versa). You just want to make sure you order by the id column as it will make merging result columns easier (otherwise you will need a HashMap/Table for the entire result set which may not be that bad).

Then you just periodically clean the ids table.

This also works particularly well if the user selects say 100 or so distinct items for some sort of bulk processing. Put the 100 distinct ids in the temporary table.

Now the number of queries you are doing is by the number of OneToMany columns.

Deodar answered 17/10, 2012 at 4:48 Comment(0)
W
5

Without going into tech stack implementation details, architecturally speaking there are at least two solutions to N + 1 Problem:

  • Have Only 1 - big query - with Joins. This makes a lot of information be transported from the database to the application layer, especially if there are multiple child records. The typical result of a database is a set of rows, not graph of objects (there are solutions to that with different DB systems)
  • Have Two(or more for more children needed to be joined) Queries - 1 for the parent and after you have them - query by IDs the children and map them. This will minimize data transfer between the DB and APP layers.
Wormhole answered 8/1, 2021 at 12:3 Comment(1)
A brief and clear answer at last.Normannormand
P
3

A generalisation of N+1

The N+1 problem is an ORM specific name of a problem where you move loops that could be reasonably executed on a server to the client. The generic problem isn't specific to ORMs, you can have it with any remote API. In this article, I've shown how JDBC roundtrips are very costly, if you're calling an API N times instead of only 1 time. The difference in the example is whether you're calling the Oracle PL/SQL procedure:

  • dbms_output.get_lines (call it once, receive N items)
  • dbms_output.get_line (call it N times, receive 1 item each time)

They're logically equivalent, but due to the latency between server and client, you're adding N latency waits to your loop, instead of waiting only once.

The ORM case

In fact, the ORM-y N+1 problem isn't even ORM specific either, you can achieve it by running your own queries manually as well, e.g. when you do something like this in PL/SQL:

-- This loop is executed once
for parent in (select * from parent) loop

  -- This loop is executed N times
  for child in (select * from child where parent_id = parent.id) loop
    ...
  end loop;
end loop;

It would be much better to implement this using a join (in this case):

for rec in (
  select *
  from parent p
  join child c on c.parent_id = p.id
)
loop
  ...
end loop;

Now, the loop is executed only once, and the logic of the loop has been moved from the client (PL/SQL) to the server (SQL), which can even optimise it differently, e.g. by running a hash join (O(N)) rather than a nested loop join (O(N log N) with index)

Auto-detecting N+1 problems

If you're using JDBC, you could use jOOQ as a JDBC proxy behind the scenes to auto-detect your N+1 problems. jOOQ's parser normalises your SQL queries and caches data about consecutive executions of parent and child queries. This even works if your queries aren't exactly the same, but semantically equivalent.

Pelican answered 15/2, 2022 at 8:36 Comment(0)
D
2

Take Matt Solnit example, imagine that you define an association between Car and Wheels as LAZY and you need some Wheels fields. This means that after the first select, hibernate is going to do "Select * from Wheels where car_id = :id" FOR EACH Car.

This makes the first select and more 1 select by each N car, that's why it's called n+1 problem.

To avoid this, make the association fetch as eager, so that hibernate loads data with a join.

But attention, if many times you don't access associated Wheels, it's better to keep it LAZY or change fetch type with Criteria.

Downrange answered 12/7, 2013 at 17:58 Comment(1)
Again, joins are not a good solution, especially when more than 2 hierarchy levels may be loaded. Check "subselect" or "batch-size" instead; the last will load children by parent IDs in "in" clause, such as "select ... from wheels where car_id in (1,3,4,6,7,8,11,13)".Favata
G
2

ORM "N plus one" Problem

The "N plus one" problem is a common performance issue that can occur when using Object-Relational Mapping (ORM) frameworks. ORM frameworks are tools used to map database tables to objects in object-oriented programming languages. This problem arises when retrieving data from a relational database using ORM in a specific way.

To understand the "N plus one" problem, let's consider an example scenario where you have two tables: Customer and Order. Each customer can have multiple orders, and there is a one-to-many relationship between the Customer and Order tables. In ORM, you define these relationships using object-oriented concepts such as classes and references.

Now, let's say you want to retrieve all the customers along with their orders. In ORM, you might use a query like this:

customers = Customer.objects.all()

for customer in customers:
    orders = customer.orders.all()
    # Do something with the orders

In this code, you first retrieve all the customers using Customer.objects.all(). Then, for each customer, you retrieve their orders using customer.orders.all().

The issue with this approach is that it results in multiple queries being executed to the database. For example, if you have 100 customers, this code will execute 101 queries: one to retrieve all the customers and 100 more to retrieve the orders for each customer (hence the name "N plus one" problem). This can significantly impact performance, especially when dealing with large datasets.

The "N plus one" problem arises because the ORM framework performs a separate query for each customer's orders instead of fetching all the necessary data in a single query. This behavior is often the default in ORM frameworks to avoid unnecessarily loading all the associated data, which can be a performance concern in other scenarios.

To mitigate the "N plus one" problem, ORM frameworks usually provide ways to optimize data retrieval, such as eager loading or explicit joins. Eager loading allows you to fetch the required data in a single query, reducing the number of database round-trips. By specifying the relationships you want to include, the ORM framework can generate a more efficient query that retrieves all the necessary data at once.

As a demonstration of the "N plus one" problem and its solution, the following shows the actual SQL emitted from an ORM using SQLAlchemy.

Original ORM query with the N plus one problem (1 query for customers and N for each customer's order):

with Session(engine) as session:
    customers = session.scalars(select(Customer))
    for customer in customers:
        print(f"> Customer: #{customer.customer_id}")
        for order in customer.orders:
            print(f">   order #{order.order_id} at {order.order_datetime}")
-- This query gets all customers:
SELECT customer.customer_id, ...
FROM customer

-- The following SQL is executed once for each customer:
SELECT "order".order_id AS order_order_id, ...
FROM "order"
WHERE "order".customer_id = %(param_1)s

After specifying eager loading (with selectinload()), only 2 queries are required:

with Session(engine) as session:
    customers = session.scalars(
        select(Customer).options(selectinload(Customer.orders)))
    for customer in customers:
        print(f"> Customer: #{customer.customer_id}")
        for order in customer.orders:
            print(f">   order #{order.order_id} at {order.order_datetime}")
SELECT customer.customer_id, ...
FROM customer

-- This loads all the orders you need in one query:
SELECT "order".order_id AS order_order_id, ...
FROM "order"
WHERE "order".customer_id IN (%(primary_keys_1)s, %(primary_keys_2)s, ...)

Or, explicitly join and query the required fields (only 1 query is required):

with Session(engine) as session:
    stmt = (
        select(
            Customer.customer_id,
            Order.order_id,
            Order.order_datetime,
        )
        .select_from(Customer)
        .join(Customer.orders)
        .order_by(Customer.customer_id)
    )
    results = session.execute(stmt)

    current_customer_id = None
    for row in results:
        customer_id = row.customer_id
        if current_customer_id != customer_id:
            current_customer_id = customer_id
            print(f"> Customer: #{current_customer_id}")
        print(f">   order #{row.order_id} at {row.order_datetime}")
SELECT customer.customer_id, "order".order_id, ...
FROM customer
JOIN "order" ON customer.customer_id = "order".customer_id
ORDER BY customer.customer_id

In summary, the "N plus one" problem in ORM occurs when the framework executes multiple queries to retrieve associated data for each item in a collection, resulting in a significant performance overhead. Understanding and addressing this problem by optimizing data retrieval strategies can help improve the efficiency of ORM-based applications.

Grisham answered 22/6, 2023 at 3:34 Comment(0)
A
1

N+1 SELECT problem is really hard to spot, especially in projects with large domain, to the moment when it starts degrading the performance. Even if the problem is fixed i.e. by adding eager loading, a further development may break the solution and/or introduce N+1 SELECT problem again in other places.

I've created open source library jplusone to address those problems in JPA based Spring Boot Java applications. The library provides two major features:

  1. Generates reports correlating SQL statements with executions of JPA operations which triggered them and places in source code of your application which were involved in it
2020-10-22 18:41:43.236 DEBUG 14913 --- [           main] c.a.j.core.report.ReportGenerator        :
    ROOT
        com.adgadev.jplusone.test.domain.bookshop.BookshopControllerTest.shouldGetBookDetailsLazily(BookshopControllerTest.java:65)
        com.adgadev.jplusone.test.domain.bookshop.BookshopController.getSampleBookUsingLazyLoading(BookshopController.java:31)
        com.adgadev.jplusone.test.domain.bookshop.BookshopService.getSampleBookDetailsUsingLazyLoading [PROXY]
            SESSION BOUNDARY
                OPERATION [IMPLICIT]
                    com.adgadev.jplusone.test.domain.bookshop.BookshopService.getSampleBookDetailsUsingLazyLoading(BookshopService.java:35)
                    com.adgadev.jplusone.test.domain.bookshop.Author.getName [PROXY]
                    com.adgadev.jplusone.test.domain.bookshop.Author [FETCHING ENTITY]
                        STATEMENT [READ]
                            select [...] from
                                author author0_
                                left outer join genre genre1_ on author0_.genre_id=genre1_.id
                            where
                                author0_.id=1
                OPERATION [IMPLICIT]
                    com.adgadev.jplusone.test.domain.bookshop.BookshopService.getSampleBookDetailsUsingLazyLoading(BookshopService.java:36)
                    com.adgadev.jplusone.test.domain.bookshop.Author.countWrittenBooks(Author.java:53)
                    com.adgadev.jplusone.test.domain.bookshop.Author.books [FETCHING COLLECTION]
                        STATEMENT [READ]
                            select [...] from
                                book books0_
                            where
                                books0_.author_id=1
  1. Provides API which allows to write tests checking how effectively your application is using JPA (i.e. assert amount of lazy loading operations )
@SpringBootTest
class LazyLoadingTest {

    @Autowired
    private JPlusOneAssertionContext assertionContext;

    @Autowired
    private SampleService sampleService;

    @Test
    public void shouldBusinessCheckOperationAgainstJPlusOneAssertionRule() {
        JPlusOneAssertionRule rule = JPlusOneAssertionRule
                .within().lastSession()
                .shouldBe().noImplicitOperations().exceptAnyOf(exclusions -> exclusions
                        .loadingEntity(Author.class).times(atMost(2))
                        .loadingCollection(Author.class, "books")
                );

        // trigger business operation which you wish to be asserted against the rule,
        // i.e. calling a service or sending request to your API controller
        sampleService.executeBusinessOperation();

        rule.check(assertionContext);
    }
}
Anility answered 22/10, 2020 at 17:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.