Sequelize - Adding a limit to a query with an include, fails to properly limit retrievals
Asked Answered
C

2

9

Issue Description

Adding a limit to a Sequelize Query with a SubQuery fails to limit retrievals. Multiple online resources referencing this error and no solutions. Is this a Sequelize error or user error?

What are you doing?

ThreadFolderUser.findAll({
order: [
  ['updated_at', 'DESC']
],
where: {
  user_id,
  folder_id,
  deleted,
  archived,
},
distinct: true,
offset,
limit: 10,
include: [
  {
    model: Thread,
    include: [
      { model: Email, include: [Attachment] },
    ]
  }
],

})


Associations

// ThreadFolderUser (assoc table) - Thread / Folder / User (tables)
User.hasMany(ThreadFolderUser, { foreignKey: 'user_id' })
ThreadFolderUser.belongsTo(User, { foreignKey: 'user_id' })
Folder.hasMany(ThreadFolderUser, { foreignKey: 'folder_id' })
ThreadFolderUser.belongsTo(Folder, { foreignKey: 'folder_id' })
Thread.hasMany(ThreadFolderUser, { foreignKey: 'thread_id' })
ThreadFolderUser.belongsTo(Thread, { foreignKey: 'thread_id' })

// Thread - Emails
Thread.hasMany(Email, { foreignKey: 'thread_id' })
Email.belongsTo(Thread, { foreignKey: 'thread_id' })

// Email - Attachments
Email.hasMany(Attachment, { foreignKey: 'email_id' })
Attachment.belongsTo(Email, { foreignKey: 'email_id' })

What do you expect to happen?

I expected 10 records (based on the limit currently set to 10) retrieved from the AssociationTable, since I have at least 15 records in the database that match this query.

What is actually happening?

Returns 6 in my case, instead of 10 (with the limit set to 10). Instead of pulling the first 10 matches.


Additional context

If I remove the limit, it works as intended (even with the includes).

If I remove the include, it works as intended (even with the limit).

If I copy/paste the SQL Query generated by Sequelize and insert it directly into Workbench, it retrieves the proper amount of rows.

It seems the issue is the limit combined with the include cause the query to retrieve only the records that match within the first 10 searched in the DB.


Other references to the same issue without a proper solution presented:


Environment

  • Sequelize version: v5.21.3
  • Node.js version: v12.13.1
  • Operating System: AWS Lambda Function
  • TypeScript version: 3.7.2

I'm well aware that this exact same issue has been brought up in multiple other threads and platforms -- as I have linked a few of them above -- however none of them have a direct answer, and 1 of them marked an irrelevant point as the answer which did not solve the intended issue. I'm hoping we can get an answer to this, or a realistic workaround beyond hard coding the SQL Query (last resort).

It would be unthinkable for Sequelize not to be able to handle a limit with an include in the same query, so there must be something missing / user error on my side. I've searched multiple times and certainly started with Sequelize documentation, of which does not reference this issue or a similar example, or any problems that may arise with combining a limit and include.

Many thanks for any contributions made to help solve this issue. Hopefully some @Sequelize Engineer is out there able to help answer this :)

Claver answered 12/5, 2020 at 21:17 Comment(4)
Show association definitionsKos
@Kos - thanks, added them.Claver
where is an association for AssociationModel?Kos
@Kos updated - I was trying to use generic names to keep it simple and forgot when adding associations it would need updating.Claver
K
22

You cannot properly limit a sequelize query that including hasMany association until you make sequelize to get included association objects by separate queries. In your query you have include with the association Thread.hasMany(Email so you should indicate separate: true in the Email include like this (this also goes for Attachment association):

include: [
      { 
       model: Email, 
       separate: true,
       include: [{
         model: Attachment,
         separate: true
       }] },
    ]

Also you don't need to indicate distinct: true because we already indicated to separate hasMany associations into its own queries.

Another problem with hasMany includes (especially with nested hasMany) in queries that they turn into JOIN's in a SQL query that means a DB multiplies an amount of main records to an amount of nested records and so on. For instance: 100 main records each has 100 linked records each has its own linked 100 records (Thread -> Email -> Attachment). All in all you make a DB to query 100*100*100 - 1 million records at once! It usually leads to out of memory.

About LIMIT and hasMany: a DB selects 100 main records with 100 linked for each (10000 records at once) and after that it takes first 10 records from these 10000 records (not from 100 main records). That's how SQL queries work!

Kos answered 12/5, 2020 at 22:29 Comment(8)
This is an amazing answer, and I have yet to see anything remotely close to a logical answer such as this. Thank you for providing the solution and most importantly explaining why it works this way. I am dealing with a large database, over 10 million email records (individual records in 1 table) from an old DB, of which I'll be creating a script to convert them into threads connected to relative emails and attachments. Due to this I'm hoping the JOIN statements won't be anywhere near the 100*100*100, but point well taken and I think I understand it, I'll keep an eye on this.Claver
To confirm if understand this right, if I'm pulling 20 threads, with an average of 15 emails each and an average of 5 attachments each that would be a query of 1500? Or will it actually be pulling (all matching threads) * (all matching emails on those threads) * (all matching attachments on this emails) and then limiting down to the main 20, so potentially my queries will always be pulling a lot more records before filtering down to the main 20 limit/offset - am I correct on this thinking?Claver
So my concern will be the main query's where statement to ensure its only pulling the necessary association records, in this case, the association records specific to that 1) user and 2) mailbox (of which I've named folder), so if there are 500 records matching this user/folder combination, it will always start at 500 * matching Emails to the 500 * matching Attachments to the Emails - and then pair down to the limit and offset, is correct?Claver
If you indicate separate: true then you'll get 20 threads (1 query with 20 threads only filtered and limited), then 20 separate queries to get emails then 20*5 separate queries to get attachments.Kos
Emails and attachments will be queried only for already filtered and limited threadsKos
I created a little video to help explain this concept. loom.com/share/33bea5a0111a4b54913deb51506c6622Thunderstorm
@RyanShillington The video is about many-to-many and this question is about 2 simultaneous 1:N from one table to two other tables. It still could be useful though when you use belongsToMany you can't do separate queries using separate: true option.Kos
@Kos True! The problem is easier when only using belongsTo.Thunderstorm
R
1

Adding foreign key in the attribute array should do the trick.

include: [
  {
    model: ModelName,
    attributes: [...otherAttributes, foreignKey]
   } 
]

Works fine in my case.

Rooker answered 15/1 at 15:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.