If you wrap any long-running query like this:
SELECT COUNT(1) FROM (
<your query here>
) xyz
You'll see that the execution time matches any database tool (DBeaver, etc.) and the mysql CLI.
The complicated part:
The time differential comes from converting the bytes MySQL returns (along with a table definition) into JSON, i.e. JavaScript objects. The more rows returned, the more work has to be done. When we SELECT COUNT(1), the amount of data being sent/parsed/serialized is a couple of bytes, so what you're left with is how long the query itself actually takes from Node.
The differential between running the query directly and the COUNT(1) version is the parse/serialize time.
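Here's a minimal sketch of that measurement using the mysql2 driver (the connection settings and the big_table name are assumptions for illustration):

import mysql from 'mysql2/promise';

async function main() {
  // Assumed connection details.
  const conn = await mysql.createConnection({ host: 'localhost', user: 'root', database: 'test' });

  const query = 'SELECT * FROM big_table'; // hypothetical large result set

  // Direct run: server execution + network transfer + per-row parse/serialize in Node.
  let start = performance.now();
  await conn.query(query);
  console.log(`direct:   ${(performance.now() - start).toFixed(0)} ms`);

  // COUNT(1) wrapper: the server does the same query work, but Node only
  // has to parse a single one-column row.
  start = performance.now();
  await conn.query(`SELECT COUNT(1) FROM (${query}) xyz`);
  console.log(`count(1): ${(performance.now() - start).toFixed(0)} ms`);

  await conn.end();
}

main();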
Database tools normally cheat here: if your query returns 150_000 rows, they'll typically only parse/serialize the first 200 rows, then parse/serialize the next 200 as you scroll, and so on. This doesn't re-run the query, because the bytes for the next 200 rows are already sitting in an in-memory buffer.
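You can approximate the same trick in Node with mysql2's streaming interface. This isn't exactly what GUI tools do internally (they keep the raw bytes around and parse lazily), but it shows rows being parsed incrementally instead of all up front:

import mysql from 'mysql2';

const conn = mysql.createConnection({ host: 'localhost', user: 'root', database: 'test' });

// The callback API exposes the result set as a Node stream,
// so rows are parsed as they're read rather than all at once.
const stream = conn.query('SELECT * FROM big_table').stream(); // hypothetical table

let rows = 0;
stream.on('data', () => {
  rows += 1;
  if (rows % 200 === 0) {
    // Pausing applies backpressure: unread bytes sit in a buffer, unparsed,
    // until we resume (think: the user scrolling to the next page).
    stream.pause();
    setImmediate(() => stream.resume());
  }
});
stream.on('end', () => conn.end());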
After testing mysql2, Prisma, and other tools that implement the MySQL protocol, it's clear that serialization (interop) is a bottleneck in Node compared to other languages. Most likely this is due to under-optimized JavaScript that takes the returned bytes and builds the result objects.
You can confirm this: force MySQL to return a single JSON aggregate of the data (e.g. via JSON_ARRAYAGG), then parse that in Node using any of the common drivers, and you'll see roughly a 500% performance increase (at the cost of the database doing more work).
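A sketch of that approach with MySQL's JSON_ARRAYAGG/JSON_OBJECT (the table and column names are hypothetical):

import mysql from 'mysql2/promise';

async function main() {
  const conn = await mysql.createConnection({ host: 'localhost', user: 'root', database: 'test' });

  // MySQL aggregates the rows into one JSON array server-side, so Node
  // receives a single cell instead of 150_000 individually encoded rows.
  const [rows] = await conn.query(
    `SELECT JSON_ARRAYAGG(JSON_OBJECT('id', t.id, 'name', t.name)) AS data
     FROM big_table t`
  );

  // One native JSON.parse replaces the driver's per-row, per-column JS decoding.
  // (Depending on driver/config, JSON columns may already arrive parsed.)
  const raw = (rows as any[])[0].data;
  const data = typeof raw === 'string' ? JSON.parse(raw) : raw;
  console.log(data.length);

  await conn.end();
}

main();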
This is because the serialization process on the Node.js side is a lot simpler; the algorithm becomes:
bytes are a UTF-8 JSON string -> decode to a JS string -> JSON.parse(string) [native]
This is much faster than:
bytes encode numbers/strings/floats/etc. -> iterate through the bytes and build JS object literals manually (basically a hand-rolled version of bytes -> JSON.parse, except the parsing runs in JavaScript, not native code).
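A toy comparison of the two shapes of work (this is not the real wire protocol, and the numbers vary by machine; the point is that path 1 spends its time in native code while path 2 does every value conversion in JavaScript):

// Build a fake 150_000-row payload.
const N = 150_000;
const data = Array.from({ length: N }, (_, i) => ({ id: i, name: `row${i}`, price: i * 1.5 }));

// Path 1: one UTF-8 JSON payload, decoded then handed to native JSON.parse.
const jsonBuf = Buffer.from(JSON.stringify(data), 'utf8');
let t = performance.now();
JSON.parse(jsonBuf.toString('utf8'));
console.log('native JSON.parse:', (performance.now() - t).toFixed(1), 'ms');

// Path 2: a stand-in for a driver's decode loop: scan a delimited buffer
// and convert every value to the right JS type by hand.
const textBuf = Buffer.from(data.map(r => `${r.id}\t${r.name}\t${r.price}`).join('\n'), 'utf8');
t = performance.now();
textBuf.toString('utf8').split('\n').map(line => {
  const [id, name, price] = line.split('\t');
  return { id: Number(id), name, price: Number(price) };
});
console.log('manual JS decode :', (performance.now() - t).toFixed(1), 'ms');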
As a community, I think there's a serious challenge here for the Node.js ecosystem: we need to demand faster MySQL parsing/serialization. Other backend languages are eclipsing us in performance; I'm seeing differences as high as 25 seconds for 150_000 rows.
Debating the use cases for returning this much data is beside the point: if Java and Rust can make it fast, so can we.