So I want to upload large CSV files to a MongoDB cloud database from a Node.js server using Express, Mongoose, and Multer's GridFS storage engine, but once a file upload starts, my database becomes unable to handle any other API requests. For example, if a different client requests a user from the database while the file is being uploaded, the server will receive the request and try to fetch the user from the MongoDB cloud, but the request gets stuck because the large file upload eats up all the computational resources. As a result, the GET request made by that client does not return the user until the in-progress file upload has completed.
I understand that if a thread takes a long time to execute a callback (Event Loop) or a task (Worker), it is considered "blocked", and that Node.js runs JavaScript code in the Event Loop while offering a Worker Pool to handle expensive tasks like file I/O. I've read in this blog post on nodejs.org that, in order to keep a Node.js server speedy, the work associated with each client at any given time must be "small", and that my goal should be to minimize the variation in Task times. The reasoning behind this is that if a Worker's current Task is much more expensive than other Tasks, that Worker will be unavailable to work on other pending Tasks, which effectively decreases the size of the Worker Pool by one until the Task is completed.
In other words, the client performing the large file upload is executing an expensive Task that decreases the throughput of the Worker Pool, in turn decreasing the throughput of the server. According to the aforementioned blog post, the fix is to partition the long Task into sub-Tasks: when each sub-Task completes it should submit the next sub-Task, and when the final sub-Task is done, it should notify the submitter. This way, between each sub-Task of the long Task (the large file upload), the Worker can work on a sub-Task from a shorter Task, thus solving the blocking problem.
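To make the idea concrete, here is a minimal sketch of what I understand "partitioning" to mean, written for work that runs on the Event Loop. The helper name `processInChunks`, its parameters, and the sample data are all hypothetical and mine, not from the blog post or from any library. It only illustrates the pattern of each sub-Task scheduling the next one with `setImmediate`; it does not touch multer or GridFS, and I don't know whether this is the right tool for the upload itself.

```javascript
// Hypothetical helper (not part of any library): walk through a large Buffer
// in small slices, yielding to the event loop between slices so that other
// pending callbacks (e.g. another client's request) get a chance to run.
function processInChunks(data, chunkSize, handleChunk) {
  return new Promise((resolve, reject) => {
    let offset = 0;
    let chunks = 0;

    function step() {
      try {
        // Final sub-task is done: notify the submitter with the chunk count.
        if (offset >= data.length) {
          resolve(chunks);
          return;
        }
        // One sub-task: handle a single slice of the input.
        handleChunk(data.subarray(offset, offset + chunkSize));
        offset += chunkSize;
        chunks += 1;
        // Submit the next sub-task instead of looping synchronously.
        setImmediate(step);
      } catch (err) {
        reject(err);
      }
    }

    step();
  });
}

// Usage with made-up CSV data: count the bytes seen across all slices.
const data = Buffer.from('a,b\n1,2\n3,4\n');
let bytesSeen = 0;
processInChunks(data, 4, (chunk) => { bytesSeen += chunk.length; })
  .then((count) => console.log(`processed ${count} chunks, ${bytesSeen} bytes`));
```

What I can't figure out is how to apply this kind of partitioning to the upload, since the chunking there happens inside multer and the storage engine rather than in my own code.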
However, I do not know how to implement this solution in actual code. Are there any specific partitioned functions that can solve this problem? Do I have to use a specific upload architecture, or a Node package other than multer-gridfs-storage, to upload my files? Please help.
Here is my current file upload implementation using Multer's GridFS storage engine:
// Adjust how files get stored.
const storage = new GridFsStorage({
  // The DB connection.
  db: globalConnection,
  // The file's storage configurations.
  file: (req, file) => {
    ...
    // Return the file's data to the file property.
    return fileData;
  }
});
// Configure a strategy for uploading files.
const datasetUpload = multer({
  // Set the storage strategy.
  storage: storage,
  // Set the size limit for uploading a file to 300 MB.
  limits: { fileSize: 1024 * 1024 * 300 },
  // Set the file filter.
  fileFilter: fileFilter,
});
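// Upload a dataset file.
router.post('/add/dataset', (req, res) => {
  // Begin the file upload.
  datasetUpload.single('file')(req, res, function (err) {
    // Upload failure: multer errors (e.g. size limit exceeded) are client errors.
    if (err) {
      const status = err instanceof multer.MulterError ? 400 : 500;
      return res.status(status).send({ error: err.message });
    }
    // Get the parsed file from multer.
    const file = req.file;
    // Upload success.
    return res.status(200).send(file);
  });
});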