How to improve or avoid the find / fetch cycle in a Meteor publication?

TL;DR:

Chat is one collection; ChatMess is another whose messages refer to a Chat's _id. How do I get the last message from each chat in a list with the least computation possible? Here, a find / fetch cycle in a loop is far too heavy and slow.

I have this publication, which returns a set of cursors to the user:

  • The chat sessions they take part in (from the Chat collection)
  • The last message from each chat session referenced in the first cursor (from the ChatMess collection)

Currently, the logic is to:

  • Get the list of chat sessions from the user profile
  • Find the Chat sessions and loop through them
  • In the loop, findOne the last message from this chat session and store its _id in an array. In addition, I store all the other users' _ids.
  • Then, find the messages whose _id matches one of those in my array.

Here is my main problem:

Isn't there a much faster way to get the last message from each of my chat sessions? With that algorithm, I easily reach 8000 ms of response time, which is far too long, as most of that time is spent finding / fetching the chat messages' _ids (cf. the linked Kadira screenshot).

    Meteor.publish("publishNewChat", function() {
    this.unblock();

    // we get a list of chat _id
    let chatIdList = _get_all_the_user_chats_ids(this.userId);

    // bail out when the user has no chats (the original tested an undefined `chatList`)
    if (!chatIdList)
        return ;

    // get the chat sessions objects
    let chats_cursor = Modules.both.queryGet({
                    type        : 'chat',
                    method      : 'find',
                    query       : { _id: { $in: chatIdList } },
                    projection  : { sort: { _id: 1 }, limit : 1000 }
                });

    let array_of_fetched_chats = chats_cursor.fetch();
    let chat_ids = [];

    // and here we loop through the chat documents in order to get the last message that's been attached to each of them
    array_of_fetched_chats.forEach(function(e) {
        let lastMess = Modules.both.queryGet({
                            type        : 'chatMess',
                            method      : 'findOne',
                            query       : { chatId: e._id },
                            projection  : { sort: { date: -1 } }
                        });

        if (lastMess)
            chat_ids.push(lastMess._id);
    });

    return ([
        chats_cursor,
        Modules.both.queryGet({
            type        : 'chatMess',
            method      : 'find',
            query       : { _id: { $in: chat_ids } },
            projection  : { sort: { date: -1 }, limit: 1000 }
        })
    ]);
    });

Finally, it also adds latency to all the DDP requests that follow. I currently use this.unblock() to avoid that, but I'd prefer not to use it here.

FYI, I have another publication that is updated each time the client changes its active chat session: on the client, routing to a new chat adds its _id to a reactive array that updates my getChatMess subscription, so the client receives the messages from every chat the user has visited since connecting. The goal is obviously to spare the server from sending every message from every chat session the user has ever visited.

Unfortunately, I lack ideas for improving that algorithm without breaking all my chat logic. Do you have any ideas? How would you do it?

Thank you.

EDIT: a Kadira screenshot (not reproduced here) clearly shows the problem.

Nucleolated answered 25/2, 2016 at 15:40 Comment(6)
While I do understand your issue perfectly, there is simply way too much ground to cover here for an answer. I think this post is too broad to be a good fit for SO. Can you try to post one focused question instead of multiple ones at once, and come up with a simpler minimal reproducible example for that question? You can do that multiple times with new questions, allowing the answers to be more concise and focused, and your questions will be more useful for future readers. - Spoony
I understand. I just added a TL;DR, and I'm going to minimize the code. - Nucleolated
Great! Try to also make your title more SEO juicy. The better the title, the more people will click on your post from search engines and the better visibility it gets. - Spoony
May I suggest that a tl;dr belongs at the top of the question, not at the bottom. If it's truly tl;dr (and it is, IMHO), then no one will read to the bottom to find your summary. - Vocalize
It should be way better and smaller now :) With only one question. - Nucleolated
Do you have an index set up for the collection's chatId field? - Noreen
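To make the index question concrete, here is a minimal sketch of a compound index that would suit the "last message of a chat" query. It assumes the messages live in a collection exposed as ChatMess and that the Node driver collection is reachable through ChatMess.rawCollection(); the helper name and the stand-in object are hypothetical and only there so the snippet runs without a database.

```javascript
// Sketch only: `rawChatMess` stands for the Node driver collection behind the
// question's ChatMess collection (assumed: ChatMess.rawCollection() in Meteor).
function ensureLastMessageIndex(rawChatMess) {
    // chatId first for the equality match, date descending for the sort,
    // so the newest message of a chat is the first index entry for that chatId.
    return rawChatMess.createIndex({ chatId: 1, date: -1 });
}

// Hypothetical stand-in collection that just records the requested index.
const fakeCollection = {
    indexes: [],
    createIndex(spec) {
        this.indexes.push(spec);
        return Promise.resolve("chatId_1_date_-1");
    }
};

ensureLastMessageIndex(fakeCollection);
console.log(fakeCollection.indexes);
```

Without such an index, each findOne sorted by date has to scan and sort every message of the chat, which would be consistent with the time reported in the question.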

Here is a solution I developed:

Meteor.publish("publishNewChat", function() {
this.unblock();

let user = Modules.both.queryGet({
                type        : 'users',
                method      : 'findOne',
                query       : { _id: this.userId },
                projection  : { fields: { "profile.chat": true } }
            });

let thisUserschats = tryReach(user, "profile", "chat").value;

if (!thisUserschats)
    return ;

thisUserschats = thisUserschats.map(function(e) { return (e.chatId); });

let chats = Modules.both.queryGet({
                type        : 'chat',
                method      : 'find',
                query       : { _id: { $in: thisUserschats } },
                projection  : { sort    : { _id: 1 },
                                limit   : 1000
                              }
            });

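// declare each variable explicitly (the chained `uids = cmid = []` created an implicit global and shared one array)
let chatArray = chats.fetch(),
    uids = [],
    cmid = [];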

let messages_id_list = [],
    i = chatArray.length;

let _parallelQuery = index => {
    Meteor.setTimeout(function () {
        let tmp = Modules.both.queryGet({
                      type      : 'chatMess',
                      method    : 'find',
                      query     : { chatId: chatArray[index]._id },
                      projection: { limit: 1, sort: { date: -1 } }
                  });

        tmp.forEach(doc => {
            messages_id_list.push((doc && doc._id) ? doc._id : null);
        });
    }, 1);
}

while (--i >= 0)
    _parallelQuery(i);

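// keep a reference to the publication: `this` is different inside the timer callback
let self = this;

let cursors = {
    chats           : chats,
    chatMessages    : null
};

let interval = Meteor.setInterval(function () {
    // Caveat: a chat without any message pushes nothing into messages_id_list,
    // so this condition is never met if one of the chats is empty.
    if (messages_id_list.length === chatArray.length)
    {
        Meteor.clearInterval(interval);

        cursors.chatMessages = Modules.both.queryGet({
                                    type        : 'chatMess',
                                    method      : 'find',
                                    query       : { _id: { $in: messages_id_list } },
                                    projection  : { sort: { date: -1 }, limit: 1000 }
                               });

        // keep the observe handles so they can be stopped with the subscription
        let chatsHandle = cursors.chats.observeChanges({
            // ...
        });

        let messagesHandle = cursors.chatMessages.observeChanges({
            // ...
        });

        self.ready();

        self.onStop(() => {
            chatsHandle.stop();
            messagesHandle.stop();
        });
    }
}, 10);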

});

I used an async function with Meteor.setTimeout to parallelize the queries, passing each one the index of the chat _id to look for. When a query finishes, I add the last message's _id to an array. With a Meteor.setInterval, I check the array length to know when all the queries are done. Then, as I can't return cursors anymore, I use Meteor's low-level publication API to handle the publishing of the documents.
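As a hedged alternative to polling the array length with a timer, the "wait until every query has answered" step can be sketched with Promise.all. The snippet is plain JavaScript and not Meteor-specific: findLastMessageId is a hypothetical stand-in for the chatMess query, and messagesByChat is made-up sample data. Inside a publication the result would still have to be published through the low-level API.

```javascript
// Hypothetical stand-in for one "last message of this chat" query.
// In the answer this is the chatMess find with { limit: 1, sort: { date: -1 } }.
function findLastMessageId(chatId, messagesByChat) {
    return new Promise(resolve => {
        setTimeout(() => {
            const messages = messagesByChat[chatId] || [];
            const last = messages.reduce(
                (best, m) => (!best || m.date > best.date ? m : best), null);
            resolve(last ? last._id : null);
        }, 1);
    });
}

// Start every query at once and resolve when all of them have answered.
// Chats without messages yield null and are filtered out, so the wait
// cannot hang on an empty chat.
function collectLastMessageIds(chatIds, messagesByChat) {
    return Promise.all(chatIds.map(id => findLastMessageId(id, messagesByChat)))
        .then(ids => ids.filter(id => id !== null));
}

// Made-up sample data: chat "c" has no messages.
const sample = {
    a: [{ _id: "m1", date: 1 }, { _id: "m2", date: 5 }],
    b: [{ _id: "m3", date: 2 }],
    c: []
};

collectLastMessageIds(["a", "b", "c"], sample).then(ids => console.log(ids));
```

Promise.all keeps the results in the order of the input ids and does not depend on the number of non-empty chats, which the length comparison in the interval does.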

FYI: in a first attempt, I was using 'findOne' in my _parallelQuery function, which cut my computation time by a factor of 2 to 3. But then, thanks to a friend, I tried the cursor.forEach() function, which allowed me to halve the computation time again!

In production, benchmarks showed the response time going from 7 to 8 seconds down to an average of 1.6 seconds :)
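Another option, not benchmarked here, would be to replace the per-chat queries with a single aggregation that returns the newest message _id of every chat. The sketch below only builds the pipeline; the field names chatId and date come from the question, while buildLastMessagePipeline and the sample ids are hypothetical. Running it through ChatMess.rawCollection().aggregate(...) is an assumption about the setup, and the result is not reactive, so the ids would still feed the final `_id: { $in: ... }` find.

```javascript
// Builds a pipeline returning one document per chat:
// { _id: <chatId>, lastMessageId: <_id of the newest message> }.
function buildLastMessagePipeline(chatIds) {
    return [
        // only the messages of the user's chats
        { $match: { chatId: { $in: chatIds } } },
        // newest message first within each chat
        { $sort: { chatId: 1, date: -1 } },
        // keep the first (newest) message _id of each chat
        { $group: { _id: "$chatId", lastMessageId: { $first: "$_id" } } }
    ];
}

const pipeline = buildLastMessagePipeline(["chatA", "chatB"]);
console.log(JSON.stringify(pipeline, null, 2));

// Assumed usage on a Meteor server (not run here):
//   ChatMess.rawCollection().aggregate(pipeline).toArray()
```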

Hope this will be useful to you! :)

Nucleolated answered 25/2, 2016 at 21:32 Comment(0)

Have you considered using the reywood/publishComposite package? With this package you can publish related data in the same method without having to do a bunch of logic to get the correct data published.

The below code should get you started:

Meteor.publishComposite("publishNewChat", function() {
return [{
    find:function(){
        return Users.find({ _id: this.userId },{fields:{"profile.chat":1}});
    },
    children:[{
        find:function(user){ //this function is passed each user returned from the cursor above.
            return UserChats.find({userId:user._id},{fields:{blah:1,blah:1}}); //find the user chats using whatever query 
        },
        children:[
            //if there are any children of user chats that you need to publish, do so here...
            {
                find:function(userchat){
                    return Chats.find({_id:userchat.chatId})
                },
                children:[
                    {
                        find:function(chat){
                            return ChatMess.find({chatId:chat._id},{ sort: { date: -1 } });
                        },
                        children:[
                            {
                                find:function(chatMess){
                                    var uids = _.without(chatMess.participants, this.userId);
                                    return Users.find({_id:{$in:uids}});
                                }
                            }
                        ]
                    }
                ]
            }
        ]
    },
    ]
}];
});

This will publish the cursors for all of the documents related to each of the parent documents. It is pretty fast; I use this package on a high-traffic production platform with large datasets with no problems. On the client you could then query the documents as normal to get the ones you need to display.

Something like:

Users.findOne({_id:Meteor.userId()});
UserChats.find({userId:Meteor.userId()});
etc...
Moser answered 25/2, 2016 at 16:19 Comment(3)
I used to use it, yes, then I switched to the non-relational model, as reywood's package doesn't seem to have any positive influence on performance. I saw somewhere that reywood was asking for a benchmark of his package, but at the moment none is available. I could try it again, but seeing how the package is built, I don't think it will be better. All it might do is start sending some of the docs earlier, without improving the overall publication performance. It also brings this question to mind: how should we use this.unblock with a composite publication? - Nucleolated
The only benefit of the reywood package comes when the cursors you return are used by many users: in that case, it should be able to greatly limit the number of cursors the server observes. It'd be great on a homepage, but not in a chat app where only two users share the same observer at a time. - Nucleolated
I confirm what I said: after a day of testing, I can assure you that the publishComposite package, though it can make the code clearer to some devs, does not improve performance. It even slightly reduces it in the best cases. - Nucleolated
