Meteor app deployed to Digital Ocean stuck at 100% CPU and OOM

I have a Meteor (0.8.0) app deployed with Meteor Up to Digital Ocean that has been stuck at 100% CPU, only to crash with an out-of-memory error and start up again at 100% CPU. It has been stuck like this for the past 24 hours. The weird part is that nobody is using the server and meteor.log isn't showing many clues. I'm using MongoHQ with oplog for the database.

Digital Ocean specs:

1GB RAM, 30GB SSD disk, New York 2, Ubuntu 12.04.3 x64

Screenshot showing the issue: [image not included]

Note that the screenshot was captured yesterday and it has stayed pegged at 100% cpu until it crashes with out of memory. The log shows:

FATAL ERROR: Evacuation Allocation failed - process out of memory
error: Forever detected script was killed by signal: SIGABRT
error: Forever restarting script for 5 time

Top displays:

26308 meteorus 20 0 1573m 644m 4200 R 98.1 64.7 32:45.36 node

How it started: I have an app that takes in a list of emails via CSV or MailChimp OAuth, sends them off to FullContact via their batch process call (http://www.fullcontact.com/developer/docs/batch/), and then updates the Meteor collections according to the response status. Here is the snippet that handles a 200 response:

if (result.statusCode === 200) {
    var data = JSON.parse(result.content);
    var rate_limit = result.headers['x-rate-limit-limit'];
    var rate_limit_remaining = result.headers['x-rate-limit-remaining'];
    var rate_limit_reset = result.headers['x-rate-limit-reset'];
    console.log(rate_limit);
    console.log(rate_limit_remaining);
    console.log(rate_limit_reset);
    _.each(data.responses, function(resp, key) {
        var email = key.split('=')[1];
        if (resp.status === 200) {
            var sel = {
                email: email,
                listId: listId
            };
            Profiles.upsert({
                email: email,
                listId: listId
            }, {
                $set: sel
            }, function(err, result) {
                if (!err) {
                    console.log("Upsert ", result);
                    fullContactSave(resp, email, listId, Meteor.userId());
                }
            });
            RawCsv.update({
                email: email,
                listId: listId
            }, {
                $set: {
                    processed: true,
                    status: 200,
                    updated_at: new Date().getTime()
                }
            }, {
                multi: true
            });
        }
    });
}

Locally, on my wimpy Windows laptop running Vagrant, I have no performance issues whatsoever processing hundreds of thousands of emails at a time. But on Digital Ocean, it seems it can't even handle 15,000. I've seen the CPU spike to 100% and then crash with OOM before, but after it comes back up it usually stabilizes. Not this time. What worries me is that the server hasn't recovered at all despite little to no activity on the app. I've verified this by looking at analytics: GA shows 9 sessions total over the 24 hours doing little more than hitting / and bouncing, and MixPanel shows only 1 logged-in user (me) in the same timeframe. The only thing I've done since the initial failure is check the facts package, which shows:

mongo-livedata
observe-multiplexers 13
observe-drivers-oplog 13
oplog-watchers 16
observe-handles 15
time-spent-in-QUERYING-phase 87828
time-spent-in-FETCHING-phase 82

livedata
invalidation-crossbar-listeners 16
subscriptions 11
sessions 1

Meteor APM also doesn't show anything out of the ordinary, the meteor.log doesn't show any meteor activity aside from the OOM and restart messages. MongoHQ isn't reporting any slow running queries or much activity - 0 queries, updates, inserts, deletes on avg from staring at their monitoring dashboard. So as far as I can tell, there hasn't been much activity for 24 hours, and certainly not anything intensive. I've since tried to install newrelic and nodetime but neither is quite working - newrelic shows no data and the meteor.log has a nodetime debug message

Failed loaded nodetime-native extention.

So when I try to use nodetime's CPU profiler it turns up blank and the heap snapshot returns with Error: V8 tools are not loaded.

I'm basically out of ideas at this point, and since Node is pretty new to me it feels like I'm taking wild stabs in the dark here. Please help.

Update: Server is still pegged at 100% four days later. Even an init 6 doesn't do anything - Server restarts, node process starts and jumps back up to 100% cpu. I tried other tools like memwatch and webkit-devtools-agent but could not get them to work with Meteor.

The following is the strace output

strace -c -p 6840
Process 6840 attached - interrupt to quit
^CProcess 6840 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.17    0.073108           1    113701           epoll_wait
 11.15    0.010559           0     80106     39908 mmap
  6.66    0.006309           0    116907           read
  2.09    0.001982           0     84445           futex
  1.49    0.001416           0     45176           write
  0.68    0.000646           0    119975           munmap
  0.58    0.000549           0    227402           clock_gettime
  0.10    0.000095           0    117617           rt_sigprocmask
  0.04    0.000040           0     30471           epoll_ctl
  0.03    0.000031           0     71428           gettimeofday
  0.00    0.000000           0        36           mprotect
  0.00    0.000000           0         4           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.094735               1007268     39908 total

So it looks like the node process spends most of its time in epoll_wait.

Guidepost answered 18/4, 2014 at 17:15 Comment(1)
I'm not familiar with Meteor, but you're using _.each to iterate over the results and perform asynchronous I/O on a potentially huge collection of items. That means that if you have 15,000 items, all 15,000 upserts (and updates) are attempted concurrently. You should try using async.eachLimit or similar. - Occident
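
A minimal sketch of the bounded-concurrency idea from the comment above. This is a hand-rolled stand-in for async.eachLimit, not the library itself; the name eachLimit, its parameters, and the worker shape are assumptions made for illustration and do not come from the original post.

```javascript
// Run worker(item, done) for every item, keeping at most `limit` calls in
// flight at once. Illustrative stand-in for async.eachLimit.
// Assumes each worker calls done(err) exactly once, and asynchronously;
// a worker that calls done synchronously would recurse once per item.
function eachLimit(items, limit, worker, finished) {
  var next = 0;       // index of the next item to start
  var running = 0;    // number of workers currently in flight
  var completed = 0;  // number of workers that have called done
  var failed = false; // set once a worker has reported an error

  if (items.length === 0) {
    return finished(null);
  }

  // Start new workers until the limit is reached or the items run out.
  function launch() {
    while (!failed && running < limit && next < items.length) {
      running++;
      worker(items[next++], onDone);
    }
  }

  // Called by each worker when it finishes.
  function onDone(err) {
    running--;
    completed++;
    if (failed) { return; }          // an error was already reported
    if (err) {
      failed = true;
      return finished(err);          // report the first error only
    }
    if (completed === items.length) {
      return finished(null);         // everything succeeded
    }
    launch();                        // a slot opened up, start the next item
  }

  launch();
}
```

In the question's code, the worker would presumably wrap the Profiles.upsert call and invoke done from the upsert callback, so that no more than `limit` upserts are outstanding at any moment instead of all 15,000.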

I had a similar issue. I didn't need oplog, and I was advised to add the Meteor package "disable-oplog". After I did, CPU usage dropped a lot. If you are not really taking advantage of oplog, it might be better to disable it, so run meteor add disable-oplog and see what happens.

I hope this helps.

Kyte answered 11/5, 2014 at 17:5 Comment(0)

Are you using Meteor Up? I also use New York 2.

In my local environment, an Ubuntu Server VirtualBox VM, the app works great with only 512 MB and 1 core.

I'm having the same issue on a DigitalOcean VPS with 4 GB RAM and 2 cores, deployed with Meteor Up (plus my app, of course).

Local environment on VirtualBox - 1 core - 512 MB - New York 2 - Ubuntu 14.04 x86
-------------------------------------
Meteor.js = 0.8.0
Node = 0.10.26
MongoDB shell version = 2.4.10

%CPU = 20.8 avg
%MEM = 27.4 avg

DigitalOcean 4 GB RAM - 2 CPUs - Ubuntu 14.04 x64
-------------------------------------
Meteor.js = 0.8.0
Node = 0.10.26
MongoDB shell version = 2.4.10

%CPU = 101.8 avg
%MEM = 27.4 avg

PID meteoru+  20   0 1644244 796692   6228 R 102.2 32.7  84:47.08 node

Also, my app does something like yours. I'm using the CFS package from Atmosphere and node-csv to read the CSV that I upload. The upload works great, and node-csv works great too, so as far as I can tell the problem seems to be Node running on DigitalOcean. My MongoDB works great as well.

Knee answered 30/4, 2014 at 3:30 Comment(8)
Using mup. Everything is a blur, but this is what I did to get it under control: 1. Uninstalled node and reran mup setup. This is the only thing that brought CPU and memory back to normal. 2. Broke my app out into two separate apps on two separate servers, one for user-facing traffic and one for the looping HTTP requests. 3. Optimized the heck out of my subscriptions and method calls. That's where I'm at right now. You can also check out this recently logged issue with Meteor: github.com/meteor/meteor/issues/2073. The solution didn't help in my case, but that was when the server was stuck. - Guidepost
I don't understand how you uninstalled node and were still able to use mup, given that mup is a package from npm. Can you explain more about that? - Knee
I tried using old versions of Node.js, a swap partition, etc. Nothing worked. I'm using MongoHQ. - Knee
Have you tried restarting the server using init 6? If that brings the CPU and memory back to normal, then you won't need to rerun mup setup. Restarting didn't help me. If it doesn't help you, you can log into the server and remove node, or just rename the node directory, and rerun mup setup. But I think if you've tried various node versions then you've done something similar. If that's the case, then I would start looking to optimize your code. Keep in mind I'm still in the middle of this, so I'm trying to figure it out too :) - Guidepost
Yep, I tried that too (init 6), but it didn't work for me and it prints this: [localhost] what(): [localhost] std::bad_alloc[localhost] - Knee
So when your app starts up, does it immediately shoot to 100% CPU like mine did? Or do you have to do something like upload a file before it goes to 100%? If it is immediate, try doing what I did: remove node and rerun mup setup. - Guidepost
Also, if you have a lot of stuff in Meteor.startup on the server, try removing it to see if that helps. bad_alloc sounds like a memory problem; you could try forcing a garbage collection as described in the Meteor issue I linked to earlier. - Guidepost
After spending a lot of time on this issue, I found that the problem is the CFS package combined with Meteor Up. I observed that with Meteor Up, special functions on the client side didn't work at all. It seems that CFS points to a local DB while Meteor Up always points to its own configured DB. I'm not 100% sure, but about 90%. I'll keep trying to get it working this weekend. Wish me luck. - Knee

I was new to VPS hosting, and the first thing I tried to do was run my script. The problem was that I had started the same server several times, with both node and pm2.

Solution

  1. run pm2 kill to kill all processes run by your process manager
  2. run killall node - to kill all running process if any remains
  3. run pm2 start <your_server>.js - to run your server again
Sollars answered 3/1, 2021 at 18:1 Comment(0)
