FOSElasticaBundle populate running out of PHP memory, possible memory leak?

I've installed FOSElasticaBundle and have it working with a cross section of my data.

My problem is that I have about 14m rows that I need to index. I ran the populate command yesterday, and after about 6 hours it errored out at 10.8% with a memory error:

PHP Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 52277 bytes) in /var/www/html/vendor/monolog/monolog/src/Monolog/Formatter/LineFormatter.php on line 111

As you can see, I've set my PHP memory limit to 2G, which should be more than enough.

The last line of output before the error looked like this:

Populating index/entity, 10.8% (1315300/12186320), 36 objects/s (RAM : current=2045Mo peak=2047Mo)

The current and peak values were ticking up with every line, starting at around 30MB.

My assumption is that there is some sort of memory leak; surely PHP's memory shouldn't be exhausted by this process. I've also tried the command with some extra parameters:

app/console fos:elastica:populate --no-debug --no-reset --env=prod

but as I watch it run, the current memory usage still ticks up.

Any thoughts on what might be going on here and how I can debug it? I found this discussion, which sounds like my problem but doesn't really present a good solution: https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/82. I'm using Doctrine and the default provider.

Thank you-

Nitrous answered 27/6, 2014 at 12:34 Comment(1)
you may have to reset and start indexing again; to reset: php app/console fos:elastica:reset (Ansel)

I'm not able to solve the memory leak entirely, but by running the command

app/console fos:elastica:populate --no-debug --no-reset --env=prod --offset=n
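(where n is the number of objects to skip; so after the crash at 1315300 above, the run can be resumed with --offset=1315300)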

I've been able to populate the index in batches. I also drastically cut down the amount of memory leaked by turning off the logger, using a solution from this page:

https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/273
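For reference, the heart of that fix is stopping Doctrine from logging every query it runs. A minimal sketch, assuming you can get hold of the entity manager before populating; the class and method names below are my own illustration, not part of the bundle:

<?php

use Doctrine\ORM\EntityManagerInterface;

// Illustrative helper for the issue #273 workaround: Doctrine's DBAL
// configuration records every executed query in an SQLLogger by default,
// and over millions of inserts that log alone can exhaust memory_limit.
class DisableSqlLogging
{
    public static function apply(EntityManagerInterface $em)
    {
        // Passing null removes the logger, so no further queries are kept.
        $em->getConnection()->getConfiguration()->setSQLLogger(null);
    }
}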

After setting my PHP memory_limit to 4G (!), I'm able to get more than 5m records populated without error, so after a couple of batches I should be done with this process.

Most solutions seem to involve writing a custom provider (see https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/457), but with a ridiculous memory_limit and the leak limited as much as possible, I didn't need to.

Nitrous answered 30/6, 2014 at 18:57 Comment(0)

The main problem here is that everything is done in one process, so all entities have to be loaded into memory. The work is done in chunks, but over the lifetime of the process it still loads all the data. There is not much you can do about it, because the problem is in the design.

The solution: split the data into chunks and process them in separate processes in parallel. The worker processes can quit from time to time (they have to be restarted by Supervisord or a similar tool), freeing their memory and resources. As a result, you get much better performance, better fault tolerance, and a smaller memory footprint.

There are many ways to implement this (forks, pthreads, or message queues), but I personally suggest looking at enqueue/elastica-bundle. It improves the populate command by splitting the job into messages.
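To illustrate the general idea (this is only a rough sketch, not enqueue/elastica-bundle's actual implementation): each worker handles one slice of the data and then exits, so the OS reclaims everything it allocated. The sketch assumes the pcntl extension, and $repository / $objectPersister are stand-ins for a Doctrine repository and FOSElasticaBundle's object persister:

<?php

// Rough sketch of the chunk-per-worker idea; not the bundle's actual code.
// Requires the pcntl extension. In real code each child must also open its
// own database connection instead of sharing the parent's.
$total = 12186320; // total rows, from the question above
$chunk = 10000;    // rows handled by each worker process

for ($offset = 0; $offset < $total; $offset += $chunk) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("fork failed\n");
    }
    if ($pid === 0) {
        // Child: load and index one slice, then quit to free all its memory.
        $entities = $repository->findBy([], ['id' => 'ASC'], $chunk, $offset);
        $objectPersister->insertMany($entities);
        exit(0);
    }
    // Parent: wait for the child; drop the wait to run slices in parallel.
    pcntl_waitpid($pid, $status);
}

A supervisor then only has to restart workers that die, rather than babysitting one giant process.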

Banky answered 27/6, 2017 at 8:10 Comment(0)

If the --no-debug option is not sufficient, check whether you have a fingers_crossed handler and set its buffer_size. That handler keeps every log record buffered in memory until action_level is reached, so in a long-running command the buffer grows without bound unless buffer_size caps it:

monolog:
    handlers:
        main:
            type: fingers_crossed
            action_level: critical
            handler: grouped
            excluded_404s:
                - ^
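            # cap the in-memory buffer at 30 records; the oldest are discarded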
            buffer_size: 30
Houser answered 9/9, 2019 at 14:19 Comment(0)
