Selenium not freeing up memory even after calling close/quit
So I've been working on a scraper that visits 10k+ pages and scrapes data from them.

The issue is that memory consumption rises drastically over time. To overcome this, instead of closing the driver instance only at the end of the scrape, the scraper was updated to close the instance after every page is loaded and its data extracted.

But RAM still keeps filling up for some reason.

I tried using PhantomJS, but it doesn't load the data properly for some reason. With the initial version of the scraper I also tried limiting Firefox's cache to 100 MB, but that did not work either.

Note: I run tests with both chromedriver and Firefox, and unfortunately I can't use libraries such as requests, mechanize, etc. instead of Selenium.

Any help is appreciated, since I've been trying to figure this out for a week now. Thanks.

Fourier answered 2/7, 2016 at 21:29 Comment(1)
This is an old question, but in my case I had to stop my pyvirtualdisplay with display.stop() as well.Tantalizing
The only way to force the Python interpreter to release memory back to the OS is to terminate the process. Therefore, use multiprocessing to run the Selenium Firefox instance in a separate process; the memory will be freed when that process terminates:

import multiprocessing as mp
import selenium.webdriver as webdriver

def worker():
    driver = webdriver.Firefox()
    # do memory-intensive work
    # closing and quitting is not what ultimately frees the memory, but it
    # is good to close the WebDriver session gracefully anyway.
    driver.close()
    driver.quit()

if __name__ == '__main__':
    p = mp.Process(target=worker)
    # run `worker` in a subprocess
    p.start()
    # make the main process wait for `worker` to end
    p.join()
    # all memory used by the subprocess will be freed to the OS

See also Why doesn't Python release the memory when I delete a large object?

Botanomancy answered 2/7, 2016 at 21:54 Comment(1)
I think this is the solution to this problem!Periodontics
Are you saying that your drivers are what's filling up your memory? How are you closing them? After you extract your data, do you still hold references to some collection that's keeping it in memory?

You mentioned that you were already running out of memory when you closed the driver instance only at the end of scraping, which suggests you're keeping extra references.

Patrizia answered 2/7, 2016 at 21:46 Comment(3)
Yes, it seems like the driver is filling memory up. I have 5 functions where Selenium is used, alongside Scrapy. In those functions I instantiate a new driver instance, then near the end of the function I call driver.quit() or driver.close(). As for keeping extra references, I'm not sure that I do. I use Selenium to load the page, and once it loads I put page_source into a Scrapy selector. I don't have any memory leaks in Scrapy.Fourier
You can check line-by-line memory usage (in your program, not the websites) using memory_profiler. This should help you get a better idea of which section is consuming your memory. If you can't find anything there, posting an example function here may be helpful.Patrizia
@Fourier also check top to see if there are multiple instances of whatever browser you are using.Ogilvy
I experienced a similar issue, and destroying the driver myself (i.e. setting driver to None) prevented those memory leaks for me.
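
A runnable sketch of the idea, with a hypothetical stand-in class so Selenium itself isn't required: dropping the last reference lets the garbage collector reclaim the driver object (though, as the accepted answer notes, CPython may still not hand that memory back to the OS):

```python
import gc
import weakref

class FakeDriver:
    """Stand-in for a WebDriver instance (hypothetical, for illustration)."""
    def quit(self):
        pass

driver = FakeDriver()
ref = weakref.ref(driver)  # lets us observe when the object is gone

driver.quit()
driver = None  # destroy our reference, as suggested above
gc.collect()   # force a collection pass

print(ref() is None)  # True: the driver object has been collected
```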

Hanhana answered 20/12, 2018 at 10:48 Comment(0)
I was having the same problem until I put the webdriver.get(url) statements inside a try/except/finally block and made sure webdriver.quit() was in the finally clause; this way, it always executes. Like:

webdriver = webdriver.Firefox()  # note: this rebinds the name `webdriver`
try:
    webdriver.get(url)
    source_body = webdriver.page_source
except Exception as e:
    print(e)
finally:
    webdriver.quit()

From the docs:

The finally clause of such a statement can be used to specify cleanup code which does not handle the exception, but is executed whether an exception occurred or not in the preceding code.
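
The guarantee is easy to see with a stub in place of the driver; the cleanup step runs whether or not the simulated page load raises:

```python
log = []

def fetch(fail):
    try:
        if fail:
            raise RuntimeError("page failed to load")  # simulated error
        log.append("got page")
    except Exception as e:
        log.append("error: %s" % e)
    finally:
        log.append("quit")  # always runs, like webdriver.quit() above

fetch(fail=False)
fetch(fail=True)
print(log)  # ['got page', 'quit', 'error: page failed to load', 'quit']
```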

Ogilvy answered 7/3, 2019 at 13:6 Comment(1)
This also has a memory leak, because sometimes Selenium gets out of control and the webdriver variable becomes None. Second, webdriver.quit() does not release all of the memory, as the answer above notes.Alfy
Use this to force-kill any leftover chromedriver processes (Windows):

os.system("taskkill /f /im chromedriver.exe /T")
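
The command above is Windows-only. A sketch of a cross-platform variant, with pkill as the assumed Linux/macOS counterpart:

```python
import os
import platform

# Pick the kill command for the current OS; both force-terminate
# any leftover chromedriver processes.
if platform.system() == "Windows":
    cmd = "taskkill /f /im chromedriver.exe /T"
else:
    cmd = "pkill -f chromedriver"  # assumed Linux/macOS equivalent

os.system(cmd)  # returns a non-zero status if nothing matched
```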
Breannebrear answered 15/6, 2022 at 7:12 Comment(0)
Still a problem in 2023. When using Selenium on Linux, I encountered 3 memory issues.

  1. Use .close() instead of .quit(). Not sure why, but it helps.

  2. When errors occur, running processes remain in memory and need to be killed.

  3. If no profile is specified, Selenium creates a new profile every time. This happens in /tmp. If a process is not terminated correctly, its profile also remains. Since /tmp is held in memory, it fills up. So in addition to the processes, the remaining data of each process must also be deleted.

Here is a shell script which terminates the processes and deletes their data for me.

#!/bin/bash

DIRECTORY_PATH="/tmp"
SEARCH_TERM="rust_mozprofile"

while true; do
    # Kill Zombies
    # Finds processes older than 2 minutes (120 seconds) and terminates them
    ps -eo pid,etime,cmd | awk -v term="$SEARCH_TERM" '
        $0 ~ term {
            split($2, time, ":")
            if (length(time) == 2) {
                minutes = time[1]
                if (minutes >= 2) {
                    print $1
                }
            } else if (length(time) == 3) {
                print $1
            }
        }' | while read pid; do
        echo "Killing Zombie #$pid"
        kill -9 $pid
    done

    COUNT=0

    # Clean up corpses
    # Search for folders in the specified directory that are older than 4 minutes and contain the search term
    # and delete them
    while read dir; do
        rm -rf "$dir"
        ((COUNT++))
    done < <(find "$DIRECTORY_PATH" -type d -name "*$SEARCH_TERM*" -mmin +4 2>/dev/null)

    if [ "$COUNT" -gt 0 ]; then
        echo "$COUNT corpse(s) of zombies burned"
    fi
    
    sleep 10
done
Garnett answered 23/10, 2023 at 8:30 Comment(0)