How to solve a Python memory leak when using urllib2?
I'm trying to write a simple Python script for my mobile phone that periodically loads a web page using urllib2. In fact I don't really care about the server response, I'd only like to pass some values in the URL to the PHP. The problem is that Python for S60 uses the old 2.5.4 Python core, which seems to have a memory leak in the urllib2 module. As I read, there seem to be similar problems with every type of network communication as well. This bug was reported here a couple of years ago, and some workarounds were posted as well. I've tried everything I could find on that page, and with the help of Google, but my phone still runs out of memory after ~70 page loads. Strangely, the garbage collector does not seem to make any difference either, except making my script much slower. It is said that the newer (3.1) core solves this issue, but unfortunately I can't wait a year (or more) for the S60 port to come.

Here's how my script looks after adding every little trick I've found:


import urllib2, httplib, gc

while True:
    url = "http://something.com/foo.php?parameter=" + value
    f = urllib2.urlopen(url)
    f.read(1)
    f.fp._sock.recv = None  # hacky avoidance
    f.close()
    del f
    gc.collect()
Any suggestions on how to make it run forever without getting the "cannot allocate memory" error? Thanks in advance, cheers, b_m

update: I've managed to connect 92 times before it ran out of memory, but it's still not good enough.

update2: Tried the socket method as suggested earlier, this is the second best (wrong) solution so far:


import socket, threading
from time import sleep

class UpdateSocketThread(threading.Thread):
    def run(self):
        global data
        while 1:
            url = "/foo.php?parameter=%d" % data
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect(('something.com', 80))
            s.send('GET ' + url + ' HTTP/1.0\r\n\r\n')
            s.close()
            sleep(1)
I tried the little tricks from above, too. The thread closes after ~50 uploads (the phone has 50MB of memory left; obviously the Python shell doesn't).

UPDATE: I think I'm getting closer to the solution! I tried sending multiple requests without closing and reopening the socket. This may be the key, since this method leaves only one open file descriptor. The problem is:


import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("something.com", 80))
s.send("test")  # returns 4 (sent bytes, which is cool)
s.send("test")  # 4
s.send("test")  # 4
s.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n")  # returns the number of sent bytes, ok
s.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n")  # returns 0 on the phone, error on Windows 7*
s.send("GET /foo.php?parameter=bar HTTP/1.0\r\n\r\n")  # returns 0 on the phone, error on Windows 7*
s.send("test")  # returns 0, strange...
*: error message: 10053, software caused connection abort

Why can't I send multiple messages??
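
For reference, a sketch of what I'm trying to get at with one persistent connection (assuming the server honors keep-alive; my guess is the server closes an HTTP/1.0 connection after the first response, which would explain the zero-byte sends and the 10053 error):

import socket

HOST = 'something.com'
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, 80))
for value in (1, 2, 3):
    # HTTP/1.0 closes the connection after each response by default,
    # so explicitly ask the server to keep it open (it may still refuse)
    s.send('GET /foo.php?parameter=%d HTTP/1.0\r\n'
           'Host: %s\r\n'
           'Connection: keep-alive\r\n\r\n' % (value, HOST))
    # drain the response before the next request, otherwise requests
    # and responses get out of step
    response = ''
    while '\r\n\r\n' not in response:
        chunk = s.recv(1024)
        if not chunk:  # the server closed the connection anyway
            break
        response += chunk
    # a real client would also read the Content-Length body here
s.close()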

Stacked answered 18/11, 2010 at 11:26 Comment(10)
I'm not familiar with the execution environment, but would it be possible to spawn the load as a separate process each time, and let the OS process cleanup take care of the leak?Senile
That might be a good idea, thanks, I'll give it a try and keep you posted.Stacked
Try putting the whole thing into a function and calling that function in a while loop. The local variables should expire in that case, hopefully removing the leak...Wintergreen
Sorry, just tested this, it doesn't work...Wintergreen
Just tested Russell Borogove's idea: the problem is that Python can't terminate the thread which opens the connection, so in that case the memory fills up with suspended threads instead of urllib file descriptors. I've just spent two hours debugging this... :(Stacked
@b_m: Try spawning a process with fork; I'm pretty sure threads will share file descriptors.Distended
@Brian: S60 doesn't support this function, it has been omitted from the Python port as well.Stacked
@b_m: It's not leaving you many options, is it? :) I take it S60 doesn't support exec, either. Have you considered calling the C library equivalents for socket communication via ctypes, (or alternatively writing the socket communication in C and calling that)?Distended
Uh, I have not. As a matter of fact, I've been using Python because it's easy. After months of programming in such lovely languages as Python or PHP, I don't feel like going back to the good ol' C. If it's the only option left, I'll look into it...Stacked
I updated the first post; anybody have ideas on the slightly modified problem?Stacked
I think this is probably your problem. To summarize that thread: there's a memory leak in PyS60's DNS lookup, and you can work around it by moving the DNS lookup outside the inner loop.
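
A minimal sketch of that workaround (the host, path, and parameter value are placeholders from the question): resolve the name once with socket.gethostbyname() and reuse the IP inside the loop, so no lookup happens per request.

import socket, urllib2
from time import sleep

value = 42                                  # hypothetical parameter value
ip = socket.gethostbyname('something.com')  # single DNS lookup, outside the loop

while 1:
    # connecting by IP skips the per-request lookup; note this only works
    # if the server does not rely on name-based virtual hosting
    f = urllib2.urlopen('http://%s/foo.php?parameter=%d' % (ip, value))
    f.read(1)
    f.close()
    sleep(1)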

Schoonmaker answered 15/1, 2011 at 2:43 Comment(2)
This should definitely work. I have tried the Google example and I killed it after 1400 iterations. Thank you very much!Stacked
The strange thing is that it's still not good. It crashes after ~200 page loads even if there is only one DNS lookup. I tried the example on the forum.nokia page that you linked, and it's no good either.Stacked
Using the test code suggested by your link, I tested my Python installation and confirmed that it indeed leaks. But if, as @Russell suggested, I put each urlopen in its own process, the OS should clean up the memory leaks. In my tests, memory, unreachable objects and open files all remained more or less constant. I split the code into two files:

connection.py

import cPickle, sys, urllib2

def connectFunction(queryString):
    conn = urllib2.urlopen('http://something.com/foo.php?parameter=' + str(queryString))
    data = conn.read()
    outfile = open('sometempfile', 'wb')
    cPickle.dump(data, outfile)
    outfile.close()

if __name__ == '__main__':
    connectFunction(sys.argv[1])

launcher.py
import subprocess, cPickle

#code from your link to check the number of unreachable objects

def print_unreachable_len():
    # check memory for leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    unreachableL = []

    for it in gc.garbage:
        unreachableL.append(it)
    return len(unreachableL)

#my code
if __name__ == '__main__':
    print 'Before running a single process:', print_unreachable_len()
    return_value_list = []
    for i, value in enumerate(values):  # where values is a list or a generator containing (or yielding) the parameters to pass to the URL
        subprocess.call(['python', 'connection.py', str(value)])
        print 'after running', i, 'processes:', print_unreachable_len()
        infile = open('sometempfile', 'rb')
        return_value_list.append(cPickle.load(infile))
        infile.close()

Obviously, this is sequential, so you will only execute a single connection at a time, which may or may not be an issue for you. If it is, you will have to find a non-blocking way of communicating with the processes you're launching, but I'll leave that as an exercise for you.

EDIT: On re-reading your question, it seems you don't care about the server response. In that case, you can get rid of all the pickling-related code. And obviously, you won't have the print_unreachable_len() related bits in your final code either.
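
In that case, a stripped-down launcher might look like this (just a sketch, assuming the fire-and-forget behaviour described in the question):

import subprocess

values = range(100)  # hypothetical stand-in for your parameter source

for value in values:
    # each urlopen runs in its own short-lived interpreter, so any memory
    # leaked by urllib2 is reclaimed by the OS when the child exits
    subprocess.call(['python', 'connection.py', str(value)])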

Wintergreen answered 19/11, 2010 at 16:6 Comment(1)
At first I misunderstood Russell's suggestion and tried separate threads instead of separate processes. I'm developing the application for a Nokia phone (PyS60), and I don't think it's possible to implement your solution there. However, it should work well on a Linux machine. Thanks for your effort, I appreciate it.Stacked
There exists a reference cycle in urllib2, created at urllib2.py:1216. The issue is ongoing and has existed since 2009. https://bugs.python.org/issue1208304

Patricide answered 22/12, 2016 at 10:15 Comment(1)
this should have been a commentDevisee
This seems like a (very!) hacky workaround, but a bit of googling found this comment on the problem:

Apparently adding f.read(1) will stop the leaking!

import urllib2
f = urllib2.urlopen('http://www.google.com')
f.read(1)
f.close()

EDIT: oh, I see you already have f.read(1)... I'm all out of ideas then :/

Moidore answered 18/11, 2010 at 11:37 Comment(0)
Consider using the low-level socket API (related howto) instead of urllib2.

import socket

HOST = 'daring.cwi.nl'    # The remote host
PORT = 50007              # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.send('GET /path/to/file/index.html HTTP/1.0\r\n\r\n')

# you'll need to figure out how much data to read and read that exactly,
# or wait for recv() to return data of zero length (I think!)
DATA_SZ = 1024
data = s.recv(DATA_SZ)
s.close()
print 'Received', repr(data)

How to execute and read an HTTP request via low-level sockets is a bit beyond the scope of the question (and perhaps would make a good question on its own on Stack Overflow; I searched but didn't see it), but I hope this points you in the direction of a solution that may resolve your problem!

Edit: An answer here about using makefile may be helpful: HTTP basic authentication using sockets in python
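
For illustration, a rough sketch of the makefile approach (same HTTP/1.0-style request as above; host and path are just the example values):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('daring.cwi.nl', 80))
s.send('GET /index.html HTTP/1.0\r\n\r\n')

# makefile wraps the socket in a file-like object, so the response can
# be read line by line until the server closes the connection
f = s.makefile('rb')
for line in f:
    print line.rstrip()
f.close()
s.close()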

Distended answered 19/11, 2010 at 14:57 Comment(1)
Thanks for your reply, I tried this one, but it seems to have the same problem. I'm not sure yet, because I haven't seen the error (possibly because of the new thread), but it stops refreshing after 52 updates.Stacked
This does not leak for me with Python 2.6.1 on a Mac. Which version are you using?

BTW, your program doesn't work due to a few typos. Here is one that does work:

import urllib2, httplib, gc
value = "foo"
count = 0
while True:
    url = "http://192.168.1.1/?parameter=" + value 
    f = urllib2.urlopen(url)
    f.read(1)
    f.fp._sock.recv=None # hacky avoidance
    f.close()
    del f
    print "count=",count
    count += 1
Cruickshank answered 21/11, 2010 at 13:7 Comment(2)
According to Wikipedia, the latest PyS60 uses the 2.5.4 Python core. The syntax looks fine; however, this is not my exact code, it's slightly simplifiedStacked
Yes, it is slightly simplified, but it still does what you wanted.Cruickshank
Depending on the platform and Python version, Python might not release memory back to the OS. See this Stack Overflow thread. That said, Python should not endlessly consume memory. Judging from the code you use, it appears to be a bug in the Python runtime, unless urllib/sockets use globals, which I don't believe they do - blame it on Python on S60!

Have you considered other sources of memory leakage? An endlessly open log file, an ever-increasing array, or something like that? If it truly is a bug in the sockets interface, then your only option is to use the subprocess approach.

Daune answered 22/11, 2010 at 12:50 Comment(1)
I ruled out every other possibility of a memory leak. Maybe it's a stupid question, but isn't it somehow possible to use the urllib from a newer Python, even if PyS60 is old?Stacked