How do I deal with very large file-uploads in an Erlang web server?

Asked 4/3, 2010 at 4:39 Answered 6/3, 2010 at 12:59

Tags: http, file-upload, erlang, httpserver

So, let's say I'm writing a web server and I want to support "very large" file uploads. Let's further assume that I mean to do this via the standard multipart/form-data MIME type. I should say that I'm using Erlang and that I plan to collect HTTP packets as they are returned from erlang:decode_packet/3, but I do not want to actually collect the request body until the HTTP request handler has found a place for the uploaded content to go. Should I:

a) go ahead and collect the body anyway, ignoring the possibility that it is very, very large and could crash the server by exhausting its memory?

b) refrain from receiving on the socket any (possibly non-existent) request body until after the headers have been processed?

c) do something else?

An example for answer c might be: spawn another process to collect the uploaded content and write it to a temporary location (to minimize memory use), while simultaneously giving that location to the HTTP request handler for later processing. But I just don't know: is there a standard technique here?

Perpetuity asked 4/3, 2010 at 4:39

Well, the consensus seems to be that the standard way is to do what I suggested for option c. Still, I feel that there must be a better way. I am bothered by the awkwardness of temporary files: they require additional Erlang ports to be opened (more than once if I plan to read the file back in at some point), and they split across two or more processes what I'd want handled by one. This is, however, what I had been planning to do; I'd hoped that somebody might be doing things differently. – Perpetuity 4/3, 2010 at 21:1

You need to store the data. In practice, that means memory or a storage device. Your question says memory is not an option; your comment says you don't like storing it on a device either. The only remaining option is occultism... – Malebranche 5/3, 2010 at 7:54

In my opinion, option b is clearly the superior one.

While you are not reading the socket, the kernel's TCP stack continues to buffer the incoming data. As that buffer fills, the server's TCP stack advertises a smaller and smaller receive window to the client, until eventually (when the kernel's TCP receive buffer is full) the window closes and the client has to stop sending.

In other words, by not reading the socket, you are letting TCP flow control do its job.
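A minimal sketch of option b, assuming a passive (`{active, false}`) `gen_tcp` socket. It reads only the request line and headers using the socket's built-in `{packet, http_bin}` mode (the same HTTP packet parser that `erlang:decode_packet/3` exposes), then switches the socket to `{packet, raw}` and returns without reading the body. The module and function names are hypothetical, and a real server would also handle chunked transfer encoding, timeouts, and malformed requests.

```erlang
%% Hypothetical module: a sketch of option b, not a library API.
-module(header_first).
-export([read_headers/1]).

%% Read the request line and headers from a passive gen_tcp socket and
%% leave the body unread. Returns {ok, {Method, Uri}, Headers}.
read_headers(Socket) ->
    %% Same packet parser as erlang:decode_packet(http_bin, ...).
    ok = inet:setopts(Socket, [{packet, http_bin}, {active, false}]),
    case gen_tcp:recv(Socket, 0) of
        {ok, {http_request, Method, Uri, _Version}} ->
            %% After the request line the socket parses headers by itself.
            read_headers(Socket, {Method, Uri}, []);
        {ok, Other} ->
            {error, {unexpected, Other}};
        {error, _} = Error ->
            Error
    end.

read_headers(Socket, RequestLine, Acc) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, {http_header, _, Name, _, Value}} ->
            read_headers(Socket, RequestLine, [{Name, Value} | Acc]);
        {ok, http_eoh} ->
            %% End of headers: switch to raw mode but do NOT read the body.
            %% Unread bytes stay buffered, and once the kernel's receive
            %% buffer fills, TCP flow control stalls the client.
            ok = inet:setopts(Socket, [{packet, raw}]),
            {ok, RequestLine, lists:reverse(Acc)};
        {ok, Other} ->
            {error, {unexpected, Other}};
        {error, _} = Error ->
            Error
    end.
```

Once the handler has decided where the upload should go, it can pull the body with `gen_tcp:recv(Socket, N)` in raw mode, sized from the `Content-Length` header, at whatever pace its destination can absorb.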

Canter answered 6/3, 2010 at 12:59

I was secretly looking for a justification for doing b; thanks for helping to provide it. To me, it makes more sense from a code-maintenance standpoint, but that alone wasn't enough for me to implement it. – Perpetuity 6/3, 2010 at 23:15

In my implementation I use your example for answer c: I read from the socket chunk by chunk and store the chunks in a temporary file. Also, AFAIK, Yaws uses a similar technique; you can see it in yaws/src/yaws_multipart.erl.
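A sketch of the chunk-by-chunk approach, assuming the headers have already been parsed, the body length is known from `Content-Length` (no chunked transfer encoding), and the socket is passive and in `{packet, raw}` mode. The module name and the 64 KiB chunk size are hypothetical choices; this is not taken from the Yaws source.

```erlang
%% Hypothetical module: a sketch of streaming an upload to disk in chunks.
-module(body_to_file).
-export([save_body/3]).

%% Hypothetical chunk size: at most this many body bytes are held in memory.
-define(CHUNK, 65536).

%% Copy exactly Length body bytes from a passive, {packet, raw} Socket
%% into the file at Path. Returns ok or {error, Reason}.
save_body(Socket, Length, Path) ->
    {ok, Fd} = file:open(Path, [write, raw, binary]),
    try
        copy(Socket, Fd, Length)
    after
        file:close(Fd)
    end.

copy(_Socket, _Fd, 0) ->
    ok;
copy(Socket, Fd, Remaining) ->
    %% In raw mode, recv(Socket, N) with N > 0 waits for exactly N bytes
    %% (or returns {error, closed} if the peer closes first).
    case gen_tcp:recv(Socket, min(Remaining, ?CHUNK)) of
        {ok, Data} ->
            ok = file:write(Fd, Data),
            copy(Socket, Fd, Remaining - byte_size(Data));
        {error, _} = Error ->
            Error
    end.
```

Memory use stays bounded by the chunk size regardless of the upload's total size, and the caller can hand `Path` to the request handler once `save_body/3` returns `ok`.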

Firebird answered 4/3, 2010 at 7:31

Storing uploads in a temporary file is also how PHP handles them, so it's a tried-and-tested approach. You could also count the bytes received and disconnect once the total exceeds a size that makes no sense for your application.
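A sketch of the byte-counting guard, assuming for simplicity that the body ends when the client closes the connection. Real HTTP/1.1 requests delimit the body with `Content-Length` or chunked encoding, and a declared `Content-Length` can be rejected against the limit before reading anything. The module name and the `WriteFun` callback (for example, one that appends to a temporary file) are hypothetical.

```erlang
%% Hypothetical module: a sketch of capping the bytes accepted per upload.
-module(capped_upload).
-export([receive_capped/3]).

%% Read raw bytes from a passive Socket until the peer closes the
%% connection, handing each chunk to WriteFun (a hypothetical callback).
%% If more than MaxBytes arrive, close the socket and give up.
receive_capped(Socket, MaxBytes, WriteFun) ->
    loop(Socket, MaxBytes, WriteFun, 0).

loop(Socket, MaxBytes, WriteFun, Seen) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            Total = Seen + byte_size(Data),
            if
                Total > MaxBytes ->
                    %% Size makes no sense for this endpoint: disconnect
                    %% without writing the chunk that crossed the limit.
                    ok = gen_tcp:close(Socket),
                    {error, too_large};
                true ->
                    ok = WriteFun(Data),
                    loop(Socket, MaxBytes, WriteFun, Total)
            end;
        {error, closed} ->
            %% Peer closed the connection: treat that as end of body.
            {ok, Seen};
        {error, _} = Error ->
            Error
    end.
```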

Imbecilic answered 4/3, 2010 at 19:1
