How does HTTP file upload work?
C

5

743

When I submit a simple form like this with a file attached:

<form enctype="multipart/form-data" action="http://localhost:3000/upload?upload_progress_id=12344" method="POST">
<input type="hidden" name="MAX_FILE_SIZE" value="100000" />
Choose a file to upload: <input name="uploadedfile" type="file" /><br />
<input type="submit" value="Upload File" />
</form>

How does it send the file internally? Is the file sent as part of the HTTP body as data? In the headers of this request, I don't see anything related to the name of the file.

I would just like to know the internal workings of HTTP when sending a file.

Cynthia answered 28/12, 2011 at 18:34 Comment(4)
I have not used a sniffer in a while but if you want to see what is being sent in your request (since it is to the server it is a request) sniff it. This question is too broad. SO is more for specific programming questions.Unclench
...as sniffers go, fiddler is my weapon of choice. You can even build up your own test requests to see how they post.Intercession
For those interested, also see "MAX_FILE_SIZE in PHP - what's the point" on https://mcmap.net/q/55550/-max_file_size-in-php-what-39-s-the-point/632951Cheapjack
I find MAX_FILE_SIZE weird. as I can modify my html in chrome to 100000000 before posting it so it posts a better value. Either 1. have it in a cookie with a secure hash via salt so cookie if modified, server can validate and throw exception(like webpieces or playframework both do) or some sort of form validation that things haven't changed. @CynthiaRaymonraymond
R
415

Let's take a look at what happens when you select a file and submit your form (I've truncated the headers for brevity):

POST /upload?upload_progress_id=12344 HTTP/1.1
Host: localhost:3000
Content-Length: 1325
Origin: http://localhost:3000
... other headers ...
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryePkpFF7tjBAqx29L

------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="MAX_FILE_SIZE"

100000
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="uploadedfile"; filename="hello.o"
Content-Type: application/x-object

... contents of file goes here ...
------WebKitFormBoundaryePkpFF7tjBAqx29L--

NOTE: each boundary line in the body is the boundary value from the Content-Type header prefixed with an extra --, and the last boundary line additionally ends with --. The example above already includes this, but it can be easy to miss. See comment by @Andreas below.

Instead of URL encoding the form parameters, the form parameters (including the file data) are sent as sections in a multipart document in the body of the request.

In the example above, you can see the input MAX_FILE_SIZE with the value set in the form, as well as a section containing the file data. The file name is part of the Content-Disposition header.

The full details are here.
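
For clients other than a browser, a body like the one above can be assembled by hand. What follows is a minimal Node.js sketch under stated assumptions, not a definitive implementation: `buildMultipartBody` and the `----ExampleBoundary` prefix are hypothetical names chosen for illustration, the boundary is simply a random string assumed not to occur in the data, and names containing quotes or line breaks are not escaped.

```javascript
const crypto = require('node:crypto');

const CRLF = '\r\n';

// Hypothetical helper (not part of any library): assembles a
// multipart/form-data body from plain text fields and files.
function buildMultipartBody(boundary, fields, files) {
  const chunks = [];
  for (const [name, value] of Object.entries(fields)) {
    chunks.push(Buffer.from(
      `--${boundary}${CRLF}` +                                   // "--" + boundary opens a part
      `Content-Disposition: form-data; name="${name}"${CRLF}` +
      CRLF +                                                      // empty line ends the part headers
      `${value}${CRLF}`
    ));
  }
  for (const file of files) {
    chunks.push(Buffer.from(
      `--${boundary}${CRLF}` +
      `Content-Disposition: form-data; name="${file.field}"; filename="${file.filename}"${CRLF}` +
      `Content-Type: ${file.contentType}${CRLF}` +
      CRLF
    ));
    chunks.push(file.data);            // raw file bytes, no base64 or escaping
    chunks.push(Buffer.from(CRLF));
  }
  chunks.push(Buffer.from(`--${boundary}--${CRLF}`));             // closing boundary line
  return Buffer.concat(chunks);
}

// Assumed boundary format: any string that does not appear in the data works.
const boundary = '----ExampleBoundary' + crypto.randomBytes(12).toString('hex');
const body = buildMultipartBody(
  boundary,
  { MAX_FILE_SIZE: '100000' },
  [{
    field: 'uploadedfile',
    filename: 'hello.o',
    contentType: 'application/x-object',
    data: Buffer.from([0x7f, 0x45, 0x4c, 0x46]),                  // placeholder bytes
  }]
);
const headers = {
  'Content-Type': `multipart/form-data; boundary=${boundary}`,   // no "--" prefix here
  'Content-Length': String(body.length),
};
console.log(headers);
```

The `headers` object and `body` buffer can then be handed to whatever HTTP client you use; note that the header carries the bare boundary, while every line in the body carries it with the `--` prefix.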

Righteousness answered 28/12, 2011 at 20:11 Comment(10)
Does this mean that port 80 (or the port serving http requests) is unusable during the time of the file transfer?. For e.g. if a huge file (about a GB) is being uploaded will the web server not be able to respond to any other requests during this time?Ruthanneruthe
@source.rar: No. Webservers are (almost?) always threaded so that they can handle concurrent connections. Essentially, the daemon process that's listening on port 80 immediately hands off the task of serving to another thread/process in order that it can return to listening for another connection; even if two incoming connections arrive at exactly the same moment, they'll just sit in the network buffer until the daemon is ready to read them.Nettlesome
The threading explanation is a bit incorrect since there are high performance servers that are designed as single threaded and use a state machine to quickly take turns downloading packets of data from connections. Rather, in TCP/IP, port 80 is a listening port, not the port the data is transferred on.Triptolemus
When an IP listening socket (port 80) receives a connection another socket is created on another port, usually with a random number above 1000. This socket is then connected to the remote socket leaving port 80 free to listen for new connections.Triptolemus
@Triptolemus First of all, this is about HTTP. FTP active mode doesn't apply here. Second, listening socket doesn't get blocked on every connection. You can have as many connections to one port, as the other sides has ports to bind their own end to.Koweit
@Slotos: I'm talking about HTTP. Not FTP active mode. I'm not talking about ports either, I'm talking about sockets.Triptolemus
Note that the boundary string that is passed as part of the Content-Type header field is 2 characters shorter than the boundary strings for the individual parts below. I've just spent an hour of trying to figure out why my uploader doesn't work because it's quite hard to notice that there are actually only 4 dashes in the first boundary string but 6 dashes in the other boundary strings. In other words: When using the boundary string to separate the individual form data, it has to be prefixed by two dashes: -- It's described in RFC1867 of course but I think it should be pointed out here as wellIllyes
That's a very nice historic document. Nice to see where it was conceived. On the elementary level, it looks like the input[type=file] lets the form to hook into a file on the client system, and then sends it as binary array on form submit.Elwaine
@Illyes's comment is very, very important. It took me a few hours to figure out how to upload a file from FoxPro :P until I figured out that this is the issue. Maybe this answer could be updated?Rowden
thanks, what i probably missed is how should create the boundary value for cases im sending a file in formData not using a browser?Dingle
T
384

How does it send the file internally?

The format is called multipart/form-data, as asked at: What does enctype='multipart/form-data' mean?

I'm going to:

  • add some more HTML5 references
  • explain why he is right with a form submit example

HTML5 references

There are three possibilities for enctype:

How to generate the examples

Once you see an example of each method, it becomes obvious how they work, and when you should use each one.

You can produce examples using:

Save the form to a minimal .html file:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8"/>
  <title>upload</title>
</head>
<body>
  <form action="http://localhost:8000" method="post" enctype="multipart/form-data">
  <p><input type="text" name="text1" value="text default">
  <p><input type="text" name="text2" value="a&#x03C9;b">
  <p><input type="file" name="file1">
  <p><input type="file" name="file2">
  <p><input type="file" name="file3">
  <p><button type="submit">Submit</button>
</form>
</body>
</html>

We set the default text value to a&#x03C9;b, which means aωb because ω is U+03C9. In UTF-8, aωb is encoded as the bytes 61 CF 89 62.

Create files to upload:

echo 'Content of a.txt.' > a.txt

echo '<!DOCTYPE html><title>Content of a.html.</title>' > a.html

# Binary file containing 4 bytes: 'a', 0xCF, 0x89 and 'b'.
printf 'a\xCF\x89b' > binary

Run our little echo server:

while true; do printf '' | nc -l localhost 8000; done

Open the HTML in your browser, select the files, click Submit, and check the terminal.

nc prints the request received.

Tested on: Ubuntu 14.04.3, nc BSD 1.105, Firefox 40.

multipart/form-data

Firefox sent:

POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150
Content-Length: 834

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text1"

text default
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text2"

aωb
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file1"; filename="a.txt"
Content-Type: text/plain

Content of a.txt.

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file2"; filename="a.html"
Content-Type: text/html

<!DOCTYPE html><title>Content of a.html.</title>

-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file3"; filename="binary"
Content-Type: application/octet-stream

aωb
-----------------------------735323031399963166993862150--

For the binary file and text field, the bytes 61 CF 89 62 (aωb in UTF-8) are sent literally. You could verify that with nc -l localhost 8000 | hd, which says that the bytes:

61 CF 89 62

were sent (61 == 'a' and 62 == 'b').

Therefore it is clear that:

  • Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150 sets the content type to multipart/form-data and says that the fields are separated by the given boundary string.

    But note that the:

    boundary=---------------------------735323031399963166993862150
    

    has two fewer dashes -- than the actual boundary line in the body

    -----------------------------735323031399963166993862150
    

    This is because the standard requires every boundary line in the body to be the boundary value prefixed with two dashes --. The other dashes appear to be just how Firefox chose to implement the arbitrary boundary. RFC 7578 clearly mentions that those two leading dashes -- are required:

4.1. "Boundary" Parameter of multipart/form-data

As with other multipart types, the parts are delimited with a boundary delimiter, constructed using CRLF, "--", and the value of the "boundary" parameter.
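
The relationship between the header parameter and the line that appears in the body can be sketched from the receiving side. This is only an illustration: `boundaryFromContentType` and `delimiterFor` are hypothetical helper names, and the regular expression is a simplification that handles the quoted and unquoted forms of the parameter but not every corner of the header grammar.

```javascript
// Hypothetical helper: extracts the boundary parameter from a Content-Type value.
function boundaryFromContentType(contentType) {
  const match = /;\s*boundary=(?:"([^"]+)"|([^;\s]+))/i.exec(contentType);
  if (!match) throw new Error('no boundary parameter found');
  return match[1] || match[2];
}

// The line that separates parts in the body is "--" followed by the boundary value.
function delimiterFor(contentType) {
  return '--' + boundaryFromContentType(contentType);
}

const contentType =
  'multipart/form-data; boundary=---------------------------735323031399963166993862150';
const boundaryValue = boundaryFromContentType(contentType);
const delimiter = delimiterFor(contentType);

console.log(boundaryValue);      // value as sent in the header
console.log(delimiter);          // line that opens each part in the body
console.log(delimiter + '--');   // line that closes the whole body
```

A parser that searches the body for the bare header value instead of the `--`-prefixed one will appear to work on some inputs and fail on others, which is the mistake described in the comments on the first answer.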

application/x-www-form-urlencoded

Now change the enctype to application/x-www-form-urlencoded, reload the browser, and resubmit.

Firefox sent:

POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: application/x-www-form-urlencoded
Content-Length: 51

text1=text+default&text2=a%CF%89b&file1=a.txt&file2=a.html&file3=binary

Clearly the file data was not sent, only the basenames. So this cannot be used for files.

As for the text field, we see that usual printable characters like a and b were sent in one byte, while non-printable ones like 0xCF and 0x89 took up 3 bytes each: %CF%89!
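
The same encoding can be reproduced without a form. This small sketch uses the standard `URLSearchParams` and `TextEncoder` globals (available in browsers and in Node.js) with the two text fields from the example form; the variable names are arbitrary.

```javascript
// Build the two text fields from the example form.
const params = new URLSearchParams();
params.append('text1', 'text default');
params.append('text2', 'a\u03C9b');          // a, ω (U+03C9), b

// toString() applies application/x-www-form-urlencoded encoding.
const encoded = params.toString();
console.log(encoded);                         // text1=text+default&text2=a%CF%89b

// Every character of the encoded string is ASCII, so this is also its byte length.
console.log(new TextEncoder().encode(encoded).length);
```

The space becomes `+`, and the two UTF-8 bytes of ω each become a three-character `%XX` escape.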

Comparison

File uploads often contain lots of non-printable characters (e.g. images), while text forms almost never do.

From the examples we have seen that:

  • multipart/form-data: adds a few bytes of boundary overhead to the message, and must spend some time calculating it, but sends each byte in one byte.

  • application/x-www-form-urlencoded: has a single byte boundary per field (&), but adds a linear overhead factor of 3x for every non-printable character.

Therefore, even if we could send files with application/x-www-form-urlencoded, we wouldn't want to, because it is so inefficient.

But for printable characters found in text fields, it does not matter and generates less overhead, so we just use it.
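
To put a number on that overhead, here is a rough sketch that counts how many bytes a byte sequence would occupy after application/x-www-form-urlencoded encoding. `urlencodedLength` is a hypothetical helper, and the set of bytes assumed to pass through unescaped (ASCII letters, digits, `*`, `-`, `.`, `_`, plus space as `+`) follows the behaviour of current browsers.

```javascript
// Hypothetical helper: size in bytes of `bytes` once urlencoded.
function urlencodedLength(bytes) {
  let total = 0;
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    // Unescaped characters (and space, which becomes "+") cost 1 byte;
    // everything else becomes %XX and costs 3 bytes.
    total += /^[A-Za-z0-9*\-._ ]$/.test(ch) ? 1 : 3;
  }
  return total;
}

// The 4-byte "binary" file from the example: 61 CF 89 62.
const sample = Uint8Array.from([0x61, 0xcf, 0x89, 0x62]);
console.log(sample.length);               // 4 bytes as sent in multipart/form-data
console.log(urlencodedLength(sample));    // 8 bytes as a%CF%89b
```

For data such as images, where most bytes fall outside the unescaped set, the encoded size approaches three times the original, whereas the multipart boundary overhead stays constant per part.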

Throstle answered 6/11, 2014 at 23:11 Comment(14)
How would you add a binary attachment? (i.e. a small image) - I can see changing the values for the Content-Disposition and Content-Type attributes but how to handle the 'content'?Antilogy
@ochi I don't quite understand: AFAIK the binary data just gets pasted byte by byte, and your browser ensures that the separator byte sequence is not contained in it.Throstle
Yes, I meant how to get the binary data to put in as content. I have tried to paste some base64 encoded strings into the request (in soapui) and it will not let me so I am wondering if there are other methods of doing itAntilogy
@ochi you don't need to base64 encode it: you just put the bytes directly in the form. I have updated the answer to add a binary file to the example. I strongly recommend you learn more about printf and hd: with those tools you could have found it out easily ;)Throstle
Who and how determines the file's Content-Type?Rammer
@Rammer The browser does it automatically before sending the request. I don't know which heuristics it uses, but most likely the file extension is amongst them. This may answer the question: #1202445Throstle
@CiroSantilli六四事件法轮功纳米比亚威视 I think this answer is much better than the chosen one. But please remove the irrelevant content from your profile. It is against the spirit of SO.Admonish
@Rammer Regarding how browser determines the content-type, some quotation from rfc1867: Each part should be labelled with an appropriate content-type if the media type is known (e.g., inferred from the file extension or operating system typing information) or as application/octet-stream.Admonish
@Admonish thanks for the rfc quote and for liking this answer! About username: to me, the spirit of SO is that everyone should have the best information at all times. ~~ Let's keep this discussion to twitter or meta. Peace.Throstle
Can you explain to me why the "application/octet-stream" has a filename part? what use is it for the server side?Byrn
@TannerSummers I think it is the same use as for any other mime type: the server can use it to create an identifier, or add metadata to the upload to help identifying it. Why do you think application/octet-stream should be different in that sense?Throstle
@CiroSantilli709大抓捕六四事件法轮功 and evereybody else.I hv multiple file upload facility.using same technique as describe above.But in IIS 6 server when I upload say 6 files of 6KB size,it return 404 error.If I upload say 3 file of 3MB size,it work ok.what must be the problem ?Vergara
@Vergara not enough detail to answer I think. Please open a new super detailed questions.Throstle
@CiroSantilli709大抓捕六四事件法轮功 ,plz view my post #43498665Vergara
S
87

Send file as binary content (upload without form or FormData)

In the given answers/examples the file is (most likely) uploaded with a HTML form or using the FormData API. The file is only a part of the data sent in the request, hence the multipart/form-data Content-Type header.

If you want to send the file as the only content then you can directly add it as the request body and you set the Content-Type header to the MIME type of the file you are sending. The file name can be added in the Content-Disposition header. You can upload like this:

var xmlHttpRequest = new XMLHttpRequest();

var file = ...file handle...
var fileName = ...file name...
var target = ...target...
var mimeType = ...mime type...

xmlHttpRequest.open('POST', target, true);
xmlHttpRequest.setRequestHeader('Content-Type', mimeType);
xmlHttpRequest.setRequestHeader('Content-Disposition', 'attachment; filename="' + fileName + '"');
xmlHttpRequest.send(file);

If you don't (want to) use forms and you are only interested in uploading one single file this is the easiest way to include your file in the request.

Update:

In all modern browsers you can these days also use the fetch API for (binary) upload. The same as mentioned in the example above would then look like this:

const promise = fetch(target, { 
  method: 'POST', 
  body: file, 
  headers: {
    'Content-Type': mimeType,
    'Content-Disposition': `attachment; filename="${fileName}"`,
  },
});

promise.then(
  (response) => { /*...do something with response*/ },
  (error) => { /*...handle error*/ },
);
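
On the receiving side, a request sent this way has no multipart wrapper: the body is the file. The following is a minimal Node.js sketch of that idea, not a production handler. `handleUpload` and `fileNameFromDisposition` are hypothetical names, the port is an assumption borrowed from the question's form, and the client-supplied file name should be treated as untrusted input before it is used as a path.

```javascript
const http = require('node:http');

// Hypothetical helper: pulls the file name out of a Content-Disposition value.
function fileNameFromDisposition(value) {
  const match = /filename="([^"]*)"/i.exec(value || '');
  return match ? match[1] : null;
}

// Collects the raw request body; every byte of it belongs to the file.
function handleUpload(req, res) {
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => {
    const fileBytes = Buffer.concat(chunks);
    const fileName = fileNameFromDisposition(req.headers['content-disposition']);
    const mimeType = req.headers['content-type'];
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(`received ${fileBytes.length} bytes, name=${fileName}, type=${mimeType}\n`);
  });
}

const server = http.createServer(handleUpload);
// Uncomment to actually listen; port 3000 is an assumption.
// server.listen(3000);
```

A real service would write `fileBytes` to storage instead of echoing a summary, and would cap the accepted size while reading.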
Subtilize answered 28/1, 2015 at 13:4 Comment(11)
How do you configure a server side service for this with Asp.Net 4.0? Will it handle multiple input parameters as well, such as userId, path, captionText etc?Assentation
@AsleG Nope, it is only for sending a single file as the content of your request. I am not an Asp.Net expert, but you should simply pull the content (a blob) out of the request and save it to a file using the Content-Type from the header.Subtilize
@AsleG Maybe this link can helpSubtilize
@wilt If I don't use form, but I want to use formdata API, can I do it that way?Cubature
You don't need to use a form to use the FormData API. You can append the different parts in your javascript code. But I don't see how this relates to my answer on binary uploading...Subtilize
This is an excellent answer. I am trying to upload a file in a similar fashion (the cURL -T option), but in java. Could somebody guide me? My uploaded file contains unneeded info like Content-Disposition: form-data; filename=datafile.dat; name=datafile Content-Type: application/octet-stream and I am not sure on how to get this stuff to not appear in my uploaded file. Unfortunately I couldn't find mych help over the internet.Convergent
@AnkitKhettry What exactly is the problem with this? It is fine to have some extra information in the Content-Disposition header, you should simply extract the file name from the data...Subtilize
No, I wasn't really clear with my question. Inside the content of the uploaded file, I am getting these headers and trailers and stuff. The original file's content in the uploaded file is appearing between two weirdly long strings, and is preceded by the above headers I mentioned. I don't need theseConvergent
@AnkitKhettry Sounds like it is uploaded with a form or by using the form API. These 'weird strings' you refer to are the form boundaries normally used for separating the form data into parts on the server.Subtilize
@AsleG you can always use GET parameters in URL for additional data. For example if your upload url is ./upload.aspx, add whatever data you need to your url with get parameters like this ./upload.aspx?user=userid&filetype=jpg and read it on server. File will be sent using POST method and additional parameters with GET in same request. I know this is old comment but maybe someone will find this useful in future.Argon
Is your answer related to this question of mine? I'm confused. would be great if you could take a look.Ardith
R
19

I have this sample Java Code:

import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class TestClass {
    public static void main(String[] args) throws IOException {
        ServerSocket socket = new ServerSocket(8081);
        Socket accept = socket.accept();
        InputStream inputStream = accept.getInputStream();

        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
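        // read() returns an int: -1 at end of stream, otherwise a char value.
        // Casting to char before the comparison would turn -1 into 65535,
        // so the loop would never end.
        int readChar;
        while ((readChar = inputStreamReader.read()) != -1) {
            System.out.print((char) readChar);
        }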

        inputStream.close();
        accept.close();
        System.exit(0);
    }
}

and I have this test.html file:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>File Upload!</title>
</head>
<body>
<form method="post" action="http://localhost:8081" enctype="multipart/form-data">
    <input type="file" name="file" id="file">
    <input type="submit">
</form>
</body>
</html>

and finally the file I will be using for testing purposes, named a.dat has the following content:

0x39 0x69 0x65

If you interpret the bytes above as ASCII or UTF-8 characters, they represent:

9ie

So let's run our Java code, open test.html in our favorite browser, select a.dat, submit the form, and see what our server receives:

POST / HTTP/1.1
Host: localhost:8081
Connection: keep-alive
Content-Length: 196
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: null
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary06f6g54NVbSieT6y
DNT: 1
Accept-Encoding: gzip, deflate
Accept-Language: en,en-US;q=0.8,tr;q=0.6
Cookie: JSESSIONID=27D0A0637A0449CF65B3CB20F40048AF

------WebKitFormBoundary06f6g54NVbSieT6y
Content-Disposition: form-data; name="file"; filename="a.dat"
Content-Type: application/octet-stream

9ie
------WebKitFormBoundary06f6g54NVbSieT6y--

It is no surprise to see the characters 9ie, because we told Java to decode the incoming bytes as UTF-8 characters before printing them. You could just as well read them as raw bytes.

Cookie: JSESSIONID=27D0A0637A0449CF65B3CB20F40048AF 

is actually the last HTTP header here. After it comes the HTTP body, where the metadata and contents of the uploaded file can be seen.
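
The point where the headers stop and the body starts is the first empty line, that is, the byte sequence CRLF CRLF. As a rough illustration, assuming the whole request is already in memory, a raw request can be split like this in Node.js; `splitHttpMessage` is a hypothetical helper, and the sample request is a shortened stand-in for the one printed above.

```javascript
// Hypothetical helper: splits a raw HTTP/1.1 request at the first empty line.
function splitHttpMessage(raw) {
  const separator = Buffer.from('\r\n\r\n');
  const index = raw.indexOf(separator);
  if (index === -1) throw new Error('header section is incomplete');

  const headLines = raw.subarray(0, index).toString('latin1').split('\r\n');
  const headers = {};
  for (const line of headLines.slice(1)) {          // line 0 is the request line
    const colon = line.indexOf(':');
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return {
    requestLine: headLines[0],
    headers,
    body: raw.subarray(index + separator.length),   // untouched bytes after the empty line
  };
}

const raw = Buffer.from(
  'POST / HTTP/1.1\r\n' +
  'Host: localhost:8081\r\n' +
  'Content-Length: 3\r\n' +
  '\r\n' +
  '9ie'
);
const message = splitHttpMessage(raw);
console.log(message.requestLine);
console.log(message.headers);
console.log(message.body);                          // raw bytes 39 69 65
```

Keeping the body as a `Buffer` rather than decoding it to text avoids corrupting uploads that are not valid UTF-8.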

Roturier answered 29/1, 2016 at 18:44 Comment(2)
Hi, is the line while ((readChar = (char) inputStreamReader.read()) != -1) { correct in the program above? (int)(char)-1 is actually 65535Cis
@Cis Should be correct..Roturier
S
7

An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

http://www.tutorialspoint.com/http/http_messages.htm

Surcharge answered 28/12, 2011 at 18:42 Comment(0)
