In reply to Moshe Rubin's answer:
RFC 2046, Section: Abstract
Because RFC 822 said so little about message bodies, these documents (Ed: RFC 2045, 2046, 2047, 2048, 2049)
are largely orthogonal to (rather than a revision of) RFC 822.
RFC 2046, Section 5.1: Multipart Media Type
In the case of multipart entities, in which one or more different
sets of data are combined in a single body, a "multipart" media type
field must appear in the entity's header. The body must then contain
one or more body parts, each preceded by a boundary delimiter line,
and the last one followed by a closing boundary delimiter line.
After its boundary delimiter line, each body part then consists of a
header area, a blank line, and a body area. Thus a body part is
similar to an RFC 822 message in syntax, but different in meaning.
RFC 2046, Section 5.1.1: Common Syntax
This Content-Type value indicates that the content consists of one or
more parts, each with a structure that is syntactically identical to an RFC 822 message, except that the header area is allowed to be completely empty, and that the parts are each preceded by the line
<example of a boundary line>
RFC 2045, Section 2.1: "CRLF"
The term CRLF, in this set of documents, refers to the sequence of
octets corresponding to the two US-ASCII characters CR (decimal value
13) and LF (decimal value 10) which, taken together, in this order,
denote a line break in RFC 822 mail.
Note the use of the plural "octets". Based on those two RFCs, I do think that a CRLF is defined by the spec to be two octets, not one.
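As a quick check of that reading, the shell can show that a CRLF written out explicitly is two octets with the decimal values 13 and 10. This is only a sketch: `octets` is a hypothetical helper name, and it assumes the standard `printf`, `wc`, and `od` utilities are available.

```shell
# Hypothetical helper: print the decimal value of every byte read from stdin.
# The unquoted $(...) is deliberate: word splitting collapses od's padding.
octets() {
    echo $(od -An -tu1)
}

# A CRLF written out explicitly, then counted and dumped.
printf '\r\n' | wc -c     # byte count: 2
printf '\r\n' | octets    # 13 10
```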
Next, according to RFC 2387, a multipart/related Content-Type header MUST include a type parameter:
RFC 2387: The MIME Multipart/Related Content-type
The type parameter must be specified and its value is the MIME media
type of the "root" body part.
So, in order for Google to be creating a multipart/related request in accordance with the spec, the Content-Type header in the example should be:
Content-type: multipart/related;
    boundary=".....";
    type="application/json"
Based on the RFCs quoted above, I do not think Google is following the spec in two instances: the byte count of line breaks and the missing type parameter.
Here's what the byte count should be for the body of the multipart request in the Google example:
start character
|
V
--===============0688100289==
Content-type: application/json

{"title": "test-multipart.txt", "parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], "properties": [{"kind": "drive#property", "key": "cloudwrapper", "value": "true"}]}
--===============0688100289==
Content-type: text/plain

We're testing multipart uploading!
--===============0688100289==--
                              ^
                              |
                              end character
First, let's test a file that can easily be counted! Here's a simple file:
test_byte_count.txt
12345
12345
There are 5 bytes (8-bit bytes, or octets) on the first line, followed by a newline, which is \n on my system, so 1 byte; then there are 5 bytes on the second line (no terminating newline). Confirmed by hexdump:
$ hexdump -c test_byte_count.txt
0000000 1 2 3 4 5 \n 1 2 3 4 5
000000b
So, the byte count should be 11:
$ wc -c test_byte_count.txt
11 test_byte_count.txt
And hexdump actually gives the number of bytes on the last line of its output:
000000b
In hexadecimal notation, A is 10, B is 11, C is 12, etc., so hexdump is reporting that the file is 11 bytes long.
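The hex-to-decimal step can also be left to the shell. A minimal sketch, assuming a `printf` that accepts a 0x-prefixed constant for `%d` (bash and dash both do); `hex_to_dec` is a hypothetical helper name:

```shell
# Hypothetical helper: convert a hexadecimal offset, as printed on the
# last line of hexdump's output, to a decimal byte count.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

hex_to_dec 000000b   # 11
hex_to_dec 000000c   # 12
```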
Next, as discussed above, the spec requires each line break sent over the wire to be represented by the two octets \r\n, so the byte count for the file test_byte_count.txt with two octets for the newline should be 12. unix2dos will convert a file with unix newlines to a file with dos newlines, i.e. \r\n:
$ unix2dos -n test_byte_count.txt test_byte_count_dos_newlines.txt
unix2dos: converting file test_byte_count.txt to file test_byte_count_dos_newlines.txt in DOS format...
Here's what's in the file test_byte_count_dos_newlines.txt:
$ hexdump -c test_byte_count_dos_newlines.txt
0000000 1 2 3 4 5 \r \n 1 2 3 4 5
000000c
$ wc -c test_byte_count_dos_newlines.txt
12 test_byte_count_dos_newlines.txt
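If unix2dos is not installed, roughly the same check can be done with awk. This is a sketch under stated assumptions, not a replacement for unix2dos: `lf_to_crlf` is a hypothetical helper name, the input is assumed to use bare \n line breaks, and, like test_byte_count.txt, to have no terminating newline (this awk version would drop one rather than convert it).

```shell
# Hypothetical helper: rewrite stdin so that lines are separated by \r\n.
# Assumes bare \n line breaks and no terminating newline on the input;
# a terminating newline would be dropped, unlike with unix2dos.
lf_to_crlf() {
    awk 'NR > 1 { printf "\r\n" } { printf "%s", $0 }'
}

# Same content as test_byte_count.txt, before and after conversion.
printf '12345\n12345' | wc -c                # 11
printf '12345\n12345' | lf_to_crlf | wc -c   # 12
```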
Therefore, I just need to use unix2dos to convert a file containing the body of the multipart request to a file with dos newlines, then get the byte count. First, here is the byte count before converting \n newlines to \r\n newlines:
$ cat multipart_request_unix_newlines.txt
--===============0688100289==
Content-type: application/json

{"title": "test-multipart.txt", "parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], "properties": [{"kind": "drive#property", "key": "cloudwrapper", "value": "true"}]}
--===============0688100289==
Content-type: text/plain

We're testing multipart uploading!
--===============0688100289==--
$ wc -c multipart_request_unix_newlines.txt
352 multipart_request_unix_newlines.txt
And unix2dos can actually report the number of dos and unix newlines in a file:
$ unix2dos -i multipart_request_unix_newlines.txt
0 8 0 no_bom text multipart_request_unix_newlines.txt
The first and second columns are the number of dos and unix newlines found in the file (the third column is for old Mac newlines, \r). Eight unix newlines were found in the file, so when unix2dos converts them to dos newlines we would expect 8 bytes to be added to the file. Adding 8 to the previously reported byte count of 352 gives 360, so we should expect 360 bytes in the converted file:
$ unix2dos -n multipart_request_unix_newlines.txt multipart_request_dos_newlines.txt
unix2dos: converting file multipart_request_unix_newlines.txt to file multipart_request_dos_newlines.txt in DOS format...
$ wc -c multipart_request_dos_newlines.txt
360 multipart_request_dos_newlines.txt
It appears likely that Google calculated the byte count of the body of the multipart request before converting the body to \r\n newlines.
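The two candidate byte counts can be reproduced without unix2dos by counting the \n octets and adding one \r for each. This is a sketch, with assumptions labeled: `multipart_body` and the variable names are hypothetical, and the blank line after each Content-type header is assumed from the header area / blank line / body area layout in RFC 2046 and from the eight newlines reported above.

```shell
# Hypothetical helper: emit the example body with \n line breaks.
# Assumption: one blank line separates each part's header from its content.
multipart_body() {
cat <<'EOF'
--===============0688100289==
Content-type: application/json

{"title": "test-multipart.txt", "parents": [{"id":"0B09i2ZH5SsTHTjNtSS9QYUZqdTA"}], "properties": [{"kind": "drive#property", "key": "cloudwrapper", "value": "true"}]}
--===============0688100289==
Content-type: text/plain

We're testing multipart uploading!
--===============0688100289==--
EOF
}

# Command substitution strips the newline the here-document adds at the end,
# so the body stops right after the closing delimiter.
body=$(multipart_body)

lf_bytes=$(printf '%s' "$body" | wc -c)                 # size with \n newlines
newlines=$(printf '%s' "$body" | tr -cd '\n' | wc -c)   # number of \n octets
crlf_bytes=$((lf_bytes + newlines))                     # each \n gains one \r

echo "LF body:   $lf_bytes bytes"
echo "newlines:  $newlines"
echo "CRLF body: $crlf_bytes bytes"
```

Under those assumptions the three values should come out as 352, 8, and 360, matching the wc and unix2dos output above.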