How can the "packet" socket option in Erlang speed up TCP transmission so much?

It takes only 8 seconds to transfer 1 GB of data between two ports on localhost using {packet,4}, while the same task can't finish within 30 seconds using {packet,raw}. I know that with the latter, the data arrives in hundreds of thousands of small pieces (on Arch Linux each piece is 1460 bytes). I've studied some aspects of the TCP/IP protocol and have been thinking about this question for days, but I still can't figure out what the EXACT difference is. I'd really appreciate a bottom-up explanation.

-module(test).

-export([main/1]).

%% Replace {packet,4} with {packet,raw} to reproduce the slow case.
-define(SOCKOPT, [binary,{active,true},{packet,4}]).

main(_) ->
    {ok, LSock} = gen_tcp:listen(6677, ?SOCKOPT),
    spawn(fun() -> send() end),
    recv(LSock).

recv(LSock) ->
    {ok, Sock} = gen_tcp:accept(LSock),
    inet:setopts(Sock, ?SOCKOPT),
    loop(Sock).

loop(Sock) ->
    receive
        {tcp, Sock, Data} ->
            %% Print the size, in bits, of each message delivered to this process.
            io:fwrite("~p~n",[bit_size(Data)]),
            loop(Sock);
        {tcp_closed, Sock} -> ok
    end.

send() ->
    timer:sleep(500),
    {ok, Sock}=gen_tcp:connect("localhost", 6677, ?SOCKOPT),
    %% Send 1 GiB (2^30 bytes) of the character "1".
    gen_tcp:send(Sock, binary:copy(<<"1">>, 1073741824)),
    gen_tcp:close(Sock).

$ time escript test.erl
8589934592
real    0m8.919s
user    0m6.643s
sys     0m2.257s
Physiography asked 14/12, 2012 at 12:20
I'm Erlang-illiterate, but can you explain what {packet, 4} means? Then I might be able to answer your question. – Presentiment
Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. The header can be one, two, or four bytes long. With {packet,4} the header is four bytes, so a single packet can carry up to 2^32 - 1 bytes. (In practice only up to 2 GB is allowed, but this doesn't affect my question.) Also, the question is mostly not about Erlang but about how such a huge packet gets sent. – Physiography
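For reference, the wire format described in that comment can be sketched in a few lines of Erlang. This is a minimal illustration of the framing, not the VM's own implementation; the module name `packet4` and the functions `encode/1` and `decode/1` are hypothetical. Since the 4-byte header counts bytes, a 1 GB payload fits in a single frame.

```erlang
%% Sketch of {packet,4} framing: a 4-byte unsigned big-endian length
%% header followed by exactly that many payload bytes.
%% Module and function names are hypothetical, for illustration only.
-module(packet4).
-export([encode/1, decode/1]).

%% Prepend the length header. The header counts bytes, so the largest
%% payload it can describe is 2^32 - 1 bytes.
encode(Payload) when is_binary(Payload), byte_size(Payload) < (1 bsl 32) ->
    <<(byte_size(Payload)):32/unsigned-big, Payload/binary>>.

%% Try to take one complete frame off the front of Buffer.
%% Returns {ok, Payload, Rest}, or {more, Buffer} if the frame is incomplete.
decode(<<Len:32/unsigned-big, Payload:Len/binary, Rest/binary>>) ->
    {ok, Payload, Rest};
decode(Buffer) when is_binary(Buffer) ->
    {more, Buffer}.
```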
However, I wonder whether this option is Erlang-specific, because I can't find an equivalent in C, Go, Java, or Python. – Physiography
What puzzles me most is that receiving the whole 1 GB of data (more than 2 Gb) takes only a single match of {tcp, Sock, Data}. That means that even if the 1 GB is split into several packets, they arrive as one continuous piece. So what are packets actually for? – Physiography
Packet mode is an Erlang convenience. If you send a large chunk of data over TCP, it will be split up, and you need some way of knowing how large the original chunk was in order to reassemble it. Packet mode does that work for you. That's why only a single message arrives in packet mode: behind the scenes, Erlang collected all the relevant segments and stitched them together before sending the message. Not having to call receive thousands of times is perhaps why this is faster, but if you post your code it will be easier to see exactly what is going on. – Brazen
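To make that concrete, here is a rough sketch of doing the reassembly yourself on a {packet,raw} socket in active mode, assuming the sender still prefixes each message with a 4-byte big-endian length (as {packet,4} would). The module `reassemble` and its functions are hypothetical. Every chunk still arrives as its own {tcp, Sock, Chunk} message, which is exactly the per-message overhead that packet mode avoids by reassembling inside the VM.

```erlang
%% Sketch: manual reassembly of 4-byte-length-prefixed messages on a
%% socket opened with [binary, {active,true}, {packet,raw}].
%% Module and function names are hypothetical.
-module(reassemble).
-export([split/1, loop/3]).

%% Split Buffer into every complete frame it contains, plus the
%% incomplete tail that has to wait for more data.
split(Buffer) ->
    split(Buffer, []).

split(<<Len:32/unsigned-big, Payload:Len/binary, Rest/binary>>, Acc) ->
    split(Rest, [Payload | Acc]);
split(Rest, Acc) ->
    {lists:reverse(Acc), Rest}.

%% Each {tcp, ...} message carries an arbitrary slice of the byte stream;
%% append it to the buffer and hand every complete frame to Handle/1.
loop(Sock, Buffer, Handle) ->
    receive
        {tcp, Sock, Chunk} ->
            {Frames, Rest} = split(<<Buffer/binary, Chunk/binary>>),
            lists:foreach(Handle, Frames),
            loop(Sock, Rest, Handle);
        {tcp_closed, Sock} ->
            ok
    end.
```

A receiver would start it with `reassemble:loop(Sock, <<>>, fun(Msg) -> ... end)`.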
Alright, I've included the code. Is the difference between using packets and enlarging the buffer size that the former reassembles the data automatically? – Physiography
First, make sure your two tests are as similar as possible. One does some 735,400 formatted writes to stdout; the other does a single one. Remove the io:fwrite/2 call and run the test again. You may be surprised. – Construct
Second, escript runs in interpreted mode by default, which can badly skew results. Add -mode(compile). – Construct
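Following both suggestions, a fairer comparison might look like the sketch below: compiled via -mode(compile), no per-message output, and a count of how many {tcp, ...} messages each mode delivers. The module name `bench`, the function `run/2`, and listening on port 0 (so the OS picks a free port) are illustrative choices, not part of the original test. Timings will vary by machine and kernel settings.

```erlang
%% Sketch of a fairer benchmark. Run with: escript bench.erl
%% Module and function names are hypothetical.
-module(bench).
-mode(compile).
-export([main/1, run/2]).

main(_) ->
    Bytes = 1 bsl 30,
    [begin
         {Msgs, Got, Micros} = run(Packet, Bytes),
         io:format("~p: ~p messages, ~p bytes, ~.2f s~n",
                   [Packet, Msgs, Got, Micros / 1.0e6])
     end || Packet <- [raw, 4]],
    ok.

%% Send Bytes bytes over loopback with the given packet option and return
%% {MessagesReceived, BytesReceived, ElapsedMicroseconds}.
run(Packet, Bytes) ->
    Opts = [binary, {active, true}, {packet, Packet}],
    {ok, LSock} = gen_tcp:listen(0, Opts),
    {ok, Port} = inet:port(LSock),
    T0 = erlang:monotonic_time(microsecond),
    spawn_link(fun() ->
        {ok, S} = gen_tcp:connect("localhost", Port, Opts),
        ok = gen_tcp:send(S, binary:copy(<<"1">>, Bytes)),
        ok = gen_tcp:close(S)
    end),
    {ok, Sock} = gen_tcp:accept(LSock),
    {Msgs, Got} = count(Sock, 0, 0),
    T1 = erlang:monotonic_time(microsecond),
    ok = gen_tcp:close(LSock),
    {Msgs, Got, T1 - T0}.

%% Count messages and bytes until the sender closes the connection.
count(Sock, Msgs, Got) ->
    receive
        {tcp, Sock, Data} -> count(Sock, Msgs + 1, Got + byte_size(Data));
        {tcp_closed, Sock} -> {Msgs, Got}
    end.
```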
Thanks! You pinpointed every problem; that's really awesome. – Physiography

When you use {packet,4}, Erlang first reads the 4-byte header to get the length of your data, allocates a buffer to hold it, and reads data into that buffer as each TCP segment arrives. Then it delivers the whole buffer to your process as a single message. All of this happens inside the built-in socket read code, which is quite fast.

When you use {packet,raw}, Erlang sends a message to your process for each chunk of data it receives (roughly one per TCP segment), so for every segment it does considerably more work.

Dextroamphetamine answered 16/12, 2012 at 18:4
Thanks for sharing your idea. But I think {packet,4} only simplifies the receiver's code rather than speeding up the transmission, since it doesn't change the send or receive buffers. – Physiography
Using {packet,4} means your receiver code is not called every ~1.5 KB (one TCP segment on a typical 1500-byte MTU link), just once for the whole 1 GB. – Dextroamphetamine
In my opinion, {packet,4} just enlarges the user-level buffer, which also reduces the number of calls; but when sending this much data, the transfer rate may depend more on the kernel-level sndbuf and recbuf. In fact, I got a faster result when replacing {packet,4} with {sndbuf,4194304},{recbuf,4194304}. – Physiography
Is there a way to make it use a big-endian or little-endian header? – Paugh
No. The header is always big-endian. You would have to hack the Erlang VM, or do the framing yourself on a passive socket, e.g. {ok, <<Len:32/little>>} = gen_tcp:recv(Socket, 4), {ok, Data} = gen_tcp:recv(Socket, Len). – Dextroamphetamine
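A slightly fuller version of that passive-mode approach is sketched below, assuming both ends agree on a 4-byte little-endian length header and the socket is opened with [binary, {active,false}, {packet,raw}]. The module `le_frame` and its functions are hypothetical. One subtlety: in raw mode, gen_tcp:recv(Sock, 0) means "return whatever bytes are available", so a zero-length payload has to be handled before calling recv with the decoded length.

```erlang
%% Sketch: little-endian length-prefixed framing done by hand on a
%% passive raw socket. Module and function names are hypothetical.
-module(le_frame).
-export([send_msg/2, recv_msg/1]).

%% Write a 4-byte little-endian length followed by the payload (iodata).
send_msg(Sock, Payload) when is_binary(Payload) ->
    gen_tcp:send(Sock, [<<(byte_size(Payload)):32/unsigned-little>>, Payload]).

%% Read the header, then exactly Len bytes of payload.
recv_msg(Sock) ->
    case gen_tcp:recv(Sock, 4) of
        %% recv(Sock, 0) would mean "any available bytes", so special-case it.
        {ok, <<Len:32/unsigned-little>>} when Len =:= 0 -> {ok, <<>>};
        {ok, <<Len:32/unsigned-little>>} -> gen_tcp:recv(Sock, Len);
        {error, _} = Err -> Err
    end.
```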

When the data is received in small pieces, each piece costs the receiving process extra work, so the kernel buffer at the receiver end fills up fast. A full buffer shrinks the receive window advertised to the sender, forcing the sender to push data at a lower rate.
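The effect can be summarized with the usual rule of thumb for a window-limited TCP connection, where rwnd is the receive window (roughly the free space in the receiver's kernel buffer), cwnd is the congestion window, and RTT is the round-trip time. This is a simplification, not an exact model:

```latex
\text{throughput} \lesssim \frac{\min(\mathit{cwnd},\ \mathit{rwnd})}{\mathit{RTT}}
```

If the receiving process falls behind, rwnd shrinks toward zero and the sender stalls, however fast the link itself is.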

Pescara answered 25/5, 2014 at 10:55

Try

-define(SOCKOPT, [binary,{active,true},{recbuf, 16#FFFFFF}, {sndbuf, 16#1FFFFFF}]).
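Kernels do not always honor buffer sizes exactly as requested (Linux, for example, doubles the value you set and caps it at a system-wide maximum such as net.core.rmem_max), so it can be worth checking what you actually got. A minimal sketch using inet:getopts/2; the module name `bufcheck` and function `granted/1` are hypothetical.

```erlang
%% Sketch: open a listening socket with the requested buffer options and
%% report the sizes the OS actually granted. Names are hypothetical.
-module(bufcheck).
-export([granted/1]).

granted(Opts) ->
    {ok, LSock} = gen_tcp:listen(0, [binary | Opts]),
    %% Read back the kernel-level values; they may differ from Opts.
    {ok, Got} = inet:getopts(LSock, [recbuf, sndbuf]),
    ok = gen_tcp:close(LSock),
    Got.
```

For example, `bufcheck:granted([{recbuf, 16#FFFFFF}, {sndbuf, 16#1FFFFFF}])` shows whether the sizes above were applied in full.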
Nacelle answered 7/11, 2019 at 9:31
