Confusion about syn queue and accept queue

Asked 3/8, 2020 at 16:1 Answered 10/8, 2020 at 18:7

Solved linux sockets tcp linux-kernel handshake

While reading the Linux TCP source code, I found something confusing:

I know TCP uses two queues during the three-way handshake:

The first queue stores connections for which the server has received the SYN and sent back a SYN+ACK; this is called the SYN queue.
The second queue stores connections for which the 3WHS has completed and the connection is established; this is called the accept queue.

But when reading the code, I find that listen() calls inet_csk_listen_start(), which calls reqsk_queue_alloc() to create icsk_accept_queue. That queue is used in accept(): when the queue is not empty, we take a connection from it and return.

What's more, after tracing the receive path, the call stack looks like this:
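
As a user-space illustration of the accept queue described above (a minimal sketch, not kernel code), the following C function listens on a loopback port chosen by the kernel, connects to it, and only then calls accept(). The name demo_accept_queue and the backlog value 8 are arbitrary choices for this example, and it assumes a host where loopback TCP is available.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a client can finish connect() before the server calls
 * accept(), and accept() then returns a connected socket; -1 otherwise. */
int demo_accept_queue(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd, afd = -1, rc = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 8) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* The handshake completes inside the kernel; accept() has not run yet. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* The established connection has been waiting on the listening socket. */
    afd = accept(lfd, NULL, NULL);
    if (afd >= 0)
        rc = 0;
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return rc;
}
```

Calling demo_accept_queue() from a small main() should return 0: connect() succeeds before accept() is ever called, because the kernel finishes the handshake on its own and keeps the connection until the application asks for it.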

tcp_v4_rcv()->tcp_v4_do_rcv()->tcp_rcv_state_process()

The server state is LISTEN when it receives the first segment of the handshake, so it calls

`tcp_v4_conn_request()->tcp_conn_request()`

In tcp_conn_request():

if (!want_cookie)
    // Add the req into the queue
    inet_csk_reqsk_queue_hash_add(sk, req, tcp_timeout_init((struct sock *)req));

But here the queue is icsk_accept_queue itself, not a SYN queue.

void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
                   unsigned long timeout)
{
    reqsk_queue_hash_req(req, timeout);
    inet_csk_reqsk_queue_added(sk);
}

static inline void inet_csk_reqsk_queue_added(struct sock *sk)
{
    reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue);
}

accept() returns an established connection, which means icsk_accept_queue is the second queue, but where is the first queue?

Where does a connection move from the first queue to the second?

Why does Linux add the new req to icsk_accept_queue?

Broadsword asked 3/8, 2020 at 16:1 Comment(8)

This is a really helpful blog entry from our friends at Cloudflare about the handling of SYN packets. – Hygrothermograph 3/8, 2020 at 16:28

Also, How TCP backlog works in Linux – Aurilia 3/8, 2020 at 18:7

@Jim D. Thanks for the link, but I don't think it answers how the queues are implemented and used in the source code. – Broadsword 4/8, 2020 at 1:35

@Remy Lebeau Thanks for the link, but I don't think it answers how the queues are implemented and used in the source code. – Broadsword 4/8, 2020 at 1:35

programmersought.com/article/65611480717 – Hygrothermograph 4/8, 2020 at 21:3

@Hygrothermograph Sorry, but after reading those blogs and the Linux source code again and again, I still cannot answer the question. I have edited my question to make it clearer. – Broadsword 8/8, 2020 at 17:15

@Marquis of Lorne Because I followed the source code and found that tcp_rcv_state_process() calls tcp_v4_conn_request() when the state is LISTEN. I showed this in the question; maybe you missed it. – Broadsword 9/8, 2020 at 15:23

Sorry, I didn't have time earlier to give a proper answer. You got 99% of the way there and it is probably just the fact that the function names are very misleading since moving the syn queue to the ehash that kept you from the remaining 1%. – Hygrothermograph 10/8, 2020 at 19:22

In what follows we will follow the most typical code path and will ignore issues arising from packet loss, retransmission, and the use of atypical features such as TCP fast open (TFO in the code comments).

The call to accept is processed by inet_csk_accept, which calls reqsk_queue_remove to take a socket off the listening socket's accept queue, &icsk->icsk_accept_queue:

struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct request_sock_queue *queue = &icsk->icsk_accept_queue;
    struct request_sock *req;
    struct sock *newsk;
    int error;

    lock_sock(sk);

    [...]

    req = reqsk_queue_remove(queue, sk);
    newsk = req->sk;

    [...]

    return newsk;

    [...]
}

reqsk_queue_remove uses rskq_accept_head and rskq_accept_tail to pull a request off the queue, and calls sk_acceptq_removed:

static inline struct request_sock *reqsk_queue_remove(struct request_sock_queue *queue,
                              struct sock *parent)
{
    struct request_sock *req;

    spin_lock_bh(&queue->rskq_lock);
    req = queue->rskq_accept_head;
    if (req) {
        sk_acceptq_removed(parent);
        WRITE_ONCE(queue->rskq_accept_head, req->dl_next);
        if (queue->rskq_accept_head == NULL)
            queue->rskq_accept_tail = NULL;
    }
    spin_unlock_bh(&queue->rskq_lock);
    return req;
}

And sk_acceptq_removed reduces the length of the queue of sockets waiting to be accepted in sk_ack_backlog:

static inline void sk_acceptq_removed(struct sock *sk)
{
    WRITE_ONCE(sk->sk_ack_backlog, sk->sk_ack_backlog - 1);
}

This, I think, is fully understood by the questioner. Now let's look at what happens when a SYN is received, and when the final ACK of the 3WH arrives.

First, the receipt of the SYN. Again, let's assume that TFO and SYN cookies are not in play and look at the most common path (at least when there is no SYN flood).

The SYN is processed in tcp_conn_request, where the connection request (not a full-blown socket) is stored (we shall see where soon) by calling inet_csk_reqsk_queue_hash_add, after which send_synack is called to respond to the SYN:

int tcp_conn_request(struct request_sock_ops *rsk_ops,
             const struct tcp_request_sock_ops *af_ops,
             struct sock *sk, struct sk_buff *skb)
{

    [...]

    if (!want_cookie)
        inet_csk_reqsk_queue_hash_add(sk, req,
            tcp_timeout_init((struct sock *)req));
    af_ops->send_synack(sk, dst, &fl, req, &foc,
                        !want_cookie ? TCP_SYNACK_NORMAL :
                                       TCP_SYNACK_COOKIE);

    [...]

    return 0;

    [...]
}

inet_csk_reqsk_queue_hash_add calls reqsk_queue_hash_req and inet_csk_reqsk_queue_added to store the request.

void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
                   unsigned long timeout)
{
    reqsk_queue_hash_req(req, timeout);
    inet_csk_reqsk_queue_added(sk);
}

reqsk_queue_hash_req puts the request into the ehash.

static void reqsk_queue_hash_req(struct request_sock *req,
                 unsigned long timeout)
{
    [...]

    inet_ehash_insert(req_to_sk(req), NULL);

    [...]
}

And then inet_csk_reqsk_queue_added calls reqsk_queue_added with the icsk_accept_queue:

static inline void inet_csk_reqsk_queue_added(struct sock *sk)
{
    reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue);
}

Which increments the qlen (not sk_ack_backlog):

static inline void reqsk_queue_added(struct request_sock_queue *queue)
{
    atomic_inc(&queue->young);
    atomic_inc(&queue->qlen);
}

The ehash is where all the ESTABLISHED and TIMEWAIT sockets have been stored, and, more recently, where the SYN "queue" is stored.

Note that there is actually no purpose in storing the arrived connection requests in a proper queue. Their order is irrelevant (the final ACKs can arrive in any order) and by moving them out of the listening socket, it is not necessary to take a lock on the listening socket to process the final ACK.
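
To make the point about ordering concrete, here is a toy user-space model (a sketch under stated assumptions, not the kernel's ehash). Pending requests are filed in a small hash table keyed by the connection 4-tuple, so the final ACKs can be matched in any order. All pend_* names, the bucket count, and the hash function are hypothetical; the real ehash uses its own hashing and locking.

```c
#include <stddef.h>
#include <stdint.h>

#define PEND_BUCKETS 16

/* Hypothetical stand-in for a request_sock: just the connection 4-tuple. */
struct pend_req {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    struct pend_req *next;      /* chain within one bucket */
};

struct pend_table {
    struct pend_req *bucket[PEND_BUCKETS];
};

/* Toy hash of the 4-tuple; chosen for brevity, not for quality. */
static unsigned pend_hash(uint32_t saddr, uint16_t sport,
                          uint32_t daddr, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

    h ^= h >> 16;
    return h % PEND_BUCKETS;
}

/* SYN received: remember the request, keyed by its 4-tuple. */
void pend_insert(struct pend_table *t, struct pend_req *r)
{
    unsigned b = pend_hash(r->saddr, r->sport, r->daddr, r->dport);

    r->next = t->bucket[b];
    t->bucket[b] = r;
}

/* Final ACK received: find and unlink the matching request, or NULL. */
struct pend_req *pend_take(struct pend_table *t,
                           uint32_t saddr, uint16_t sport,
                           uint32_t daddr, uint16_t dport)
{
    struct pend_req **pp =
        &t->bucket[pend_hash(saddr, sport, daddr, dport)];

    for (; *pp; pp = &(*pp)->next) {
        struct pend_req *r = *pp;

        if (r->saddr == saddr && r->sport == sport &&
            r->daddr == daddr && r->dport == dport) {
            *pp = r->next;      /* unlink from the bucket chain */
            r->next = NULL;
            return r;
        }
    }
    return NULL;
}
```

A lookup depends only on the 4-tuple carried by the incoming segment, never on arrival order, which is why a FIFO buys nothing for this stage.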

See this commit for the code that effected this change.

Finally, we can watch the request get removed from ehash and added as a full socket to the accept queue.

The final ACK of the 3WH is processed by tcp_check_req which creates a full child socket and then calls inet_csk_complete_hashdance:

struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
               struct request_sock *req,
               bool fastopen, bool *req_stolen)
{

    [...]

    /* OK, ACK is valid, create big socket and
     * feed this segment to it. It will repeat all
     * the tests. THIS SEGMENT MUST MOVE SOCKET TO
     * ESTABLISHED STATE. If it will be dropped after
     * socket is created, wait for troubles.
     */
    child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL,
                             req, &own_req);

    [...]

    return inet_csk_complete_hashdance(sk, child, req, own_req);

    [...]

}

Then inet_csk_complete_hashdance calls inet_csk_reqsk_queue_drop and reqsk_queue_removed on the request, and inet_csk_reqsk_queue_add on the child:

struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
                     struct request_sock *req, bool own_req)
{
    if (own_req) {
        inet_csk_reqsk_queue_drop(sk, req);
        reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
        if (inet_csk_reqsk_queue_add(sk, req, child))
            return child;
    }
    [...]
}

inet_csk_reqsk_queue_drop calls reqsk_queue_unlink, which removes the request from the ehash, and reqsk_queue_removed which decrements the qlen:

void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
{
    if (reqsk_queue_unlink(req)) {
        reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
        reqsk_put(req);
    }
}

Finally, inet_csk_reqsk_queue_add adds the full socket to the accept queue.

struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
                      struct request_sock *req,
                      struct sock *child)
{
    struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;

    spin_lock(&queue->rskq_lock);
    if (unlikely(sk->sk_state != TCP_LISTEN)) {
        inet_child_forget(sk, req, child);
        child = NULL;
    } else {
        req->sk = child;
        req->dl_next = NULL;
        if (queue->rskq_accept_head == NULL)
            WRITE_ONCE(queue->rskq_accept_head, req);
        else
            queue->rskq_accept_tail->dl_next = req;
        queue->rskq_accept_tail = req;
        sk_acceptq_added(sk);
    }
    spin_unlock(&queue->rskq_lock);
    return child;
}
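
The head/tail bookkeeping in inet_csk_reqsk_queue_add and reqsk_queue_remove can be tried out in isolation with a stripped-down user-space model. This is only a sketch: the fifo_* names and the child_id field are hypothetical, and locking, WRITE_ONCE, and the TCP_LISTEN check are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of request_sock: only the link and a payload. */
struct fifo_req {
    struct fifo_req *dl_next;
    int child_id;               /* stands in for req->sk */
};

struct fifo_listener {
    struct fifo_req *accept_head;
    struct fifo_req *accept_tail;
    unsigned int ack_backlog;   /* plays the role of sk_ack_backlog */
};

/* Same shape as inet_csk_reqsk_queue_add: append at the tail. */
void fifo_add(struct fifo_listener *l, struct fifo_req *req)
{
    req->dl_next = NULL;
    if (l->accept_head == NULL)
        l->accept_head = req;
    else
        l->accept_tail->dl_next = req;
    l->accept_tail = req;
    l->ack_backlog++;
}

/* Same shape as reqsk_queue_remove: pop from the head. */
struct fifo_req *fifo_remove(struct fifo_listener *l)
{
    struct fifo_req *req = l->accept_head;

    if (req) {
        assert(l->ack_backlog > 0);
        l->ack_backlog--;
        l->accept_head = req->dl_next;
        if (l->accept_head == NULL)
            l->accept_tail = NULL;
    }
    return req;
}
```

Requests come out in the order they were added, and the counter always equals the number of entries still linked, which is the invariant accept() relies on.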

TL;DR: the SYN "queue" is in the ehash, and the number of such pending SYNs is qlen (not sk_ack_backlog, which holds the number of sockets in the accept queue).
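
The two counters can be followed through one connection's life with a tiny model. This is a hypothetical sketch that tracks only the numbers (the lc_* names are made up, and the young counter, timers, and error paths are ignored):

```c
#include <assert.h>

/* Hypothetical model: only the two counters, no real sockets. */
struct lc_counts {
    int qlen;          /* requests answered with SYN-ACK, final ACK pending */
    int ack_backlog;   /* established connections not yet accept()ed */
};

/* SYN arrives: the request goes into the ehash and qlen grows. */
void lc_on_syn(struct lc_counts *c)
{
    c->qlen++;
}

/* Final ACK arrives: the request leaves the SYN "queue" and the
 * child socket joins the accept queue. */
void lc_on_final_ack(struct lc_counts *c)
{
    assert(c->qlen > 0);
    c->qlen--;
    c->ack_backlog++;
}

/* accept() is called: one established connection leaves the accept queue. */
void lc_on_accept(struct lc_counts *c)
{
    assert(c->ack_backlog > 0);
    c->ack_backlog--;
}
```

After two SYNs, one final ACK, and one accept(), the model holds qlen == 1 and ack_backlog == 0: one handshake is still in flight and nothing is waiting to be accepted.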

Hygrothermograph answered 10/8, 2020 at 18:7 Comment(7)

Thanks for your reply, it's really clear. One more question: why do we add to icsk_accept_queue when receiving a SYN, in inet_csk_reqsk_queue_added(sk)? That is what confuses me; I think it would be more reasonable to increase the SYN queue length. – Broadsword 11/8, 2020 at 3:36

And the other answer thinks there is no syn queue now. Is he wrong? – Broadsword 11/8, 2020 at 3:36

@Broadsword There is no longer an explicit queue, but the number of SYN connection requests is tracked and the requests are stored in the ehash. If the number of SYN requests exceeds the threshold, they will be dropped or SYN cookies will be used depending on configuration. To say it simply doesn't exist is inaccurate in my opinion. – Hygrothermograph 11/8, 2020 at 4:1

Thanks for your reply, and what about the increment on icsk_accept_queue that I mentioned above? – Broadsword 11/8, 2020 at 4:45

I think you are being misled by the names. There are two "queues," the SYN "queue" which isn't really stored in an actual queue but is stored in the ehash. The number of such SYNs is held in qlen. The accept queue is an actual queue pointed to by rskq_accept_head. Its length is stored in sk_ack_backlog. When we receive a SYN, we call inet_csk_reqsk_queue_added(sk), which calls reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue) which increments qlen, the size of the SYN queue, not the size of the accept queue. Things could certainly be named better. – Hygrothermograph 11/8, 2020 at 13:42

Thanks again. Now I fully understand the SYN queue and the accept queue. I'm really happy to clear up a confusion that blocked me for more than a week. – Broadsword 11/8, 2020 at 13:48

Glad to be of help. I noticed your work on ty-chen.github.io, so I knew you were serious about understanding this. Good luck, and goodbye. – Hygrothermograph 11/8, 2020 at 13:56

The short answer is that SYN queues are dangerous. The reason they are dangerous is that by sending a single packet (SYN), the sender can get the receiver to commit resources (memory for the SYN queue entry). If you send enough such packets fast enough, possibly with a forged source address, you will either cause the receiver to exhaust its memory resources or to start refusing to accept legitimate connections.

For this reason, modern operating systems do not have a SYN queue. Instead, they use various techniques (the most common is called SYN cookies) that allow them to keep a queue only for connections that have already answered the initial SYN-ACK packet, and have thus proved that they have dedicated resources themselves to this connection.
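
The idea behind SYN cookies can be sketched in a few lines. This is a toy illustration only, not the algorithm any real kernel uses. The server derives the initial sequence number of its SYN-ACK from the connection 4-tuple, a secret, and a coarse counter, stores nothing, and recomputes the value when the final ACK arrives. The toy_* names are hypothetical. FNV-1a stands in for a keyed cryptographic hash, and real implementations also encode details such as the MSS and accept a small window of counter values.

```c
#include <stddef.h>
#include <stdint.h>

struct cookie_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* FNV-1a step over a byte buffer. NOT cryptographic; used only to keep
 * the sketch short. */
static uint32_t toy_mix(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Value the server would use as the SYN-ACK's initial sequence number. */
uint32_t toy_cookie(const struct cookie_tuple *t,
                    uint32_t secret, uint32_t counter)
{
    uint32_t h = 2166136261u;   /* FNV-1a offset basis */

    h = toy_mix(h, &secret, sizeof(secret));
    h = toy_mix(h, &counter, sizeof(counter));
    h = toy_mix(h, &t->saddr, sizeof(t->saddr));
    h = toy_mix(h, &t->daddr, sizeof(t->daddr));
    h = toy_mix(h, &t->sport, sizeof(t->sport));
    h = toy_mix(h, &t->dport, sizeof(t->dport));
    return h;
}

/* Final ACK: the client acknowledges cookie + 1, so recompute and compare.
 * Nothing was stored between the SYN and this point. */
int toy_cookie_ok(const struct cookie_tuple *t, uint32_t secret,
                  uint32_t counter, uint32_t ack_seq)
{
    return ack_seq - 1 == toy_cookie(t, secret, counter);
}
```

An ACK carrying toy_cookie(...) + 1 passes the check, while any other acknowledgment number for the same tuple and counter is rejected, so the server only commits memory once the client has shown it received the SYN-ACK.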

So, you are right - there is no SYN queue.

Mourner answered 9/8, 2020 at 8:34 Comment(1)

Thanks for your reply. But the blog veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html in the comments says that modern OSes really do use two queues to mitigate SYN floods. So is the blog wrong? – Broadsword 9/8, 2020 at 15:26
