Ignore Path MTU on Linux when sending UDP packets

I am implementing DPLPMTUD and I want to stop the Linux kernel from returning -1 with errno = EMSGSIZE when I send a UDP packet larger than the local interface's MTU. I want to avoid the pain of handling this error when several datagrams are sent at once (especially with sendmmsg(2)), each possibly belonging to a different connection. I'd rather have the kernel drop the packet and let the application's DPLPMTUD logic figure out the MTU.

ip(7) has this to say:

              It is possible to implement RFC 4821 MTU probing with SOCK_DGRAM
              or SOCK_RAW sockets by setting a value of IP_PMTUDISC_PROBE
              (available since Linux 2.6.22).  This is also particularly
              useful for diagnostic tools such as tracepath(8) that wish to
              deliberately send probe packets larger than the observed Path MTU.

Yet setting this option does not produce the desired effect. Here is the code to illustrate the problem:

/* emsgsize.c: test whether IP_PMTUDISC_PROBE suppresses EMSGSIZE
 *
 * Usage: emsgsize packet_size
 */

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHECK(w_, s_) do { if ((s_) < 0) { perror(w_); return 1; }} while (0)

/* Payload */
static unsigned char data[64 * 1024];

int
main (int argc, char **argv)
{
    int fd, on, s, size;
    struct sockaddr_in si;
    ssize_t sent;

    if (argc != 2)
    {
        fprintf(stderr, "usage: emsgsize size\n");
        return 1;
    }
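    size = atoi(argv[1]);
    /* Reject sizes that would read past the end of the payload buffer */
    if (size < 0 || (size_t) size > sizeof(data))
    {
        fprintf(stderr, "size must be between 0 and %zu\n", sizeof(data));
        return 1;
    }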

    memset(&si, 0, sizeof(si));
    si.sin_family = AF_INET;

    fd = socket(si.sin_family, SOCK_DGRAM, 0);
    CHECK("socket", fd);

    s = bind(fd, (struct sockaddr *) &si, sizeof(si));
    CHECK("bind", s);

    /* This is supposed to suppress sendmsg(2) returning -1 with
     * errno = EMSGSIZE, see ip(7):
     *
     "        It is possible to implement RFC 4821 MTU probing with SOCK_DGRAM
     "        or SOCK_RAW sockets by  setting  a  value  of  IP_PMTUDISC_PROBE
     "        (available  since Linux 2.6.22).  This is also particularly use-
     "        ful for diagnostic tools such as tracepath(8) that wish  to  de-
     "        liberately send probe packets larger than the observed Path MTU.
     */
    on = IP_PMTUDISC_PROBE;
    s = setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &on, sizeof(on));
    CHECK("setsockopt", s);

    memset(&si, 0, sizeof(si));
    si.sin_family = AF_INET;
    si.sin_port = htons(12345); /* Destination does not matter */
    s = inet_pton(AF_INET, "127.0.0.1", &si.sin_addr);
    CHECK("inet_pton", s);
    sent = sendto(fd, data, (size_t) size, 0, (struct sockaddr *) &si,
                                                            sizeof(si));
    CHECK("sendto", sent);

    return 0;
}

When I send packets larger than the MTU, sendto() above returns -1 and errno is set to EMSGSIZE -- exactly what I want to avoid.

Is there a way to do what I want?

Gunman answered 10/7, 2020 at 21:28

Comments (6)
IP_PMTUDISC_PROBE sets the do-not-fragment flag (which causes an EMSGSIZE error to be returned for messages that are too long). Use IP_PMTUDISC_WANT instead: it allows fragmenting the datagram, but also does path MTU discovery (and sets the DF flag for datagrams that are not too long). I recommend you look at the official man 7 ip page at man7.org for the most accurate, up-to-date documentation; the options are well described there. – Ivette
I don't want the kernel to do the MTU discovery, though; I want to do it myself. – Gunman
Of course it is my job. The whole idea behind DPLPMTUD is that you discover the PMTU at the PL -- the Packetization Layer. – Gunman
@Dmitri: Then use IP_PMTUDISC_DONT. No EMSGSIZE errors, and messages exceeding the MTU are just dropped. – Ivette
@None, I want the DF bit set, though. It is required by the QUIC Internet Draft: "UDP datagrams MUST NOT be fragmented at the IP layer. In IPv4 [IPv4], the DF bit MUST be set to prevent fragmentation on the path." – Gunman
Here is the commit where IP_PMTUDISC_PROBE was added. The changes in net/ipv4/ip_output.c indicate that you can send unfragmented packets larger than the path MTU to the destination, but still not larger than the device MTU. – Helium
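
Given the device-MTU limit described in the last comment, one user-space workaround is to keep IP_PMTUDISC_PROBE (so the DF bit stays set) and treat EMSGSIZE as a lost probe instead of a hard error. The sketch below only illustrates that idea; it does not suppress the error in the kernel. The names enable_probe_mode(), send_probe() and enum probe_result are hypothetical and introduced here for illustration.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result type: what the DPLPMTUD logic gets to see */
enum probe_result
{
    PROBE_SENT,     /* datagram was handed to the kernel */
    PROBE_TOO_BIG,  /* kernel returned EMSGSIZE: count the probe as lost */
    PROBE_ERROR     /* any other failure, errno is left as set by sendto */
};

/* Set the DF bit and ignore the kernel's cached path MTU, see ip(7) */
static int
enable_probe_mode (int fd)
{
    int val = IP_PMTUDISC_PROBE;

    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val));
}

/* Send one probe; EMSGSIZE is reported as PROBE_TOO_BIG so that the
 * caller can handle it the same way as a probe that timed out.
 */
static enum probe_result
send_probe (int fd, const void *buf, size_t len,
                        const struct sockaddr *dst, socklen_t dst_len)
{
    if (sendto(fd, buf, len, 0, dst, dst_len) >= 0)
        return PROBE_SENT;

    return errno == EMSGSIZE ? PROBE_TOO_BIG : PROBE_ERROR;
}
```

A caller would invoke enable_probe_mode() once after creating the socket and then feed PROBE_TOO_BIG into the same code path that handles an unacknowledged probe.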
