I am implementing DPLPMTUD, and I want to stop the Linux kernel from returning -1 with errno = EMSGSIZE when I send a UDP packet longer than the local interface's MTU. I want to avoid the pain of dealing with error handling when several datagrams are sent out (especially when using sendmmsg(2)), each perhaps belonging to a different connection. I'd rather have the kernel drop the packet and let the application's DPLPMTUD logic figure out the MTU.
ip(7) has this to say:
    It is possible to implement RFC 4821 MTU probing with SOCK_DGRAM
    or SOCK_RAW sockets by setting a value of IP_PMTUDISC_PROBE
    (available since Linux 2.6.22).  This is also particularly
    useful for diagnostic tools such as tracepath(8) that wish to
    deliberately send probe packets larger than the observed Path MTU.
Yet setting this option does not produce the desired effect. Here is the code to illustrate the problem:
    /* emsgsize.c: test whether IP_PMTUDISC_PROBE suppresses EMSGSIZE
     *
     * Usage: emsgsize packet_size
     */
    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    #define CHECK(w_, s_) do { if ((s_) < 0) { perror(w_); return 1; }} while (0)

    /* Payload */
    static unsigned char data[64 * 1024];

    int
    main (int argc, char **argv)
    {
        int fd, on, s, size;
        struct sockaddr_in si;
        ssize_t sent;

        if (argc != 2)
        {
            fprintf(stderr, "usage: emsgsize size\n");
            return 1;
        }
        size = atoi(argv[1]);

        memset(&si, 0, sizeof(si));
        si.sin_family = AF_INET;

        fd = socket(si.sin_family, SOCK_DGRAM, 0);
        CHECK("socket", fd);

        s = bind(fd, (struct sockaddr *) &si, sizeof(si));
        CHECK("bind", s);

        /* This is supposed to suppress sendmsg(2) returning -1 with
         * errno = EMSGSIZE, see ip(7):
         *
         " It is possible to implement RFC 4821 MTU probing with SOCK_DGRAM
         " or SOCK_RAW sockets by setting a value of IP_PMTUDISC_PROBE
         " (available since Linux 2.6.22).  This is also particularly
         " useful for diagnostic tools such as tracepath(8) that wish to
         " deliberately send probe packets larger than the observed Path MTU.
         */
        on = IP_PMTUDISC_PROBE;
        s = setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &on, sizeof(on));
        CHECK("setsockopt", s);

        memset(&si, 0, sizeof(si));
        si.sin_family = AF_INET;
        si.sin_port = htons(12345); /* Destination does not matter */
        s = inet_pton(AF_INET, "127.0.0.1", &si.sin_addr);
        CHECK("inet_pton", s);

        sent = sendto(fd, data, (size_t) size, 0, (struct sockaddr *) &si,
                      sizeof(si));
        CHECK("sendto", sent);

        return 0;
    }
When I send packets larger than the MTU, the sendto() call above returns -1 and errno is set to EMSGSIZE -- exactly what I want to avoid.
Is there a way to do what I want?
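
For reference, the per-datagram handling that this would otherwise require can be sketched as follows. This is only an illustration under stated assumptions: `make_msg`, `send_or_drop` and `send_all` are hypothetical helper names, not part of any API, and the sketch loops over sendmsg(2) rather than using sendmmsg(2). It reports EMSGSIZE as a dropped datagram, so the caller can treat it the same way as a probe that was lost on the path. Because sendmmsg(2) stops at the first datagram that fails, the same skip-and-continue step would apply there.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Outcome of one send attempt (illustrative names). */
enum send_status { SEND_OK, SEND_DROPPED, SEND_ERROR };

/* Fill in a msghdr that carries one buffer to one IPv4 destination. */
static void
make_msg (struct msghdr *msg, struct iovec *iov, void *buf, size_t len,
          struct sockaddr_in *dst)
{
    memset(msg, 0, sizeof(*msg));
    iov->iov_base = buf;
    iov->iov_len = len;
    msg->msg_name = dst;
    msg->msg_namelen = sizeof(*dst);
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
}

/* Send one datagram.  EMSGSIZE is reported as SEND_DROPPED so that the
 * caller can treat it like a datagram that was lost on the path.
 */
static enum send_status
send_or_drop (int fd, const struct msghdr *msg)
{
    if (sendmsg(fd, msg, 0) >= 0)
        return SEND_OK;
    return errno == EMSGSIZE ? SEND_DROPPED : SEND_ERROR;
}

/* Send count datagrams, skipping those that are too large.  Returns the
 * number of datagrams dropped, or -1 on the first error other than EMSGSIZE.
 */
static int
send_all (int fd, const struct msghdr *msgs, size_t count)
{
    int dropped = 0;
    size_t i;

    assert(msgs != NULL || count == 0);
    for (i = 0; i < count; i++)
    {
        switch (send_or_drop(fd, &msgs[i]))
        {
        case SEND_OK:
            break;
        case SEND_DROPPED:
            dropped++;
            break;
        case SEND_ERROR:
            return -1;
        }
    }
    return dropped;
}
```

The application still sees the error, but only in one place, and the DPLPMTUD state machine never has to distinguish a local EMSGSIZE from a probe that timed out.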
IP_PMTUDISC_PROBE sets the do-not-fragment flag (which causes an EMSGSIZE error to be returned for messages that are too long). Use IP_PMTUDISC_WANT instead: that allows fragmenting the datagram, but does path MTU discovery too (and sets the DF flag for datagrams that are not too long). I recommend you look at the official man 7 ip page at man7.org for the most accurate, up-to-date documentation; the options are well described there. – Ivette
IP_PMTUDISC_DONT. No EMSGSIZE errors, and messages exceeding MTU are just dropped. – Ivette
IP_PMTUDISC_PROBE was added. Changes in net/ipv4/ip_output.c indicate that you can send packets without fragmentation larger than MTU to destination, but still not larger than device MTU. – Helium