I am working with the Xilinx distribution of Linux on a Zynq 7000 board. This has two ARM processors, some L2 cache, a DRAM interface, and a large amount of FPGA fabric. Our appliance collects data being processed by the FPGA and then sends it over the gigabit network to other systems.
One of the services we need to support on this appliance is SNMP, which relies on UDP datagrams. Although SNMP does have TCP support, we cannot force the clients to use it.
What I am finding is that this system is losing almost all SNMP requests.
It is important to note that neither the network nor the CPUs are being overloaded. The data rate isn't particularly high, and the CPUs are usually somewhere around 30% load. We're also using the SNMP++ and Agent++ libraries for SNMP, which we control, so this is not a case of a system daemon breaking. However, if we stop the processing and network activity, SNMP requests are not lost. SNMP is handled in its own thread, and we've made sure to keep requests rare and spread out, so there should never be more than one request buffered at any one time. With the low CPU load, there should be no problem context-switching to the receiving process to service the request.
Since it's not a CPU or Ethernet bandwidth problem, my best guess is that the problem lies in the Linux kernel. Despite the low network load, I'm guessing that some limited network stack buffers are overflowing, and that this is why UDP datagrams are being dropped.
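One way to test the buffer guess is to look at how large a UDP socket's receive buffer is by default and how much the kernel grants when more is requested. This is only a minimal sketch, assuming Python is available on the board. The function name `udp_rcvbuf_sizes` and the 1 MiB request are illustrative choices, not anything taken from SNMP++ or Agent++.

```python
import socket

def udp_rcvbuf_sizes(requested_bytes=1 << 20):
    """Return (default, granted) SO_RCVBUF sizes for a fresh UDP socket.

    'default' is what the kernel assigns to a new socket; 'granted' is
    what it reports after requested_bytes has been asked for. On Linux
    the request is capped by net.core.rmem_max.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        default = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()
    return default, granted

if __name__ == "__main__":
    default, granted = udp_rcvbuf_sizes()
    print("default SO_RCVBUF:", default)
    print("granted SO_RCVBUF:", granted)
```

If the granted size stays far below the request, the system-wide limit is the constraint rather than the application. The same `setsockopt` call exists in the C sockets API, so the SNMP socket could be adjusted the same way if the library exposes its descriptor.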
When googling this, I find examples of how to use netstat to report lost packets, but that doesn't work on this system because its netstat has no "-s" option. How can I monitor these packet drops? How can I diagnose the cause? How can I tune kernel parameters to minimize this loss?
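For reference, the UDP statistics that `netstat -s` prints on Linux come from `/proc/net/snmp`, so they can be read directly when the option is missing. The sketch below assumes that file exists on the board and that Python is available. The helper names `parse_udp_counters` and `read_udp_counters` are made up for illustration.

```python
def parse_udp_counters(text):
    """Parse the 'Udp:' header and value lines of /proc/net/snmp into a dict."""
    # The file holds one header line and one value line per protocol.
    # 'UdpLite:' lines are skipped because they do not start with 'Udp:'.
    rows = [line.split() for line in text.splitlines() if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp: header/value pair found")
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))

def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the UDP counters from the given procfs file."""
    with open(path) as f:
        return parse_udp_counters(f.read())

if __name__ == "__main__":
    for name, value in read_udp_counters().items():
        print(f"{name:>14}: {value}")
```

Reading the counters before and after a batch of SNMP requests and comparing `InErrors` and `RcvbufErrors` should show whether datagrams are being discarded at the socket receive buffer. If neither counter moves, the loss is happening earlier, for example in the driver or on the wire.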
Thanks!
Use tcpdump then to see if the UDP packets are available in the kernel. You can install other Linux utils through the board's flash. Also, UDP doesn't have guaranteed delivery (I'm not sure if SNMP has its own retry logic on top). – Senate