"Lost" UDP packets (JBoss + DatagramSocket)
D

4

5

I develop part of a JBoss+EJB based enterprise application. My module needs to process a huge number of incoming UDP packets. I've done some load testing, and it looks like everything is fine when packets are sent at an 11 ms interval, but some packets are lost at a 10 ms interval. This seems rather strange to me, but I have compared the 10 ms and 11 ms load tests several times and the result is always the same (10 ms: some "lost" packets; 11 ms: everything is fine).

If something were wrong with synchronization, I'd expect it to show up in the 11 ms tests as well (at least one lost packet, or at least one wrong counter value). So if synchronization is not the cause, maybe the DatagramSocket through which I receive packets doesn't work as expected.

I found that the receive buffer size (SO_RCVBUF) defaults to 57344 (this probably depends on the underlying network I/O buffers). I suspect that when this buffer fills up, new incoming UDP datagrams are rejected. I tried setting it to a higher value, but I noticed that if I go too high, the buffer returns to its default size. If this depends on the underlying layer, how can I find out the maximum buffer size for a given OS/network card from the JBoss level?
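One way to see what the OS actually grants is to set the size and then read it back, since the requested value is only a hint. This is a minimal sketch rather than a definitive answer: the class name `ReceiveBufferProbe`, the 4 MB request and the use of port 0 are illustrative assumptions, not part of the original setup.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

final class ReceiveBufferProbe {

    /** Requests a receive buffer size and returns the size the OS reports afterwards. */
    static int requestReceiveBuffer(DatagramSocket socket, int requestedBytes) throws SocketException {
        socket.setReceiveBufferSize(requestedBytes); // only a hint; the OS may clamp or ignore it
        return socket.getReceiveBufferSize();        // the value actually in effect
    }

    public static void main(String[] args) throws Exception {
        // Port 0 lets the OS pick any free port (illustrative; use the real listening port in practice).
        try (DatagramSocket socket = new DatagramSocket(0)) {
            System.out.println("default SO_RCVBUF:  " + socket.getReceiveBufferSize());
            int granted = requestReceiveBuffer(socket, 4 * 1024 * 1024);
            System.out.println("granted SO_RCVBUF:  " + granted);
        }
    }
}
```

If the granted value is lower than the requested one, the limit is imposed below the JVM. On Linux, for example, the ceiling is set by the `net.core.rmem_max` sysctl; other systems have their own settings.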

Is it possible that this is caused by the receive buffer size, or is 57344 big enough to handle most cases? Do you have any experience with such issues?

There is no timeout set on my DatagramSocket. My UDP datagrams contain about 70 bytes of data (not counting the datagram header).

[Edited] I have to use UDP because I receive Cisco NetFlow data, a protocol used by network devices to send traffic statistics. Also, I have no influence on the format of the bytes sent (e.g. I cannot add packet counters and so on). I don't expect every packet to be processed (some datagrams may be lost), but I would expect to process most of them. During the 10 ms interval tests, about 30% of the packets were lost.

It is not very likely that slow processing causes this issue. Currently a singleton component holds a reference to the DatagramSocket and calls the receive method in a loop. When a packet is received, it is put on a queue and processed by a stateless component picked from a pool. The "facade" singleton is only responsible for receiving packets and passing them on for processing (it does not wait for a processing-complete event).

Thanks in advance, Piotr

Dugong answered 23/2, 2011 at 18:34 Comment(4)
Why do you need UDP? I would use TCP until you have profiling that indicates a saturated layer with a need to go to UDP. Also, my experience with UDP is that the data is usually duplicated. "Here's the current state." So don't worry if you miss this packet, because another packet's coming soon!Quito
"It is not very possible that slow processing causes this issue." --I think it is, because 10 msec is small, about the same duration as a thread quantum.Norbert
That 10ms may vary on different operating systems/processors. Can you send the data less frequently, say 100ms?Wolenik
@Norbert Maybe you are right, but if so, shouldn't packet losses also be seen with the 11 ms interval? (I cannot observe any.) @Jeff Storey How fast packets are sent depends only on the external network devices. I am only responsible for receiving and processing.Dugong
H
3

UDP is inherently unreliable.

Datagrams can be thrown away at any point between sender and receiver, even within the receiver at a level below your code. Setting the recv buffer to a larger size is likely to help the networking code within your machine buffer more datagrams but you should expect that some datagrams will be lost anyway.

If your recv logic takes too long (i.e. longer than it takes for a new datagram to arrive) then you'll always be behind and you'll always miss datagrams eventually. All you can do is make sure that your recv code runs as fast as possible, perhaps move the inbound datagram to a queue and process it 'later' or on another thread but then that will just move your problem to being one where you have a queue that keeps growing.

[Re your edit...] And what's processing your queue, and how does the locking work between the producer and the consumers? Change your code so that the recv logic simply increments a count, discards the data and loops back around, and see if you're losing fewer datagrams. Either way, UDP is unreliable: you WILL have datagrams that are discarded, and you should just expect that and deal with it. Worrying about it means you're focusing on the wrong problem; make use of the data you DO get and assume that you won't get much of it, and then your program will work even if the network gets congested and MOST of your datagrams get discarded.
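A count-and-discard receiver for that experiment might look like the sketch below. It is plain Java SE outside the container, and the class name `CountingReceiver`, the 2048-byte buffer and the idle timeout are assumptions made for the illustration; the timeout is only there so the loop ends once the sender stops.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;

final class CountingReceiver {

    /** Receives and discards datagrams; returns how many arrived before the socket went idle. */
    static long countUntilIdle(DatagramSocket socket, int idleTimeoutMillis) throws IOException {
        socket.setSoTimeout(idleTimeoutMillis); // must be > 0; a value of 0 would block forever
        byte[] buffer = new byte[2048];         // assumed to be larger than any expected datagram
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        long count = 0;
        while (true) {
            try {
                packet.setLength(buffer.length); // defensive reset before reusing the packet
                socket.receive(packet);          // blocks until a datagram arrives or the timeout fires
                count++;                         // no parsing, no queueing: just count and drop
            } catch (SocketTimeoutException idle) {
                return count;                    // nothing arrived within the idle window
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(args[0]);    // the port the sender targets
        try (DatagramSocket socket = new DatagramSocket(port)) {
            System.out.println("received: " + countUntilIdle(socket, 5000));
        }
    }
}
```

Comparing the printed count with the number of datagrams the load generator sent shows whether the loss happens below the application or in the queueing and processing path. The sender has to start within the idle window, otherwise the loop returns 0.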

In summary, that's just how it is with UDP.

Hwahwan answered 23/2, 2011 at 19:3 Comment(1)
Thanks for the answer. The producer holds a reference to the DatagramSocket and is a JBossEJB3ext @Service bean (singleton). It calls the blocking receive method in a while(true) loop. When it receives a packet, DatagramSocket is passed into one of the pooled stateless beans via AsyncUtils (asynchronous call). I will try to disable processing and only count incoming packets, then I'll give feedback.Dugong
W
5

UDP does not guarantee delivery, so you can tweak parameters, but you can't guarantee that the message will get delivered, especially in the case of very large data transfers.

If you need to guarantee delivery, you should use TCP instead.

If you need (or want) to use UDP, you can encode each packet with a number, and also send the number of packets expected. For example, if you sent 10 large packets, you could include the information: packet 1/10, packet 2/10, etc. This way you can at least tell if you have not received all of the packets. If you have not received them, you could send a request to resend those missing packets.
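On the receiving side, the numbering idea could be tracked roughly as follows. This is only a sketch and assumes the sender can put an increasing sequence number into each datagram, which is exactly what the asker says is not possible in this case; the class name `SequenceGapTracker` is hypothetical.

```java
final class SequenceGapTracker {
    private long expected = -1; // next sequence number we expect; -1 means nothing seen yet
    private long missing = 0;   // total sequence numbers skipped so far

    /** Records a received sequence number and returns how many numbers were skipped just before it. */
    long record(long sequence) {
        long gap = 0;
        if (expected >= 0 && sequence > expected) {
            gap = sequence - expected; // numbers between the last packet and this one never arrived (yet)
            missing += gap;
        }
        if (sequence >= expected) {
            expected = sequence + 1;
        }
        // Simplification: a late or duplicate packet (sequence < expected) is ignored,
        // so a merely reordered packet stays counted as missing.
        return gap;
    }

    long missing() {
        return missing;
    }
}
```

For example, after recording 1, 2 and 5, `missing()` reports 2 (numbers 3 and 4), and those are the packets a resend request would ask for.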

Wolenik answered 23/2, 2011 at 18:40 Comment(2)
Yes, as soon as you have large transfers, the packet will exceed the MTU and become fragmented. The fragments are not guaranteed to arrive, and the underlying protocol (IP) discards the whole datagram if any fragment is missing, so UDP never hands a partial packet to the application. This is why big packets with UDP are far less reliable: because UDP preserves message boundaries, there is more discarding going on behind the scenes :)Fortunna
Thanks for the answers. Unfortunately I have no influence on how the packets are generated (and I cannot switch to TCP); please take a look at my edited post.Dugong
L
2

It appears in your tests that only up to two packets can be in the buffer so if each packet is less than 28KB this should be fine.

As you know, UDP is lossy, but you should be able to send more than one packet per 10 ms. I suggest you write a simple receiver which just listens for packets, to determine whether it's your application or something at the network/OS level. (I suspect the latter.)

Lefthander answered 23/2, 2011 at 18:41 Comment(11)
Is there any method to detect a buffer overflow? I am receiving packets continuously, and after running for a long time it stops receiving. Is that because of a buffer overflow?Evictee
@GeorgeThomas the receiver has no idea whether or why packets are lost. You can only determine this through experimentation. e.g. play with the size or distance between packets.Lefthander
I have no idea! I have implemented a UDP listener on Android and I receive a packet every second. Everything works fine, but after working for some time it stops completely. I will try changing the size and distance and give it a try.Evictee
@GeorgeThomas You might need to test this somewhere you can monitor the UDP packets, e.g. using Wireshark. There is no good reason why UDP packets should just stop. You should only see random packets being lost under load.Lefthander
What happens when you attempt to reconnect? What happens when you connect multiple times on different ports for example? Do they all stop at once, or does just one stop?Lefthander
When I kill the app and reopen it, it works again. Is there any way we could know that the connection is lost and reconnect? I have not tried different ports.Evictee
I will try wiresharkEvictee
@GeorgeThomas UDP has no notion of a connection so there is nothing to detect you are not getting packets. What you can do is add a heartbeat and if you don't receive any packets for a period of time assume there is a problem. What you should do depends on why it failed.Lefthander
@GeorgeThomas Wireshark will help you determine whether your server has stopped sending packets due to a bug on the server (which I suspect is quite likely). Note: once you have fixed this, you will need a way to determine when the server can stop sending packets, as it will not receive a notification that you are gone.Lefthander
The server is still sending, I guess, because the iOS app is receiving the packets successfully, so I don't think the problem is with the server.Evictee
Yes, I will need a way to determine when the server can stop sending packets, while trying to reconnect.Evictee
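The heartbeat suggestion from the comments above can be approximated on the receiver with a socket timeout: if nothing arrives within a chosen window, treat it as a problem and react (log, reopen the socket, alert). A minimal sketch, where the class name `SilenceDetector`, the buffer size and the window length are assumptions:

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;

final class SilenceDetector {

    /** Returns true if a datagram arrives within the window, false if the socket stayed silent. */
    static boolean receivedWithin(DatagramSocket socket, int windowMillis) throws IOException {
        socket.setSoTimeout(windowMillis); // must be > 0; a value of 0 means wait forever
        byte[] buffer = new byte[2048];    // assumed to be large enough for the expected datagrams
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        try {
            socket.receive(packet);        // blocks until data arrives or the window expires
            return true;
        } catch (SocketTimeoutException silent) {
            return false;                  // no datagram within the window
        }
    }
}
```

A receive loop could call this repeatedly and, after one or more silent windows, close and reopen the socket. Whether reopening actually helps depends on why the packets stopped, which the timeout alone cannot tell you.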
N
1

I don't know Java but ... does the API allow you to invoke an asynch listen/receive for a datagram:

  • Use O/S API to do a receive (passing your application-level buffer as a parameter)
  • (Wait while there's nothing to receive...)
  • (O/S receives something from the network...)
  • O/S puts the received packet into the buffer and completes/returns your API call

If that's true then I suggest that you do several concurrent instances of the API call, so that there are several concurrent application-level buffers into which multiple packets can be received.
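In Java, one rough equivalent is to have several threads block in receive on the same channel, each with its own buffer. The sketch below uses `java.nio.channels.DatagramChannel`, which is documented as safe for use by multiple threads, although only one thread reads at any given moment. The class name `MultiThreadReceiver`, the buffer size and the counter standing in for real processing are assumptions, and whether this reduces loss compared with simply enlarging SO_RCVBUF is something to measure rather than assume.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.util.concurrent.atomic.AtomicLong;

final class MultiThreadReceiver {

    /** Starts the given number of daemon threads, each blocking in receive() with its own buffer. */
    static Thread[] start(DatagramChannel channel, int threads, AtomicLong received) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                ByteBuffer buffer = ByteBuffer.allocate(2048); // one application-level buffer per thread
                try {
                    while (true) {
                        buffer.clear();
                        channel.receive(buffer);    // blocking mode: waits until a datagram arrives
                        received.incrementAndGet(); // placeholder for handing the data to processing
                    }
                } catch (IOException closedOrFailed) {
                    // The channel was closed (or an I/O error occurred): let this worker exit.
                }
            }, "udp-receiver-" + i);
            workers[i].setDaemon(true);
            workers[i].start();
        }
        return workers;
    }
}
```

Usage would be along the lines of `DatagramChannel.open()`, `bind(...)` to the listening address, then `MultiThreadReceiver.start(channel, 4, counter)`; closing the channel stops the workers.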

Norbert answered 23/2, 2011 at 18:47 Comment(2)
Thanks for the answer. Currently a singleton facade is used; it holds a reference to the DatagramSocket. It calls the receive() method in a loop, which is a blocking operation. When a datagram is received, it is passed to the processing component (pooled stateless beans). The container manages synchronization. In my opinion this approach protects against synchronization issues and also gives scalability. Only one DatagramSocket instance is allowed (bound per IP and port).Dugong
@Piotrek I'm suggesting you try changing that: instead of one synchronous/blocking operation, try doing multiple asynchronous/non-blocking operations ... so that the O/S has multiple concurrent application-level buffers into which to receive multiple packets.Norbert
