Which StatsD client should I use for a java/grails project?
I'm looking at adding StatsD data collection to my Grails application, and looking around at existing libraries and code has left me a little confused about what would make a good, scalable solution. To put the question into context: I'm working on an online gaming project where I will naturally be monitoring user interactions with the game engine. These interactions cluster around particular moments in time, where X users perform interactions within a window of a second or two, then repeat after a 10-20 second pause.

Here is my analysis of the options that are available today.

Etsy StatsD client example

https://github.com/etsy/statsd/blob/master/examples/StatsdClient.java

The "simplest thing that could possibly work" solution: I could pull this class into my project, instantiate a singleton instance as a Spring bean, and use it directly. However, after noticing that the grails-statsd plugin creates a pool of client instances, I started wondering about the scalability of this approach.

It seems that the doSend method could become a bottleneck if many threads try to send events at the same time. However, as I understand it, due to the fire-and-forget nature of sending UDP packets, each send should complete quickly, avoiding the huge overhead we usually associate with network connections.
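To make the fire-and-forget point concrete, here is a minimal sketch in the spirit of the Etsy example (class and method names here are mine, not the Etsy API) of formatting and sending a single counter over UDP:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Minimal sketch of an Etsy-style fire-and-forget StatsD send.
// StatsdSketch, counter and send are illustrative names, not a library API.
public class StatsdSketch {

    // Format a counter data point; pinning Locale.US avoids the
    // decimal-comma bug mentioned above on non-US default locales.
    public static String counter(String bucket, int delta, double sampleRate) {
        if (sampleRate < 1.0) {
            return String.format(Locale.US, "%s:%d|c|@%f", bucket, delta, sampleRate);
        }
        return String.format(Locale.US, "%s:%d|c", bucket, delta);
    }

    // UDP is connectionless: this returns as soon as the datagram is handed
    // to the OS, whether or not a StatsD daemon is actually listening.
    public static void send(String host, int port, String stat) throws Exception {
        byte[] data = stat.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        send("localhost", 8125, counter("game.actions", 1, 0.1));
    }
}
```

The send itself is cheap; the contention risk in a shared singleton comes from synchronizing around the socket, not from the network round trip, because there is none.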

grails-statsd plugin

https://github.com/charliek/grails-statsd/

Someone has already created a StatsD plugin for Grails that includes some nice features, such as the annotations and the withTimer method. However, I see that the implementation there is missing some bug fixes from the example implementation, such as specifying the locale on calls to String.format. I'm also not a huge fan of pulling in Apache commons-pool just for this, when a standard Executor could achieve a similar effect.

java-statsd-client

https://github.com/tim-group/java-statsd-client/

This is an alternative pure Java library that operates asynchronously by maintaining its own ExecutorService. It supports the entire StatsD API, including sets and sampling, but doesn't provide any hooks for configuring the thread pool or queue size. If problems arise, then for non-critical concerns such as monitoring I would rather have a finite queue that loses events than an infinite queue that fills up my heap.
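The "finite queue, lose events" preference can be sketched with a standard ThreadPoolExecutor. This is not how java-statsd-client is implemented, just an illustration of the trade-off, with illustrative names:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A single worker thread behind a bounded queue, with a rejection policy
// that silently drops metrics when the queue is full: lost data points
// instead of unbounded heap growth. BoundedMetricSender is a hypothetical name.
public class BoundedMetricSender {

    private final ThreadPoolExecutor executor;

    public BoundedMetricSender(int capacity) {
        executor = new ThreadPoolExecutor(
                1, 1,                                    // exactly one sender thread
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(capacity),      // cap the backlog
                new ThreadPoolExecutor.DiscardPolicy()); // drop instead of blocking or growing
    }

    // Enqueue a send; if the backlog is full the task is quietly discarded.
    public void submit(Runnable sendTask) {
        executor.execute(sendTask);
    }

    public int queuedCount() {
        return executor.getQueue().size();
    }

    public void shutdownNow() {
        executor.shutdownNow();
    }
}
```

DiscardPolicy is the key choice: DiscardOldestPolicy would instead favour fresh data points, and CallerRunsPolicy would push the send back onto the request thread, which is exactly what we want to avoid.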

Play statsd plugin

https://github.com/vznet/play-statsd/

Now I can't use this code directly in my Grails project, but I thought it was worth a look to see how things were implemented. Generally I love the way the code in StatsdClient.scala is built up: very clean and readable. It also appears to have the locale bug, but is otherwise feature-complete with the Etsy sample. Interestingly, unless there is some Scala magic I've not understood, it appears to create a new socket for each data point sent to StatsD. While this approach neatly avoids the need for an object pool or executor thread, I can't imagine it's terribly efficient, potentially performing DNS lookups within the request thread, which should be returning to the user as soon as possible.

The questions

  1. Judging by the fact that all the other implementations appear to have implemented another strategy for handling concurrency, can I assume that the Etsy example is a little too naïve for production use?
  2. Does my analysis here appear to be correct?
  3. What are other people using for statsd in java/groovy?

So far it looks like the best existing solution is the grails plugin as long as I can accept the commons-pool dependency, but right now I'm seriously considering spending Sunday writing my own version that combines the best parts of each implementation.

Lavernlaverna answered 21/6, 2013 at 19:44 Comment(0)
After sleeping on this for a week, I think I'm going to go ahead and use the existing Grails StatsD plugin. The rationale is that although I could achieve a similar effect using an Executor to handle concurrency, without an object pool this would still be bound to a single client/socket instance, which is in theory a rather obvious bottleneck in the application. If I need a pool anyway, I may as well use one where someone else has done all the hard work :)

Lavernlaverna answered 27/6, 2013 at 6:56 Comment(1)
And if you find a need to make the plugin better, there's more likely going to be an easier path to give back to the community in a more useful way!Royceroyd
Speaking as the primary committer of the java-statsd-client, as well as someone who uses this library in production, I'd like to attempt to allay your fears regarding "having an infinite queue that fills up my heap."

I think you pretty much nailed it with your analysis of the Etsy StatsD client example when you said "due to the fire and forget nature of sending UDP packets, this should happen quickly, avoiding the huge overhead that we usually associate with network connections."

It is my understanding that, the way java-statsd-client is currently implemented, the only thing that could allow a large queue of outbound messages to build up is the speed of the fire-and-forget UDP packet sending itself. I'm not an expert in this area, but I'm unaware of any way in which that send could block for long enough for an unbounded queue to accumulate.

When you originally did your evaluation, there were a number of outstanding issues with the java-statsd-client (e.g. Locale/character encoding ambiguities, and a lack of sampling support), but these have recently been addressed. What remains is the question of whether there is a genuine risk of filling up the heap. I'd be keen to hear thoughts from the community on this matter, and, if the consensus is that there is an issue, I would be delighted to explore the introduction of a limiting queue into the library.

Campney answered 28/7, 2014 at 15:39 Comment(1)
For the record, we ran into a problem using the original java-statsd-client library whereby in certain circumstances it would cause our production instances to become completely unresponsive. This would happen during periods of high load and coincide with garbage collection. Essentially GC used enough CPU that the statsd client could no longer send all of the messages building up in the queue. The enqueued messages then consumed all available heap space and we'd go into a GC death spiral from which we couldn't recover. We switched to java-dogstatsd-client and a nice bounded queue.Froude
I came across StatsD over SLF4J during a similar search for a pure Java StatsD client and compared it with java-statsd-client, which you mentioned had several issues. Based on reading the source of both, I came up with this breakdown of those issues.

EDIT: the table below has been updated for version 3.0.1 of java-statsd-client in which many of the original issues have been addressed.

                          | java-statsd-client   | statsd-over-slf4j
--------------------------+----------------------+-----------------------------
messages support sampling | yes                  | yes
actual sampling performed | no, left to caller   | yes, using java.util.Random
nonblocking impl worker   | single daemon thread | single daemon thread
nonblocking impl queue    | unbounded            | caller-specified bound
String.format locale      | none*                | Locale.US
charset for message bytes | UTF-8**              | default, can be overridden

* no localisation is applied
** this is the charset that StatsD reads with
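The locale row is worth a concrete illustration. This hypothetical snippet (the class and method names are mine) shows why an unlocalized String.format call breaks the wire format on JVMs whose default locale uses a decimal comma:

```java
import java.util.Locale;

// Demonstrates the String.format locale pitfall from the table above.
public class LocaleBug {

    // Uses the JVM default locale: produces "0,500000" under e.g. Locale.GERMANY,
    // which StatsD cannot parse as a sample rate.
    public static String unlocalized(double rate) {
        return String.format("gauge|@%f", rate);
    }

    // Pins the locale: always produces "0.500000" regardless of JVM defaults.
    public static String pinned(double rate) {
        return String.format(Locale.US, "gauge|@%f", rate);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.GERMANY);
        System.out.println(unlocalized(0.5)); // gauge|@0,500000  (broken)
        System.out.println(pinned(0.5));      // gauge|@0.500000  (parseable)
    }
}
```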
Parallelize answered 7/5, 2014 at 22:22 Comment(5)
@Campney Thanks for the updates. I'd also like to update "sampling supported" as it looks like you addressed that too. However, I don't see the sampling itself happening in NonBlockingStatsDClient, just the message formatting. See ubercraft's implementation here for example. Should I log an issue or am I missing something?Parallelize
In my view sampling is now supported. I do not believe it is the library's responsibility to actually perform the sampling, as the method of doing so will be application specific (use of randomness like the example you link is opaque, and non-deterministic). Feel free to open an issue on GitHub where we can continue to discuss this more openly, but in the meantime I don't think it's fair to say "no, open pull request" against java-statsd-client sampling support.Campney
@Campney That confuses me because I thought the point of sampling was to reduce traffic between the client and StatsD. I'll open an issue when I get time. Anyway, it's exciting to see your project active again.Parallelize
Thanks for your interest @paul-bellora, I'm glad to see some excitement about this library. I'm sorry I wasn't clear about sampling: I totally agree that it is to reduce traffic between the client and the StatsD server -- I just think that the application should sample its data before calling into the statsd-client-library, and hence the role of the client library is simply to allow an application to report what sampling ratio, if any, it has used.Campney
@Campney Oh, that makes sense. I'll update the answer to reflect that.Parallelize
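The caller-side sampling discussed in the comments above can be sketched as follows; the names (Sampler, shouldSample, client.count) are illustrative, not a specific library's API:

```java
import java.util.Random;

// The application decides whether to report each data point, then passes
// the rate it used to the client so the StatsD server can scale the
// counts back up. Sampler is a hypothetical helper, not a library class.
public class Sampler {

    private final Random random = new Random();

    // Keep roughly `rate` of all events: rate 0.1 reports ~10% of calls.
    public boolean shouldSample(double rate) {
        return random.nextDouble() < rate;
    }
}
```

Usage would look something like `if (sampler.shouldSample(0.1)) client.count("game.actions", 1, 0.1);`, so only about one in ten interactions generates UDP traffic.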
