(Twitter) Storm's Window On Aggregation
Asked Answered
T

2

7

I'm playing around with Storm, and I'm wondering where Storm specifies (if possible) the (tumbling/sliding) window size upon an aggregation. E.g. If we want to find the trending topics for the previous hour on Twitter. How do we specify that a bolt should return results for every hour? Is this done programatically inside each bolt? Or is it some way to specify a "window" ?

Transceiver answered 26/9, 2012 at 14:25 Comment(0)
M
17

Disclaimer: I wrote the Trending Topics with Storm article referenced by gakhov in his answer above.

I'd say the best practice is to use the so-called tick tuples in Storm 0.8+. With these you can configure your own spouts/bolts to be notified at certain time intervals (say, every ten seconds or every minute).

Here's a simple example that configures the component in question to receive tick tuples every ten seconds:

// in your spout/bolt
@Override
public Map<String, Object> getComponentConfiguration() {
    Config conf = new Config();
    int tickFrequencyInSeconds = 10;
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, tickFrequencyInSeconds);
    return conf;
}

You can then use a conditional switch in your spout/bolt's execute() method to distinguish "normal" incoming tuples from the special tick tuples. For instance:

// in your spout/bolt
@Override
public void execute(Tuple tuple) {
    if (isTickTuple(tuple)) {
        // now you can trigger e.g. a periodic activity
    }
    else {
        // do something with the normal tuple
    }
}

private static boolean isTickTuple(Tuple tuple) {
    return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
        && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
}

Again, I wrote a pretty detailed blog post about doing this in Storm a few days ago as gakhov pointed out (shameless plug!).

Mulligrubs answered 6/2, 2013 at 18:38 Comment(3)
We ended up using tick tuples for "triggering" an aggregating function (bolt). Thanks a bunch=)Transceiver
Hi Michael, I am wondering about this: while storm is running, can I somehow change the frequency of the tick tuples? If we can, we can change the frequency at which storm will write the log of the trending results, or it can change the window size at which storm is calculating the topic trend. Thanks!Oneself
AFAIK you can't change the tick frequency at runtime.Mulligrubs
N
1

Add a new spout with parallelism degree of 1, and have it emit an empty signal and then Utils.sleep until next time (all done in nextTuple). Then, link all relevant bolts to that spout using all-grouping, so all of their instances will receive that same signal.

Navvy answered 2/11, 2012 at 23:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.