1: The problem with averages, for my purposes, is that they nuke large differences between samples- my peaks and valleys were more important than what was happening between them. The point of the 3buckets algorithm is to try to preserve those inflection points (peaks/valleys) while not worrying about showing you all the times the data was similar or the same.
So, in my case, where the data was generally all the same (or close enough-- temperature data) until sample X at which point a small % change was important to be shown in the graph, the buckets algorithm was perfect.
Also- since the buckets algorithm is parameterized, you can change the values (how much data to keep) and see what values nuke the most data while looking visually nearly-identical and decide how much data you can dispense with before your graph has had too much data removed.
The naive approach would be decimation (removing X out of N samples) but what happens if it's the outliers you care about and the algorithm nukes an outlier? So then you change your decimation so that if the difference is -too- great, then it doesn't nuke that sample. This is kind of a more sophisticated version of that concept.
2: depends on how quickly you can compute it all, if the data ever changes, various other factors. That's up to you. From my perspective, once my data was in the past and a sample was 'chosen' to represent the bucket's value, it won't be changed and I can save it and never recalculate again.
Since your question is a bit old, what'd you end up doing?