I get this error intermittently because of network connectivity issues, and retrying a few times usually fixes it. Here's the code I use to retry Hector API calls:
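/** An interface whose execute() method wraps the Hector call. */
public interface Retriable<T> {
    T execute();
}

/**
 * Executes the operation, retrying up to maxRetries times if it throws.
 * The delay between attempts starts at 1 second and doubles after each failure.
 *
 * @param retriable  the operation to execute
 * @param maxRetries maximum number of retries after the first failed attempt
 * @param <T>        the result type of the operation
 * @return the result of the first successful attempt
 */
public static <T> T executeWithRetry(Retriable<T> retriable, int maxRetries) {
    T result;
    int retries = 0;
    long sleepSec = 1;
    // retry in case of an exception:
    while (true) {
        try {
            result = retriable.execute();
            break;
        } catch (RuntimeException e) {
            // execute() declares no checked exceptions, so only unchecked ones can reach here
            if (retries == maxRetries) {
                LOG.error("Exception occurred. Reached max retries.", e);
                throw e;
            }
            retries++;
            LOG.error(String.format("Exception occurred. Retrying in %d seconds - #%d", sleepSec, retries), e);
            try {
                Thread.sleep(sleepSec * 1000);
                // increase sleepSec exponentially:
                sleepSec *= 2;
            } catch (InterruptedException e1) {
                // restore the interrupt flag and stop retrying instead of swallowing the interrupt
                Thread.currentThread().interrupt();
                throw e;
            }
        }
    }
    return result;
}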
And an example of how to use it:
// the second argument is the maximum number of retries; 3 is only an example value
ColumnFamilyResult<String, String> columns = executeWithRetry(new Retriable<ColumnFamilyResult<String, String>>() {
    @Override
    public ColumnFamilyResult<String, String> execute() {
        return template.queryColumns(row.getKey());
    }
}, 3);
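
If you want to try the retry logic without a Cassandra cluster, here is a minimal self-contained sketch. It assumes Java 8 (for the lambda) and replaces the Hector call with a simulated operation that fails twice before succeeding. The class name RetryDemo and the extra initialSleepMillis parameter are made up for the demo so that it runs quickly; logging is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetryDemo {

    /** Same shape as the Retriable interface above. */
    public interface Retriable<T> {
        T execute();
    }

    /**
     * Compact variant of executeWithRetry. The initial delay is a parameter
     * (hypothetical, for the demo only) and doubles after each failed attempt.
     */
    public static <T> T executeWithRetry(Retriable<T> retriable, int maxRetries, long initialSleepMillis) {
        long sleepMillis = initialSleepMillis;
        for (int retries = 0; ; retries++) {
            try {
                return retriable.execute();
            } catch (RuntimeException e) {
                if (retries == maxRetries) {
                    throw e; // out of retries: propagate the last failure
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // restore the interrupt flag and give up
                    Thread.currentThread().interrupt();
                    throw e;
                }
                sleepMillis *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated flaky operation: throws on the first two calls, then succeeds.
        String value = executeWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated network failure");
            }
            return "ok";
        }, 5, 10);
        System.out.println(value + " after " + calls.get() + " calls");
    }
}
```

With a functional interface like Retriable, the anonymous class in the Hector example could also be written as a lambda on Java 8 or later.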