Cassandra Hector: How to retrieve all rows of a column family?

Asked 7/12, 2011 at 16:8 Answered 6/11, 2013 at 12:29

I am looking for a code example to retrieve all rows and all columns of a column family. Something like:

SELECT * FROM MyTable

I see that this can be done using a RangeSlicesQuery, but you still have to provide a certain range. And I think you have to specify the column names too. Is there a clean and safe way to do this?

Using Hector 1.0 and Cassandra 1.0.

Sula asked 7/12, 2011 at 16:8

Try something like this:

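import java.util.Iterator;
import java.util.UUID;

import me.prettyprint.cassandra.model.QuorumAllConsistencyLevelPolicy;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class Dumper {
    private final Cluster cluster;
    private final Keyspace keyspace;

    public Dumper() {
        // "Name", "hostname" and "Keyspace" are placeholders for your own settings
        this.cluster = HFactory.getOrCreateCluster("Name", "hostname");
        this.keyspace = HFactory.createKeyspace("Keyspace", cluster, new QuorumAllConsistencyLevelPolicy());
    }

    public void run() {
        int rowCount = 100;  // rows per page (must be greater than 1)

        RangeSlicesQuery<UUID, String, Long> rangeSlicesQuery = HFactory
            .createRangeSlicesQuery(keyspace, UUIDSerializer.get(), StringSerializer.get(), LongSerializer.get())
            .setColumnFamily("ColumnFamily")   // placeholder: your column family name
            .setRange(null, null, false, 10)   // first 10 columns of each row
            .setRowCount(rowCount);

        UUID lastKey = null;

        while (true) {
            // start this page at the last key we saw (the start key is inclusive)
            rangeSlicesQuery.setKeys(lastKey, null);
            System.out.println(" > " + lastKey);

            QueryResult<OrderedRows<UUID, String, Long>> result = rangeSlicesQuery.execute();
            OrderedRows<UUID, String, Long> rows = result.get();
            Iterator<Row<UUID, String, Long>> rowsIterator = rows.iterator();

            // skip the first row: it is the same as the last row of the previous page
            if (lastKey != null && rowsIterator.hasNext()) rowsIterator.next();

            while (rowsIterator.hasNext()) {
                Row<UUID, String, Long> row = rowsIterator.next();
                lastKey = row.getKey();

                // deleted rows ("range ghosts") can come back with no columns; skip them
                if (row.getColumnSlice().getColumns().isEmpty()) {
                    continue;
                }

                System.out.println(row);
            }

            // a short page means we have reached the end of the column family
            if (rows.getCount() < rowCount)
                break;
        }
    }

    public static void main(String[] args) {
        new Dumper().run();
    }
}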

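This pages through the column family 100 rows at a time. Each query starts at the last key of the previous page, and because the start key is inclusive, that row comes back again and is skipped. Only the first 10 columns of each row are fetched (you will want to page through very long rows too).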
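The paging logic in run() does not depend on Cassandra itself, so it can be tried out locally. Below is a minimal, library-free sketch, assuming a TreeMap stands in for the column family and that a range query returns rows in key order, starting at the start key (inclusive), at most count rows. PagingSketch, fetchPage and readAllKeys are hypothetical names for illustration, not Hector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PagingSketch {
    // Stand-in for a range slice query (an assumption for illustration):
    // keys in sorted order, starting AT startKey (inclusive), at most `count`.
    // A null startKey means "from the beginning".
    static List<String> fetchPage(NavigableMap<String, String> table, String startKey, int count) {
        NavigableMap<String, String> tail =
                startKey == null ? table : table.tailMap(startKey, true);
        List<String> page = new ArrayList<>();
        for (String key : tail.keySet()) {
            if (page.size() == count) break;
            page.add(key);
        }
        return page;
    }

    // Same control flow as Dumper.run(): restart each page at the last key seen,
    // skip that repeated key, and stop once a page comes back short.
    static List<String> readAllKeys(NavigableMap<String, String> table, int pageSize) {
        List<String> seen = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<String> page = fetchPage(table, lastKey, pageSize);
            int start = (lastKey != null && !page.isEmpty()) ? 1 : 0; // skip duplicate
            for (int i = start; i < page.size(); i++) {
                seen.add(page.get(i));
            }
            if (!page.isEmpty()) lastKey = page.get(page.size() - 1);
            if (page.size() < pageSize) break; // short page: end of data
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 250; i++) {
            table.put(String.format("row%03d", i), "value" + i);
        }
        List<String> keys = readAllKeys(table, 100);
        System.out.println(keys.size() + " rows, first=" + keys.get(0)
                + ", last=" + keys.get(keys.size() - 1));
    }
}
```

With 250 rows and a page size of 100 this issues three queries (returning 100, 100 and 52 rows) and collects every key exactly once. It also shows why the page size must be at least 2: with a page size of 1, every page after the first would contain only the repeated start key, and the loop would never advance.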

This is for a column family with UUIDs for row keys, strings for column names, and longs for values. To adapt it, change the three type parameters of RangeSlicesQuery and the matching serializers.

Shamekashameless answered 7/12, 2011 at 16:49

Thanks for your answer. But here is what I did instead: I simply set rangeSlicesQuery.setKeys("", "") and did not set any row count. This returned all the rows in the column family, so it seems there is no need to page through the columns. – Sula 7/12, 2011 at 17:46

Following up on my previous comment: to do it that way, I needed to specify the column names. – Sula 7/12, 2011 at 17:54

I'm pretty sure Hector does not implement paging for you. Your code will likely fail with a timeout (or worse, cause Cassandra to run out of memory) as your dataset grows, because doing what you suggest makes Cassandra load the entire result set into RAM. – Shamekashameless 7/12, 2011 at 18:23

This might only work with an order-preserving partitioner. How can you do it with RandomPartitioner? – Collocate 8/6, 2012 at 10:26

We tried it with 100k rows and it eventually started to time out. – Charlesettacharleston 25/6, 2012 at 17:58
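On the RandomPartitioner question in the comments: with RandomPartitioner, rows come back in token order (derived from an MD5 hash of the key) rather than key order, but paging by "last key seen" still visits every row, because the next range starts at that key's token. What you lose is the ability to ask for a meaningful key range such as "all keys from a to m". The sketch below simulates this without a cluster. The token function (the MD5 digest read as a signed BigInteger, then made non-negative) is an assumption modeled on how RandomPartitioner computes tokens, and TokenOrderSketch with its methods are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TokenOrderSketch {
    // Assumed token function: abs(MD5(key)) as a BigInteger.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Simulated ring: token -> key, iterated in token order (not key order).
    static NavigableMap<BigInteger, String> ring(List<String> keys) {
        NavigableMap<BigInteger, String> ring = new TreeMap<>();
        for (String k : keys) ring.put(token(k), k);
        return ring;
    }

    // A page starts at the token of startKey (inclusive); null = start of ring.
    static List<String> fetchPage(NavigableMap<BigInteger, String> ring, String startKey, int count) {
        NavigableMap<BigInteger, String> tail =
                startKey == null ? ring : ring.tailMap(token(startKey), true);
        List<String> page = new ArrayList<>();
        for (String key : tail.values()) {
            if (page.size() == count) break;
            page.add(key);
        }
        return page;
    }

    // Page by last key seen, skipping the repeated start key on later pages.
    static List<String> readAllKeys(NavigableMap<BigInteger, String> ring, int pageSize) {
        List<String> seen = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<String> page = fetchPage(ring, lastKey, pageSize);
            for (int i = (lastKey == null ? 0 : 1); i < page.size(); i++) {
                seen.add(page.get(i));
            }
            if (!page.isEmpty()) lastKey = page.get(page.size() - 1);
            if (page.size() < pageSize) break;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 250; i++) keys.add("user" + i);
        List<String> all = readAllKeys(ring(keys), 100);
        System.out.println(all.size() + " keys, first in token order: " + all.get(0));
    }
}
```

Every key is returned once, just not alphabetically; this is the same loop as in the first answer, only the order of the underlying scan differs.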

Try this out:

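    // Assumes keyspace2 and columnFamily are defined elsewhere, and that HFactory,
    // StringSerializer, RangeSlicesQuery, OrderedRows and Row are imported from Hector.
    StringSerializer se = StringSerializer.get();
    int rowCount = 100;   // rows per page (must be greater than 1)
    RangeSlicesQuery<String, String, String> rangeSlicesQuery = HFactory
            .createRangeSlicesQuery(keyspace2, se, se, se)
            .setColumnFamily(columnFamily)
            .setRange(null, null, false, 100)   // up to 100 columns per row
            .setRowCount(rowCount);
    String lastKey = null;
    // Iterate over all rows of the Cassandra column family, one page at a time
    while (true) {
        rangeSlicesQuery.setKeys(lastKey, null);
        OrderedRows<String, String, String> rows = rangeSlicesQuery.execute().get();
        for (Row<String, String, String> row : rows) {
            String cassandraKey = row.getKey();
            if (cassandraKey.equals(lastKey)) {
                continue;   // first row repeats the previous page's last row
            }
            lastKey = cassandraKey;
            // ... process row here
        }
        if (rows.getCount() < rowCount) {
            break;   // short page: no more rows
        }
    }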
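If you want the single for-each style of this answer but over every row rather than one batch, the paging loop can be hidden behind an Iterable. The following is a minimal, library-free sketch; PagedKeys and the fetch function are hypothetical, and it assumes the fetch function behaves like a range slice query (up to n keys in order, starting at the given key inclusive, null meaning "from the beginning"). With Hector, that function would wrap rangeSlicesQuery.setKeys(start, null) and execute(), but that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical helper (not part of Hector): turns a page-fetching function
// into an Iterable so callers can write one for-each over every key.
public class PagedKeys<K> implements Iterable<K> {
    private final BiFunction<K, Integer, List<K>> fetchPage;
    private final int pageSize;

    public PagedKeys(BiFunction<K, Integer, List<K>> fetchPage, int pageSize) {
        // a page size of 1 could never move past the repeated start key
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be at least 2");
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    @Override
    public Iterator<K> iterator() {
        return new Iterator<K>() {
            private List<K> page = fetchPage.apply(null, pageSize); // first page
            private int pos = 0;

            @Override
            public boolean hasNext() {
                if (pos < page.size()) return true;
                if (page.size() < pageSize) return false; // short page: end of data
                // Full page used up: fetch the next one, starting at its last key.
                K lastKey = page.get(page.size() - 1);
                page = fetchPage.apply(lastKey, pageSize);
                // Skip the repeated start key if it is still present.
                pos = (!page.isEmpty() && page.get(0).equals(lastKey)) ? 1 : 0;
                return pos < page.size();
            }

            @Override
            public K next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page.get(pos++);
            }
        };
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 250; i++) table.put(String.format("row%03d", i), "value" + i);

        PagedKeys<String> keys = new PagedKeys<>((start, n) -> {
            List<String> out = new ArrayList<>();
            for (String k : (start == null ? table : table.tailMap(start, true)).keySet()) {
                if (out.size() == n) break;
                out.add(k);
            }
            return out;
        }, 100);

        int count = 0;
        for (String key : keys) count++;
        System.out.println(count + " keys");
    }
}
```

Pages are fetched lazily, only when the caller has consumed the previous one, so memory use stays bounded by one page regardless of table size. Checking that the first row really equals the previous last key (instead of always dropping it) avoids silently skipping a row if that key was deleted between queries.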

Melt answered 6/11, 2013 at 12:29
