How do I get all the keys that are stored in the Cassandra column family with pycassa?


Does anyone have experience working with pycassa? I have a question about it: how do I get all the keys that are stored in the database?

In the small snippet below, we have to supply the keys in order to get the associated columns (here the keys are 'foo' and 'bar'). That is fine, but my requirement is to get all the keys (only the keys) at once, as a Python list or a similar data structure.

cf.multiget(['foo', 'bar'])
{'foo': {'column1': 'val2'}, 'bar': {'column1': 'val3', 'column2': 'val4'}}

Thanks.

Zeuxis answered 12/3, 2010 at 4:39 Comment(0)


try:

    list(cf.get_range().get_keys())

more good stuff here: http://github.com/vomjom/pycassa

Reger answered 29/3, 2010 at 20:8 Comment(1)

It seems the API changed a bit in recent pycassa versions, but this works: [x[0] for x in col_fam.get_range()] – Carnation 10/3, 2014 at 21:18


You can try: cf.get_range(column_count=0,filter_empty=False).

# get_range() returns a generator of (key, columns) tuples; print only the keys.
for value in cf.get_range(column_count=0,filter_empty=False):
    print(value[0])
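For readers without a cluster at hand, the shape of what the loop consumes can be sketched with a stand-in. The `fake_get_range` function below is hypothetical and not part of pycassa; it only imitates the (key, columns) tuples that the answers in this thread index with `[0]`.

```python
def fake_get_range():
    """Hypothetical stand-in for cf.get_range(column_count=0, filter_empty=False).

    Yields (row_key, columns) tuples. The columns mapping is empty here
    because no columns were requested.
    """
    for key in ("foo", "bar"):
        yield key, {}

# Element 0 of each tuple is the row key, so this collects only the keys.
keys = [row[0] for row in fake_get_range()]
print(keys)
```

With a real column family, `fake_get_range()` would be replaced by the `cf.get_range(...)` call shown above.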

Heterosexual answered 15/10, 2012 at 13:10 Comment(0)


get_range([start][, finish][, columns][, column_start][, column_finish][, column_reversed][, column_count][, row_count][, include_timestamp][, super_column][, read_consistency_level][, buffer_size])

Get an iterator over rows in a specified key range.

http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get_range

Scutiform answered 17/3, 2011 at 21:15 Comment(0)


Minor improvement on Santhosh's solution

dict(cf.get_range(column_count=0,filter_empty=False)).keys()

If you care about order (OrderedDict comes from the collections module):

OrderedDict(cf.get_range(column_count=0,filter_empty=False)).keys()

get_range returns a generator. We can create a dict from the generator and get the keys from that.

column_count=0 limits each result to the row key. Because such results have no columns, they would all be skipped as empty, so we also need filter_empty=False.

filter_empty=False lets us get the results. However, empty rows and range ghosts may now be included in our result.

If we don't mind more overhead, getting just the first column will resolve the empty rows and range ghosts.

dict(cf.get_range(column_count=1)).keys()
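The trade-off between the two calls can be illustrated with a small simulation. Everything below is an assumption made for illustration: `ROWS` and `fake_get_range` are hypothetical and only mimic the behaviour of `column_count` and `filter_empty` as described in this answer; they do not talk to Cassandra.

```python
# Simulated storage. "ghost" stands for a deleted row whose key still
# shows up in range scans but has no live columns.
ROWS = {
    "foo": {"column1": "val2"},
    "bar": {"column1": "val3", "column2": "val4"},
    "ghost": {},
}

def fake_get_range(column_count=100, filter_empty=True):
    """Hypothetical imitation of the two parameters discussed above."""
    # Sorted only to make this sketch deterministic; a real cluster
    # returns rows in whatever order its partitioner defines.
    for key in sorted(ROWS):
        names = sorted(ROWS[key])[:column_count]
        columns = {name: ROWS[key][name] for name in names}
        if filter_empty and not columns:
            continue  # skip rows that came back without any columns
        yield key, columns

# No columns fetched, nothing filtered: every key, ghosts included.
all_keys = [k for k, _ in fake_get_range(column_count=0, filter_empty=False)]
# One column fetched, empties filtered: only rows with live data.
live_keys = [k for k, _ in fake_get_range(column_count=1)]
# No columns fetched but empties filtered: every row looks empty.
no_keys = [k for k, _ in fake_get_range(column_count=0)]

print(all_keys, live_keys, no_keys)
```

In this sketch the first call is the cheapest but includes the ghost row, while the second pays for one column per row to exclude it.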

Lambent answered 25/7, 2013 at 14:52 Comment(0)


There's a problem with Santhosh's and kzarns' answers: you load a potentially huge dict into memory and then immediately discard it. A better approach is to use a list comprehension:

keys = [c[0] for c in cf.get_range(column_count=0, filter_empty=False)]

This iterates over the generator returned by get_range, keeps only the key from each row, and stores the keys in a list.

If the list of keys were also potentially too large to keep in memory all at once, and you only need to iterate over it once, you should use a generator expression instead of a list comprehension:

kgen = (c[0] for c in cf.get_range(column_count=0, filter_empty=False))
# you can iterate over kgen once, but do not treat it as a list; it isn't one!
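One way to consume such a generator without materialising every key is to process it in fixed-size batches. This is a minimal sketch, not something from the answers above: `fake_get_range` and `batches` are hypothetical helpers, and the batch size of 4 is arbitrary.

```python
from itertools import islice

def fake_get_range():
    """Hypothetical stand-in yielding (row_key, columns) tuples."""
    for i in range(10):
        yield "key%d" % i, {}

def batches(iterable, size):
    """Yield lists of at most `size` items, pulling lazily from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

kgen = (row[0] for row in fake_get_range())

# Only one batch of keys is held in memory at a time.
sizes = [len(batch) for batch in batches(kgen, 4)]

# The generator is now exhausted; a second pass yields nothing.
leftover = list(kgen)
print(sizes, leftover)
```

The empty second pass is the practical meaning of "do not treat it as a list": once consumed, the keys have to be fetched again with a new get_range call.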

Harlie answered 12/2, 2015 at 20:30 Comment(0)
