BigTable: Improve BigTable query performance using the Python client

We are facing some performance issues while querying BigTable.

We are getting about 500 rows/sec in the monitoring dashboard when querying about 1,400 rows, and the query takes about 1.6 seconds when running the following snippet on a machine in the same region as the BigTable instance:

import datetime
import struct

# `instance`, `table_name`, `column_family`, `row_filter`, and `row_set` are defined elsewhere.
partial_rows = instance.table(table_name).read_rows(filter_=row_filter, row_set=row_set)

ids = {}
for row in partial_rows:
    key = row.row_key.decode()
    category = key[0:5]                    # row key: {category}{id}_{year}{month}
    id_, year_month = key[5:].split('_')
    dates = ids.get(id_, {})

    for k, v in row.cells[column_family].items():
        sub_key = k.decode()
        day = sub_key[0:2]                 # column qualifier: {day}_{measurement_name}
        measurement = f'{category}_{sub_key[3:]}'

        date = datetime.date(int(year_month[0:4]), int(year_month[4:6]), int(day))
        value = struct.unpack('>i', v[0].value)[0]  # cells hold big-endian 32-bit ints

        measurements = dates.get(date, {})
        measurements[measurement] = value
        dates[date] = measurements

    ids[id_] = dates
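
For reference, row_filter and row_set are built elsewhere in our code; a minimal sketch of how they could be constructed with the same client follows (the project, instance, table, and column regex are placeholder assumptions, and the key range covers just one id for brevity):

from google.cloud import bigtable
from google.cloud.bigtable import row_filters
from google.cloud.bigtable.row_set import RowSet

# Placeholder ids; the real project/instance/table names live in our configuration.
client = bigtable.Client(project='my-project')
instance = client.instance('my-instance')
table_name = 'measurements'
column_family = 'cf'

# Keep only the columns for the measurement we need; qualifiers look like '{day}_{name}'.
row_filter = row_filters.ColumnQualifierRegexFilter(rb'\d\d_a')

# A contiguous range of monthly rows; the real row_set covers roughly 1,400 rows.
row_set = RowSet()
row_set.add_row_range_from_keys(
    start_key=b'value4326_201901',
    end_key=b'value4326_201912',
    end_inclusive=True,
)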

The table schema we are using is:

  • Row key: {category}{id}_{year}{month}
  • Column qualifier: {day}_{measurement_name}

In our case, this schema completely follows BigTable's guidelines.
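
To make the schema concrete, here is a minimal write-side sketch; the category 'value', id '4326', measurement name 'a', and value 49 are hypothetical values chosen only to match the example dictionary further below, and the table handles are the same placeholders as above.

import struct

# Hypothetical example: category 'value' (5 characters), id '4326', January 2019,
# day '01', measurement name 'a', value 49.
row_key = b'value' + b'4326' + b'_' + b'201901'   # {category}{id}_{year}{month}
column = b'01' + b'_' + b'a'                      # {day}_{measurement_name}

row = instance.table(table_name).direct_row(row_key)
row.set_cell(column_family, column, struct.pack('>i', 49))  # big-endian 32-bit int
row.commit()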

The snippet is very simple: we do some string operations on the row key and the column qualifier to build an ids dictionary like the one below; a short worked example follows it. (Dates are shown here as strings for readability; the code actually keys by datetime.date objects.)

{
  "4326": {
    "2019-01-01": {
      "value_a": 49
    },
    "2019-01-02": {
      "value_a": 53
    },
    ...
  },
  "3857": {
    "2019-01-01": {
      "value_a": 56
    },
    "2019-01-02": {
      "value_a": 59
    },
    ...
  },
  ...
}
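
As a sanity check, the hypothetical cell from the write sketch above maps into that dictionary as follows:

import datetime
import struct

# The hypothetical cell written above: row key 'value4326_201901', column '01_a', value 49.
key = 'value4326_201901'
sub_key = '01_a'
raw = struct.pack('>i', 49)

category = key[0:5]                                # 'value'
id_, year_month = key[5:].split('_')               # '4326', '201901'
measurement = f'{category}_{sub_key[3:]}'          # 'value_a'
date = datetime.date(int(year_month[0:4]), int(year_month[4:6]), int(sub_key[0:2]))
assert (id_, date, measurement, struct.unpack('>i', raw)[0]) == \
    ('4326', datetime.date(2019, 1, 1), 'value_a', 49)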

Do you have any idea why we are getting such low read throughput?

Is there any way to obtain all the data at once instead of consuming the read_rows streamed generator? Or are there other options, either with the same Python client or a different one, such as an asynchronous or parallel approach?
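
By "something asynchronous" we mean, for example, sharding the row set across several workers that each issue their own read_rows call, roughly like the unbenchmarked sketch below (the split points, worker count, and key ranges are placeholders):

from concurrent.futures import ThreadPoolExecutor

from google.cloud.bigtable.row_set import RowSet


def read_shard(start_key, end_key):
    # Each worker reads one key range; this assumes the shared client/instance
    # objects are safe to use across threads for reads.
    shard = RowSet()
    shard.add_row_range_from_keys(start_key=start_key, end_key=end_key,
                                  end_inclusive=True)
    rows = instance.table(table_name).read_rows(filter_=row_filter, row_set=shard)
    return [(row.row_key, row.cells) for row in rows]


# Placeholder split points; in practice they would follow the real key distribution.
shards = [
    (b'value0000_201901', b'value4999_201912'),
    (b'value5000_201901', b'value9999_201912'),
]

with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    results = list(pool.map(lambda s: read_shard(*s), shards))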

Hara asked 12/11/2019 at 16:04. Comments (3):
Garonne: Could you post more information about the row filter you are using? A complex row filter can hurt query performance. Also, are you only reading small sections of the table? There is some startup cost when the client connection is first established; it is negligible when reading millions of rows, but for several hundred it could account for a noticeable part of your overall query time.

Hara: Hi @BillyJacobson, our row filter is pretty simple: \d\d_{column_name}. We have about 300 columns per row. When requesting only 30 columns versus reading the whole row with no filter, we do see that fetching 10 times more data does not take 10 times longer. Still, those times are too slow for us. [1/2]

Hara: Also, with a simpler filter that replaces the \d\d regex with the exact digits we need, we see some improvement, but only a few dozen milliseconds. About the table size: yes, we are dealing with millions of rows, so that is not the problem. We still think that consuming the data through the Python generator is the bottleneck here. [2/2]
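
For reference, the "simpler filter" described in the last comment could look roughly like this; the day list and the measurement suffix are placeholders:

from google.cloud.bigtable import row_filters

# Match only the specific days we need instead of the \d\d wildcard.
days = ['01', '02', '03']          # placeholder day list
measurement_name = 'a'             # placeholder measurement suffix
pattern = f"({'|'.join(days)})_{measurement_name}".encode()
simpler_filter = row_filters.ColumnQualifierRegexFilter(pattern)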
