The answer is pagination. Use top_size (the maximum number of records in a result) in conjunction with next_partition_key and next_row_key, the continuation tokens. That makes a significant difference in performance, potentially by a large factor. For one, your results are statistically more likely to come from a single partition. Plain results show that result sets are grouped by the partition continuation key and not the row continuation key.
In other words, you also need to think about your UI or system output. Don't bother returning more than 10 to 20 results, 50 at most. The user probably won't use or examine any more.
That may sound foolish, but do a Google search for "dog" and notice that the search returns only 10 items. No more. The next records are available if you bother to hit 'continue'. Research has shown that almost no user ventures beyond that first page.
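To make the "one page at a time" idea concrete, here is a minimal sketch of fetching a single UI page and handing back a token for the 'continue' link. It assumes a table service object with the legacy Python SDK's query_entities call (the docstring is quoted at the end of this answer) and assumes the continuation keys are exposed as result.x_ms_continuation. The fetch_page name and the tuple-shaped token are my own, purely for illustration.

```python
def fetch_page(table_service, table_name, page_size=20, token=None):
    """Fetch one page of entities.

    Returns (entities, next_token). next_token is None when there is
    nothing left; otherwise pass it back in to get the following page.
    """
    # A token is just the pair of continuation keys from the previous call.
    next_pk, next_rk = token if token else (None, None)

    result = table_service.query_entities(
        table_name,
        top=page_size,
        next_partition_key=next_pk,
        next_row_key=next_rk,
    )

    # Assumption: the SDK attaches the continuation keys as a dict
    # attribute on the result, and omits it on the last page.
    continuation = getattr(result, "x_ms_continuation", None) or {}
    next_pk = continuation.get("NextPartitionKey")
    next_rk = continuation.get("NextRowKey")

    # The row key can be absent even when a partition key is present,
    # so only the partition key decides whether there is a next page.
    next_token = (next_pk, next_rk) if next_pk else None
    return list(result), next_token
```

The UI only ever asks for the next page when the user actually hits 'continue', so most sessions cost a single small request.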
The select option (returning a subset of the key-values) may also make a difference; for example, use select = "PartitionKey,RowKey" or 'Name', whatever minimum you need.
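To get a feel for why trimming properties can matter, here is a purely local illustration with no Azure calls. It mimics what a select of "PartitionKey,RowKey" would keep and compares JSON-encoded sizes. The entity shape (a Name and a large Notes property) is made up, and the real wire format differs, so treat the numbers as a rough indication only.

```python
import json


def project(entity, select):
    """Keep only the comma-separated property names listed in `select`."""
    wanted = [name.strip() for name in select.split(",")]
    return {name: entity[name] for name in wanted if name in entity}


def payload_bytes(entities):
    """Size of the entities when serialized as UTF-8 JSON."""
    return len(json.dumps(entities).encode("utf-8"))


# Hypothetical entities: two small keys plus one bulky property.
entities = [
    {"PartitionKey": "dogs", "RowKey": str(i), "Name": "Rex", "Notes": "x" * 500}
    for i in range(20)
]

full = payload_bytes(entities)
keys_only = payload_bytes([project(e, "PartitionKey,RowKey") for e in entities])
print("all properties:", full, "bytes; keys only:", keys_only, "bytes")
```

The bigger the properties you leave out, the larger the gap between the two figures.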
"I believe, that the effect of crossing these boundaries also results
in continuation tokens, which require additional round-trips to
storage to retrieve the results. This results then in reducing
performance, as well as an increase in transaction counts (and
subsequently cost)."
...is slightly incorrect. Continuation tokens are not used only because of crossing boundaries: Azure tables return no more than 1000 results per request, so the two continuation tokens are used to fetch the next set. The default top_size is effectively 1000.
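As a quick back-of-the-envelope check, the 1000-result cap gives a lower bound on the number of round trips for a given result size. The helper below is hypothetical (min_requests is my own name) and only does the arithmetic; the service may hand back a continuation token earlier than the cap, so real counts can be higher.

```python
import math

MAX_ENTITIES_PER_REQUEST = 1000  # the per-request cap discussed above


def min_requests(total_matches, top=None):
    """Lower bound on round trips needed to read `total_matches` entities."""
    if top is None:
        page_size = MAX_ENTITIES_PER_REQUEST
    else:
        # Asking for more than the cap does not help; the cap still applies.
        page_size = min(top, MAX_ENTITIES_PER_REQUEST)
    if page_size <= 0:
        raise ValueError("top must be a positive integer")
    # Even an empty result costs one request.
    return max(1, math.ceil(total_matches / page_size))


print(min_requests(2500))          # 3 requests at the default cap
print(min_requests(2500, top=20))  # 125 requests if you page through everything
```

The second figure is the cost of reading everything 20 at a time, which is exactly what a paged UI avoids: it stops after the first request unless the user asks for more.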
For your viewing pleasure, here's the description of query_entities from the Azure Python API. The others are much the same.
'''
Get entities in a table; includes the $filter and $select options.

table_name: Table to query.
filter:
    Optional. Filter as described at
    http://msdn.microsoft.com/en-us/library/windowsazure/dd894031.aspx
select: Optional. Property names to select from the entities.
top: Optional. Maximum number of entities to return.
next_partition_key:
    Optional. When top is used, the next partition key is stored in
    result.x_ms_continuation['NextPartitionKey']
next_row_key:
    Optional. When top is used, the next partition key is stored in
    result.x_ms_continuation['NextRowKey']
'''
```
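For batch-style consumers (the "system output" case rather than a UI), the parameters above can be chained into a loop. This is a sketch, not the SDK's own helper: iter_entities and max_entities are names I made up, and it assumes the continuation keys show up as result.x_ms_continuation exactly as the docstring describes.

```python
def iter_entities(table_service, table_name, page_size=50,
                  max_entities=None, **query_kwargs):
    """Yield entities page by page, following the continuation keys.

    query_kwargs is passed through to query_entities (e.g. filter, select).
    max_entities, if given, stops the loop early without extra requests.
    """
    if max_entities is not None and max_entities <= 0:
        return

    next_pk = next_rk = None
    yielded = 0
    while True:
        result = table_service.query_entities(
            table_name,
            top=page_size,
            next_partition_key=next_pk,
            next_row_key=next_rk,
            **query_kwargs
        )
        for entity in result:
            yield entity
            yielded += 1
            if max_entities is not None and yielded >= max_entities:
                return

        # Assumption: no x_ms_continuation (or no NextPartitionKey in it)
        # means this was the last page.
        continuation = getattr(result, "x_ms_continuation", None) or {}
        next_pk = continuation.get("NextPartitionKey")
        next_rk = continuation.get("NextRowKey")
        if not next_pk:
            return
```

Because it is a generator, requests are only issued as the caller consumes results, so stopping early costs nothing extra.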