Auto-increment on Azure Table Storage

I am currently developing an application for Azure Table Storage. In that application I have a table that will have relatively few inserts (a couple of thousand per day), and the primary key of its entities will be used in another table, which will have billions of rows.

Therefore I am looking for a way to use an auto-incremented integer, instead of a GUID, as the primary key in the small table (it would save a lot of storage, and the scalability of the inserts is not really an issue).
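As a rough illustration of the savings, here is the arithmetic in Python for one billion referencing rows. It only compares raw sizes (a GUID is 16 bytes in binary and 36 characters in its usual text form; a 32-bit integer is 4 bytes and at most 10 decimal digits). How Azure actually encodes and bills keys and properties is not modelled, so treat the numbers as a rough estimate.

```python
import uuid

GUID_BYTES = 16                        # a GUID is 128 bits
INT32_BYTES = 4                        # a 32-bit integer
GUID_CHARS = len(str(uuid.uuid4()))    # 36 characters in the usual text form
INT32_MAX_CHARS = len(str(2**31 - 1))  # "2147483647" is 10 characters


def saved(rows: int, old: int, new: int) -> int:
    """Total units saved when every row's key shrinks from `old` to `new`."""
    return rows * (old - new)


rows = 1_000_000_000  # one billion referencing rows, as an illustration
print(saved(rows, GUID_BYTES, INT32_BYTES))      # 12000000000 bytes (binary form)
print(saved(rows, GUID_CHARS, INT32_MAX_CHARS))  # 26000000000 characters (text keys)
```

With only a couple of thousand new IDs per day, the integers stay well below 10 digits for years, so the text-form saving is usually larger than this worst case.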

There've been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.

However, since concurrency problems can be really hard to spot and debug, I am a bit uncomfortable implementing this on my own. My question is therefore: is there a well-tested implementation of this?

Gaud asked 8/12, 2009 at 22:28. Comments:
If you're really worried about it, why not have a table in SQL Azure generate the identity values? (Adventurous)
That's a pretty good suggestion. However, that would force me to set up my own SQL Server, because Azure SQL Data Services doesn't support identity columns: shanmcarthur.net/cloud-services/… (Gaud)

I haven't implemented this yet, but I'm working on it.

You could seed a queue with the next IDs to use, then just take them off the queue when you need them.

You need a table that stores the largest number added to the queue so far. If you know you won't be using a huge number of integers, you could have a worker wake up every so often and make sure the queue still has integers in it. You could also keep a queue of used integers that the worker checks to keep an eye on usage.

You could also hook the worker up so that if the queue happens to be empty when your code needs an ID, your code can interrupt the worker's nap so it creates more keys ASAP.

If that call fails, you need a way to tell the worker that you are going to do its work for it (lock), then do the worker's job of getting the next ID yourself, and unlock:

  1. lock
  2. get the last key created from the table
  3. increment and save
  4. unlock

then use the new value.
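A minimal single-process sketch of the queue idea, assuming in-memory stand-ins: a `collections.deque` plays the Azure queue, an integer field plays the table row holding the largest queued number, and a `threading.Lock` plays the lock. `IdPool`, `refill`, `needs_refill` and `next_id` are hypothetical names, not an Azure API.

```python
import threading
from collections import deque


class IdPool:
    """Hypothetical in-memory model of the queue + high-water-mark table."""

    def __init__(self, batch_size: int = 100, low_water: int = 10):
        self._queue = deque()          # stands in for the queue of ready IDs
        self._max_queued = 0           # stands in for the "biggest number queued" row
        self._lock = threading.Lock()  # stands in for the distributed lock
        self.batch_size = batch_size
        self.low_water = low_water

    def refill(self) -> None:
        """The worker's job: steps 1-4 from the list above."""
        with self._lock:                                # 1. lock (4. unlock on exit)
            start = self._max_queued + 1                # 2. get the last key created
            end = self._max_queued + self.batch_size
            self._queue.extend(range(start, end + 1))
            self._max_queued = end                      # 3. increment and save

    def needs_refill(self) -> bool:
        """What a periodic worker would check when it wakes up."""
        return len(self._queue) < self.low_water

    def next_id(self) -> int:
        while True:
            try:
                return self._queue.popleft()  # deque pops are thread-safe
            except IndexError:
                self.refill()                 # queue empty: do the worker's work now
```

For example, `IdPool(batch_size=3)` hands out 1, 2, 3, then refills and continues with 4, 5, 6, and so on. Two callers that both find the queue empty simply queue two distinct ranges, so the race costs extra queued IDs, never duplicates.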

Shelia answered 9/12, 2009 at 13:24. Comments:
But how does a queue guarantee that duplicate IDs are not created? What I understand from download.microsoft.com/download/5/2/D/… is that a message reappears on the queue if a worker process fails while processing it, so you need to make the job on the worker role idempotent. If the same message (i.e. the same ID) is used by two different worker roles, I don't see how you can make that idempotent. (Gaud)
If you have only one worker creating the IDs, then duplicates wouldn't be put in the queue. When pulling the IDs out of the queue, get the message, then delete it before using its contents (the ID). That should ensure no ID is used more than once. The worst case would then be that you lose a key, but uniqueness should still hold. (Shelia)
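To see why "get, delete, then use" keeps IDs unique even when a message reappears, here is a toy Python model. It assumes Azure-Queue-like semantics: getting a message hides it and issues a fresh pop receipt, and a delete only succeeds with the latest pop receipt. `SimQueue`, `expire_visibility` and `take_id` are hypothetical names, not the real storage client API.

```python
import itertools
import uuid


class SimQueue:
    """Hypothetical in-memory queue with visibility and pop receipts."""

    def __init__(self):
        self._messages = {}            # msg id -> [content, pop_receipt, visible]
        self._ids = itertools.count()

    def put(self, content):
        self._messages[next(self._ids)] = [content, None, True]

    def get_message(self):
        for msg_id, entry in self._messages.items():
            if entry[2]:
                entry[1] = uuid.uuid4().hex  # new receipt invalidates older ones
                entry[2] = False             # hidden until deleted or timed out
                return msg_id, entry[0], entry[1]
        return None

    def expire_visibility(self, msg_id):
        """Simulate the visibility timeout elapsing (slow or crashed consumer)."""
        self._messages[msg_id][2] = True

    def delete(self, msg_id, pop_receipt) -> bool:
        entry = self._messages.get(msg_id)
        if entry is None or entry[1] != pop_receipt:
            return False                     # stale receipt: someone else owns it
        del self._messages[msg_id]
        return True


def take_id(queue):
    """Get a message, delete it, and only then treat its content as ours."""
    while True:
        got = queue.get_message()
        if got is None:
            return None                      # queue empty
        msg_id, content, receipt = got
        if queue.delete(msg_id, receipt):
            return content                   # deleted by us, so nobody else uses it
```

If consumer A gets ID 1 but stalls past the timeout, consumer B gets the same message with a new receipt. A's delete then fails, so only B uses ID 1. If a consumer crashes after deleting but before using the ID, that ID is simply lost, which matches the "worst case" above.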

For anyone who finds this via search, there is a better solution. The minimum time for a lock (a blob lease) is 15 seconds, which is awful, so avoid locks if you want a truly scalable solution. Use ETags!

Create a single entity in a table to hold the ID (you can even name it "ID", or whatever you like).

1) Read it.

2) Increment the value.

3) Write it back with a conditional update (Replace/Update), specifying the ETag from the read. Note that the upsert operations (InsertOrReplace/InsertOrMerge) do not check the ETag, so they cannot detect a concurrent change.

If the last operation succeeds, you have a new, unique, auto-incremented ID. If it fails with HTTP status code 412 (Precondition Failed), some other client changed the entity in the meantime, so repeat steps 1, 2 and 3. In my tests, a read plus a conditional update usually takes less than 200 ms. My test utility, with source, is on GitHub.
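The read/increment/conditional-write loop can be sketched in Python against a hypothetical in-memory table. `SimTable`, `replace` and `PreconditionFailed` are made-up stand-ins for the service's ETag check and its HTTP 412 response; with a real SDK you would call its conditional update operation (If-Match) instead.

```python
import threading
import uuid


class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 returned on an ETag mismatch."""


class SimTable:
    """Hypothetical single-entity store that checks ETags on every write."""

    def __init__(self, initial: int = 0):
        self._value = initial
        self._etag = uuid.uuid4().hex
        self._guard = threading.Lock()  # makes each call atomic, like the service

    def read(self):
        with self._guard:
            return self._value, self._etag

    def replace(self, value: int, if_match: str) -> None:
        with self._guard:
            if if_match != self._etag:
                raise PreconditionFailed()  # someone else wrote first
            self._value = value
            self._etag = uuid.uuid4().hex   # every write yields a new ETag


def next_id(table: SimTable) -> int:
    while True:
        current, etag = table.read()        # 1) read
        candidate = current + 1             # 2) increment
        try:
            table.replace(candidate, etag)  # 3) conditional write with the ETag
            return candidate
        except PreconditionFailed:
            continue                        # 412: another client won, retry
```

No lock is held between the read and the write; a loser just retries. Under heavy contention the retries pile up, but for a couple of thousand inserts per day collisions should be rare.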

Kibbutznik answered 12/3, 2015 at 3:15. Comments:
What about using EGTs (entity group transactions) to guarantee that the read and increment are atomic? Is that a viable way to do it? learn.microsoft.com/en-gb/azure/storage/… (Demisec)
Are you saying that auto-increment is not implemented? (Windham)

See the UniqueIdGenerator class by Josh Twist.

Carrasquillo answered 22/5, 2011 at 23:12. Comments:
That blog post is amazing! (Slicer)
The code from the article can now be found here: learn.microsoft.com/en-us/archive/msdn-magazine/2010/november/… (Hopping)

The solution I found that prevents duplicate IDs and still lets you auto-increment is to:

  1. Lock (lease) a blob and let it act as a logical gate.

  2. Read the value.

  3. Write the incremented value.

  4. Release the lease.

  5. Use the value in your app/table.
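The five steps can be sketched in Python with a hypothetical in-memory blob whose lease behaviour is loosely modelled on Azure blob leases (one holder at a time, and an expired lease can be taken over). `SimBlob`, `acquire_lease`, `release_lease` and `LeaseConflict` are made-up names, not the storage client API; the 15-second default only mirrors the minimum lease time mentioned in another answer.

```python
import threading
import time
import uuid


class LeaseConflict(Exception):
    """Stands in for the error returned when the blob is already leased."""


class SimBlob:
    """Hypothetical in-memory blob holding a counter, with lease-like locking."""

    def __init__(self, value: int = 0):
        self.value = value
        self._lease_id = None
        self._lease_expires = 0.0
        self._guard = threading.Lock()  # makes each call atomic, like the service

    def acquire_lease(self, duration: float = 15.0) -> str:
        with self._guard:
            now = time.monotonic()
            if self._lease_id is not None and now < self._lease_expires:
                raise LeaseConflict()
            # An expired lease (e.g. its holder crashed) can simply be taken over.
            self._lease_id = uuid.uuid4().hex
            self._lease_expires = now + duration
            return self._lease_id

    def write(self, value: int, lease_id: str) -> None:
        with self._guard:
            if lease_id != self._lease_id or time.monotonic() >= self._lease_expires:
                raise LeaseConflict()
            self.value = value

    def release_lease(self, lease_id: str) -> None:
        with self._guard:
            if lease_id == self._lease_id:
                self._lease_id = None


def next_id(blob: SimBlob, retry_delay: float = 0.001) -> int:
    while True:
        try:
            lease = blob.acquire_lease()  # 1. lock (lease) the blob
            break
        except LeaseConflict:
            time.sleep(retry_delay)       # gate is closed: wait and retry
    try:
        new_value = blob.value + 1        # 2. read the value and increment it
        blob.write(new_value, lease)      # 3. write the incremented value
    finally:
        blob.release_lease(lease)         # 4. release the lease
    return new_value                      # 5. the caller uses the value
```

A crash between steps 3 and 5 leaves a gap, never a duplicate, because the counter was already advanced; a crash while holding the lease only blocks others until the lease expires.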

If your worker role crashes during that process, the worst that can happen is a missing ID in your store. IMHO that is better than duplicates.

Steve Marx has a code sample and more information on this approach.

Scabbard answered 8/10, 2011 at 5:52.

If you really need to avoid GUIDs, have you considered using something based on date/time, and then leveraging partition keys to minimize the concurrency risk?

Your partition key could be by user, year, month, day, hour, etc., and the row key could be the rest of the date/time, down to a timespan small enough to control concurrency.
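One possible key scheme along these lines, sketched in Python: the partition key combines the user with the UTC year, month, day and hour, and the row key holds the remaining minutes, seconds and microseconds. `make_keys` and the exact format are hypothetical choices for illustration; two inserts that land on the same user and microsecond would produce the same keys, and the second insert would be rejected as a conflict, so the caller would still need to retry.

```python
from datetime import datetime, timezone


def make_keys(user: str, ts: datetime) -> tuple[str, str]:
    """Hypothetical scheme: partition by user + hour, row key = rest of the time.

    Expects a timezone-aware datetime; it is normalised to UTC first.
    """
    ts = ts.astimezone(timezone.utc)
    partition_key = f"{user}_{ts:%Y%m%d%H}"    # user, year, month, day, hour
    row_key = f"{ts:%M%S}{ts.microsecond:06d}"  # minutes, seconds, microseconds
    return partition_key, row_key


print(make_keys("alice", datetime(2009, 12, 15, 3, 34, 5, 123456, tzinfo=timezone.utc)))
# ('alice_2009121503', '3405123456')
```

Because every field is zero-padded, row keys within a partition sort in time order, and a smaller timespan in the row key makes collisions less likely.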

Of course, given the price of data storage in Azure, you have to ask yourself whether avoiding a GUID is really worth all this extra effort (assuming a GUID would just work).

Lawlor answered 15/12, 2009 at 3:34.

© 2022 - 2024 — McMap. All rights reserved.