What's the Hi/Lo algorithm?
Asked Answered
A

5

508

What's the Hi/Lo algorithm?

I've found this in the NHibernate documentation (it's one method to generate unique keys, section 5.1.4.2), but I haven't found a good explanation of how it works.

I know that Nhibernate handles it, and I don't need to know the inside, but I'm just curious.

Armor answered 11/11, 2008 at 20:53 Comment(1)
"Cooperative ID generation" is so much better name for it than "Hi/Lo"Usurp
S
591

The basic idea is that you have two numbers to make up a primary key- a "high" number and a "low" number. A client can basically increment the "high" sequence, knowing that it can then safely generate keys from the entire range of the previous "high" value with the variety of "low" values.

For instance, supposing you have a "high" sequence with a current value of 35, and the "low" number is in the range 0-1023. Then the client can increment the sequence to 36 (for other clients to be able to generate keys while it's using 35) and know that keys 35/0, 35/1, 35/2, 35/3... 35/1023 are all available.

It can be very useful (particularly with ORMs) to be able to set the primary keys on the client side, instead of inserting values without primary keys and then fetching them back onto the client. Aside from anything else, it means you can easily make parent/child relationships and have the keys all in place before you do any inserts, which makes batching them simpler.

Sy answered 11/11, 2008 at 20:57 Comment(22)
Are you saying that "low ranges" are coordinated within the client, while the "high sequence" corresponds to a DB sequence?Bowknot
Are the hi & lo values typically then composed into a single integer value, or as a two-part business key?Bowknot
like an IP address then - ICANN gives you a high 'network' number, you then have as many low 'host' numbers as you like, within the limit of the CIDR range you're given.Final
What is the advantage of the hi/lo algorithm over simply having clients draw batches of keys from the central sequence in bulk?Doubler
@Adam: Fundamentally, nothing - it's just potentially cheaper to increment one value (the "high" part) than to generate a bunch of keys. (It's potentially much cheaper in terms of data transfer - you can "reserve" a huge number of keys with minimal bandwidth.)Sy
@Jon: True. Though in practice, clients need not transfer the batch of keys. The central sequence could merely be incremented by a larger quantity, effectively reserving those keys for the client.Doubler
@Adam: That's true if the keys are just numbers. Not so much for GUIDs :) But yes, in the case of simple numbers, any atomic "increment by a fixed amount" will do. That's effectively what hi-lo is doing, if you think of it as one number split into two sections.Sy
@Jon: Yes, that's effective what hi-lo is doing. You're point about non-numeric keys finally made things click for me. How naive of me, assuming keys are numbers. :)Doubler
So it's just a particular way to assign the client a range of primary keys instead of a single primary key, which is desirable for performance reasons?Clemenciaclemency
@Strilanc: That's my understanding, yes. Of course, there could well be subtleties that I'm missing :)Sy
@Jon In the example sequence- 35/0, 35/1, 35/2, 35/3... 35/1023, Shouldn't be it 36 instead of 35?Cornucopia
@HanuAthena When Hi is equals to 35 it means that 35 is first unoccupied sequence. All sequences below 35 are already used by other clients. By incrementing Hi value the client reserves 35 to self and notifies other clients that 36 become first unoccupied sequence.Scyros
How is the key stored in the db? And I assume the hi number seed for the next user would be stored in some reference table and incremented when requested?Derosier
@CoryHouse: It would be stored in the DB as the complete value, either as one number (the high bits and low bits squashed into one larger field) or potentially as two fields. (There's not a lot of point in doing the latter.) The seed would depend on the database - it could be an Oracle sequence, for example, or just a row in a separate table.Sy
Depending on data insertion scenarios, it seems it may lead to gaps in id-s and lots of wasted id-s? Is there a version of hi-lo algorithm tacking it?Kristopher
@Hohhi: If you have a large range of IDs (e.g. 64 or 128-bit) gaps really don't matter. If you're limited to a lower range, you allocate fewer per "high" part - e.g. only give them out in chunks of 256. You'll still "waste" some IDs, but in most cases it's really not a problem.Sy
@JonSkeet If I didn't misunderstand your mean, the Id'value is generated by hilo in the client without being in the DB server? Is there any good reason to do that ? thanks.Dave
@Joe.wang: You talk to the database to reserve a "chunk" of IDs: it gives you a "hi" value which it won't return to anyone else, and then you populate the "lo" values locally. It means that you just need 1 database call for in order to get a large chunk of IDs, instead of one per entity.Sy
@JonSkeet in the example you mentioned what are the primary keys to be written to database table? are they 35-0, 35-1, or smth? Also how can I define high and low numbers, what are the relationship of these numbers to hibernate_sequence table?Caco
@Nurlan: You'd either have some way of composing the individual parts into a single number (e.g. by multiplying the high part by 100,000 or something and then adding the low part) or you'd have a composite primary key. I'm not sure what you mean by "define high and low numbers" - and it's been too long since I've worked with Hibernate to give you the details of that aspect. Sorry!Sy
@JonSkeet Can this algorithm be used with other (Micro) ORMs such as Dapper? Is there any additional magic NHibernate does apart from the tables that we create, which will have to be incorporated in Dapper?Microbarograph
@AjayJadhav: I don't know anything about Dapper, I'm afraid.Sy
S
175

In addition to Jon's answer:

It is used to be able to work disconnected. A client can then ask the server for a hi number and create objects increasing the lo number itself. It does not need to contact the server until the lo range is used up.

Secretarygeneral answered 11/11, 2008 at 22:8 Comment(1)
I prefer this for brevity.Spathic
T
43

The hi/lo algorithm splits the sequences domain into hi groups. A hi value is assigned synchronously. Every hi group is given a maximum number of lo entries, that can be assigned off-line without worrying about concurrent duplicate entries.

  1. The hi token is assigned by the database, and two concurrent calls are guaranteed to see unique consecutive values

  2. Once a hi token is retrieved we only need the incrementSize (the number of lo entries)

  3. The identifiers range is given by the following formula:

     [(hi -1) * incrementSize) + 1, (hi * incrementSize) + 1)
    

    and the “lo” value will be in the range:

     [0, incrementSize)
    

    being applied from the start value of:

     [(hi -1) * incrementSize) + 1)
    
  4. When all lo values are used, a new hi value is fetched and the cycle continues

And this visual presentation is easy to follow as well:

enter image description here

While hi/lo optimizer is fine for optimizing identifier generation, it doesn't play well with other systems inserting rows into our database, without knowing anything about our identifier strategy.

Hibernate offers the pooled-lo optimizer, which offers the advantages of the hi/lo generator strategy while also providing interoperability with other 3rd-party clients that are not aware of this sequence allocation strategy.

Being both efficient and interoperable with other systems, the pooled-lo optimizer is a much better candidate than the legacy hi/lo identifier strategy.

Ticino answered 23/6, 2014 at 14:27 Comment(4)
I really don't understand you sometimes hahaha so: While hi/lo optimizer is fine for optimizing identifier generation (Ok good), it doesn't play well with other systems(what do you mean by other systems ?, which are the first ones ?) inserting rows into our database(Doesn't identifier generation used to insert rows too ?) , without knowing anything about our identifier strategy.Diviner
Other systems, like a DBA trying to run an INSERT statement. If she reads the current sequence data, do you think it's easy to figure out the next identifier value knowing we use hilo in this particular DB table?Ticino
My apologies if the comment isn't suitable for your answer, but I was wondering what optimizer is used by default? Or does it depend on DB (I'm using PostgreSQL)? Because I cannot figure out relation between current sequence value and generated IDs. I'm using @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "name") @SequenceGenerator(name="name", sequenceName = "name_seq", allocationSize=100) for my IDs.Marlenmarlena
@VladMihalcea, I believe you have a typo in bullet three, first snippet at , (hi * incrementSize) + 1)... it should be , hi * incrementSize), right?Barcellona
S
27

Lo is a cached allocator that splits the keyspace into large chunks, typically based on some machine word size, rather than the meaningfully-sized ranges (eg obtaining 200 keys at a time) which a human might sensibly choose.

Hi-Lo usage tends to waste large numbers of keys on server restart, and generate large human-unfriendly key values.

Better than the Hi-Lo allocator, is the "Linear Chunk" allocator. This uses a similar table-based principle but allocates small, conveniently-sized chunks & generates nice human-friendly values.

create table KEY_ALLOC (
    SEQ varchar(32) not null,
    NEXT bigint not null,
    primary key (SEQ)
);

To allocate the next, say, 200 keys (which are then held as a range in the server & used as needed):

select NEXT from KEY_ALLOC where SEQ=?;
update KEY_ALLOC set NEXT=(old value+200) where SEQ=? and NEXT=(old value);

Providing you can commit this transaction (use retries to handle contention), you have allocated 200 keys & can dispense them as needed.

With a chunk-size of just 20, this scheme is 10x faster than allocating from an Oracle sequence, and is 100% portable amongst all databases. Allocation performance is equivalent to hi-lo.

Unlike Ambler's idea, it treats the keyspace as a contiguous linear numberline.

This avoids the impetus for composite keys (which were never really a good idea) and avoids wasting entire lo-words when the server restarts. It generates "friendly", human-scale key values.

Mr Ambler's idea, by comparison, allocates the high 16- or 32-bits, and generates large human-unfriendly key values as the hi-words increment.

Comparison of allocated keys:

Linear_Chunk       Hi_Lo
100                65536
101                65537
102                65538
.. server restart
120                131072
121                131073
122                131073
.. server restart
140                196608

Design-wise, his solution is fundamentally more complex on the number-line (composite keys, large hi_word products) than Linear_Chunk while achieving no comparative benefit.

The Hi-Lo design arose early in OO mapping and persistence. These days persistence frameworks such as Hibernate offer simpler and better allocators as their default.

Saunders answered 26/8, 2013 at 9:29 Comment(7)
Who is mister Ambler?Windward
Scott Ambler promotes a so-called "hi-lo" allocation strategy using 16- or 32-bit words. Here's his page: ambysoft.com/scottAmbler.htmlSaunders
+1 for an interesting answer. I agree that the vast majority of applications gain no advantage from Hi-Lo over the simpler approach; however I think Hi-Lo is better suited to the special case of multiple allocators in highly concurrent applications.Arabel
Thanks @richj! My point is that you can use multiple allocators or large block sizes with "linear block allocation", but that -- unlike Hi/Lo -- it maintains a linear correspondence of allocator NEXT_VAL to keys in the table, and is tuneable. Unlike HiLo, no multiplication is needed -- it's just not necessary! The multiplier & storage of NEXT_HI makes HiLo more complex & breaks tuneability, since changing the blocksize will arbitrarily change the next key to be issued.. See: literatejava.com/hibernate/…Saunders
I'm interested in multiple independent allocators. With Hi-Lo it's obvious that the high value can be partitioned into allocator ID/block ID. It wasn't immediately obvious (to me) that the same approach can be applied to Linear Chunk, but it is basically the same problem of dividing the total range between allocators. I've got it now. Thanks.Arabel
What is the purpose of the SEQ column of table KEY_ALLOC? How is it used?Complicity
Oh, after thinking about it, I think the SEQ column maps to a table name. For example, there is an allocator the Customers table, one for the Orders table, and so forth. Forgive me, I'm slow, sometimes.Complicity
S
2

I found the Hi/Lo algorithm is perfect for multiple databases with replication scenarios based in my experience. Imagine this. you have a server in New York (alias 01) and another server in Los Angeles (alias 02) then you have a PERSON table... so in New York when a person is create... you always use 01 as the HI value and the LO value is the next secuential. por example.

  • 010000010 Jason
  • 010000011 David
  • 010000012 Theo

in Los Angeles you always use the HI 02. for example:

  • 020000045 Rupert
  • 020000046 Oswald
  • 020000047 Mario

So, when you use the database replication (no matter what brand) all the primary keys and data combine easily and naturally without to worry about duplicate primary keys, collissions, etc.

This is the best way to go in this scenario.

Salmagundi answered 23/9, 2016 at 17:8 Comment(1)
It doens't work in Hibernate. HiLo algrotirm gets a new value of sequence in each transaction, so HI-counter increments accordinally. But in your example, HI-counter is always constant for one DB.Lexine

© 2022 - 2024 — McMap. All rights reserved.