Jira's Lexorank algorithm for new stories
Asked Answered
S

1

29

I am looking to create a large list of items that allows for easy insertion of new items and for easily changing the position of items within that list. When updating the position of an item, I want to change as few fields as possible regarding the order of items.

After some research, I found that Jira's Lexorank algorithm fulfills all of these needs. Each story in Jira has a 'rank-field' containing a string which is built up of 3 parts: <bucket>|<rank>:<sub-rank>. (I don't know whether these parts have actual names, this is what I will call them for ease of reference)

Examples of valid rank-fields:

  • 0|vmis7l:hl4
  • 0|i000w8:
  • 0|003fhy:zzzzzzzzzzzw68bj

When dragging a card above 0|vmis7l:hl4, the new card will receive rank 0|vmis7l:hl2, which means that only the rank-field for this new card needs to be updated while the entire list can always be sorted on this rank-field. This is rather clever, and I can't imagine that Lexorank is the only algorithm to use this.

  1. Is there a name for this method of sorting used in the sub-rank?

My question is related to the creation of new cards in Jira. Each new card starts with an empty sub-rank, and the rank is always chosen such that the new card is located at the bottom of the list. I've created a bunch of new stories just to see how the rank would change, and it seems that the rank is always incremented by 8 (in base-36).

  1. Does anyone know more specifically how the rank for new cards is generated? Why is it incremented by 8?

I can only imagine that after some time (270 million cards) there are no more ranks to generate, and the system needs to recalculate the rank-field of all cards to make room for additional ranks.

  1. Are there other triggers that require recalculation of all rank-fields?
  2. I suppose the bucket plays a role in this recalculation. I would like to know how?
Schaerbeek answered 21/11, 2016 at 11:20 Comment(4)
Great question and good research. I too am curious about this.Chorizo
As a licensed JIRA user, you have access to the source code for JIRA Software from your my.atlassian.com account. You can find the various LexoRank algorithms buried in the JIRA Software (GreenHopper) JAR, which (at least as of 7.1.0) can be found in dependencySources/jira-greenhopper-plugin-X.Y.Z-sources.jar. After extracting that JAR, a find . -name '*LexoRank*' will show you where to find the inner workings of the algorithm.Monteiro
Useful video for those interested in lexorank: youtube.com/watch?v=OjQv9xMoFbgKendallkendell
You may use the library sort (gapped sort) idea too. Keep blocks with spaces. This idea seems very similar to this sort's intermediate step. en.wikipedia.org/wiki/Library_sortCatchpenny
M
10
  1. We are talking about a special kind of indexing here. This is not sorting; it is just preparing items to end up in a certain order in case someone happens to sort them (by whatever sorting algorithm). I know that variants of this kind of indexing have been used in libraries for decades, maybe centuries, to ensure that books belonging together but lacking a common title end up next to each other in the shelves, but I have never heard of a name for it.

  2. The 8 is probably chosen wisely as a compromise, maybe even by analyzing typical use cases. Consider this: If you choose a small increment, e. g. 1, then all tickets will have ranks like [a, b, c, …]. This will be great if you create a lot of tickets (up to 26) in the correct order because then your rank fields keep small (one letter). But as soon as you move a ticket between two other tickets, you will have to add a letter: [a, b] plus a new ticket between them: [a, an, b]. If you expect to have this a lot, you better leave gaps between the ranks: [a, i, q, …], then an additional ticket can get a single letter as well: [a, e, i, q, …]. But of course if you now create lots of tickets in the correct order right in the beginning, you quickly run out of letters: [a, i, q, y, z, za, zi, zq, …]. The 8 probably is a good value which allows for enough gaps between the tickets without increasing the need for many letters too soon. Keep in mind that other scenarios (maybe not Jira tickets which are created manually) might make other values more reasonable.

  3. You are right, the rank fields get recalculated now and then, Lexorank calls this "balancing". Basically, balancing takes place in one of three occasions: ① The ranks are exhausted (largest value reached), ② the ranks are due to user-reranking of tickets too close together ([a, b, i] and something is supposed to go in between a and b), and ③ a balancing is triggered manually in the management page. (Actually, according to the presentation, Lexorank allows for up to three letter ranks, so "too close together" can be something like aaa and aab but the idea is the same.)

  4. The <bucket> part of the rank is increased during balancing, so a messy [0|a, 0|an, 0|b] can become a nice and clean [1|a, 1|i, 1|q] again. The brownbag presentation about Lexorank (as linked by @dandoen in the comments) mentions a round-robin use of <buckets>, so instead of a constant increment (0→1→2→3→…) a 2 is increased modulo 3, so it will turn back to 0 after the 2 (0→1→2→0→…). When comparing the ranks, the sorting algorithm can consider a 0 "greater" than a 2 (it will not be purely lexicographical then, admitted). If now the balancing algorithm works backwards (reorder the last ticket first), this will keep the sorting order intact all the time. (This is just a side aspect, that's why I keep the explanation small, but if this is interesting, ask, and I will elaborate on this.)

Sidenote: Lexorank also keeps track of minimum and maximum values of the ranks. For the functioning of the algorithm itself, this is not necessary.

Malca answered 26/3, 2019 at 11:0 Comment(2)
When comparing the ranks, the sorting algorithm can consider a 0 "greater" than a 2 (it will not be purely lexicographical then, admitted) A little confusing as though: say, for a rank system with length 3 we have a two adjacent rank 0|aaa and 0|aab now, we might insert a new rank between them bucket 1 as 1|aaa this will run out too eventually. This way we could end up with two element having same rank say 0|aaa with ordering 0|aaa < 1|aaa < 2|aaa < 0|aaa but how exactly our database know the relative ordering of first 0|aaa and last 0|aaa ?Candicandia
The buckets numbers only get increased (modulo 3) when the balancing is performed. So if you just insert a new value, you stick to the currently used bucket number. Only during the balancing process (which might take some time and during which the DB is in read-only mode) you will be able to find two different bucket numbers. When the balancing process is complete (and the DB set to read/write again) you will find the same bucket number in all entries.Malca

© 2022 - 2024 — McMap. All rights reserved.