Offset/limit to page/size conversion

This should be an interesting challenge. I'm looking for an algorithm that doesn't exist yet (to the best of my knowledge).

  • We have a database-access function that can read pages of records at a time, using a page number and page size as arguments. Let's call this function getFromDatabase(int page, int size).
  • We want to offer a REST API which should return records based on an offset and limit. Let's wrap this in a function getRecords(int offset, int limit).

Somehow we must use the given offset and limit to retrieve the matching database records, which can only be accessed by page and size. Obviously, offset/limit won't always map to a single page/size. The challenge is finding an algorithm that makes the "ideal" number of getFromDatabase calls to retrieve all records. This algorithm should take several factors into account:

  • Each call to getFromDatabase has a certain overhead cost; keep calls to a minimum.
  • Each record retrieved adds additional overhead; retrieve as few records as possible (it's okay to retrieve "waste" if it decreases total overhead).
  • The algorithm itself has overhead costs as well; obviously they should not outweigh any benefits.

I've come up with the following algorithm: http://jsfiddle.net/mwvdlee/A7J9C/ (JS code, but the algorithm is language-agnostic). Essentially it is the following pseudocode:

do {
    do {
        try to convert (offset,limit) to (page,size)
        if too much waste
            lower limit by some amount
        else
            call `getFromDatabase()`
            filter out waste records
            increase offset to first record not yet retrieved
            lower limit to the last record not yet retrieved
    } until some records were retrieved
} until all records are retrieved from database

The key of this algorithm is in determining too much waste and some amount. But this algorithm isn't optimal, nor is it guaranteed to be complete (it may well be, I just can't prove it).
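Sketched out as runnable JS (simplified; the actual fiddle differs), with a maxWasteRatio threshold standing in for "too much waste" and a decrement of 1 standing in for "some amount", and assuming getFromDatabase(page, size) returns that page's records as an array:

function getRecords(offset, limit, maxWasteRatio = 0.1) {
  const result = [];
  while (limit > 0) {
    let size = limit;
    for (;;) {
      const page = Math.floor(offset / size);
      const headWaste = offset - page * size;                               // records before the range
      const tailWaste = Math.max(0, (page + 1) * size - (offset + limit));  // records after the range
      if ((headWaste + tailWaste) / size <= maxWasteRatio || size === 1) {
        const records = getFromDatabase(page, size);
        const useful = records.slice(headWaste, size - tailWaste); // filter out waste records
        if (useful.length === 0) return result;                    // database has no more rows
        result.push(...useful);
        offset += useful.length;                                   // first record not yet retrieved
        limit -= useful.length;                                    // records still to fetch
        break;
      }
      size--; // "lower limit by some amount"
    }
  }
  return result;
}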

Are there any better (known?) algorithms, or improvements I could make? Does anybody have good ideas of how to tackle this problem?

Decreasing answered 2/7, 2014 at 14:3 Comment(11)
Probably, by far the cheapest option is to issue just one call to the external service, because returning rows by offset usually requires scanning through all previous rows. That's expensive. Transferring a single row is very cheap compared to that.Paleface
@usr; perhaps I'm misunderstanding; why would transferring a single row be cheaper if the caller has no identifying information for that row? Wouldn't it effectively be like retrieving with limit 1?Decreasing
What I meant was that transferring 11 rows is probably almost as cheap as transferring 10 rows, as long as it happens as part of one query. It is not 10% more expensive. More like 0.1% more.Paleface
I suppose that a consecutive number of records (first at offset, last at offset+limit-1) is allocated in DB storage, but usually this need not be a chunk of some DB storage memory starting at some address a and ending at offset*recordsize. So how can you hope to retrieve this with a single page (in what kind of "unit" is this value given anyway) and arbitrary size (in bytes)?Smooth
@laune; I don't hope to retrieve arbitrary record chunks in a single page; there are plenty of cases where a single page would necessarily be incredibly inefficient. The point is to find an algorithm that determines the smallest number of pages and page sizes needed to retrieve the consecutive records most efficiently.Decreasing
@Decreasing you haven't yet responded to my proposal. It seems like a simple solution. It is important to know why this is not a viable approach (and apparently it is not because you have not acknowledged it).Paleface
@usr; I don't quite understand what you're proposing. It doesn't seem to touch the main issue of converting offset+limit to page+size.Decreasing
@Decreasing if someone requests { offset = 25, limit = 10 } from you, can't you just call getFromDatabase(page: 2, size: 20) and get all data in a single call (and discard some rows)? This sounds not too hard to solve so I might have missed some requirement. It is always possible to construct a single call that wastes at most 50%.Paleface
@usr; How about when somebody requests { offset = 73, limit = 117 }. You could call getFromDatabase(page: 0, size: 190), but then you'd get 73 records you won't need. If getFromDatabase does all kinds of complex stuff with those 73 records, that may be a lot of wasted time. It would have been faster to get page 1, size 73 and page 3, size 48, which have just 4 waste records in total. The problem is how to quickly determine which pages/sizes combinations have the least total overhead costs.Decreasing
Each call to retrieve records also has overhead. How big is the per-row overhead? I previously conjectured that it is near zero, as would be typical for a service backed by a simple database query. You must know the ratio between the two to optimize the retrieval algorithm.Paleface
@usr, indeed the "best" solution would take into consideration the overhead per call, per row and for the algorithm itself (all mentioned in the question already). A preferred solution could handle these (relative) costs. I agree that in most cases per-row overhead would be negligible, however I also know cases where per-row overhead is significant. Currently testing a method that takes a "waste percentage" parameter; this method produced the 2-page result for {O:73,L:117} earlier, based on a maximum of 10% waste. For 2% waste it returns 3 pages. However, the method uses brute force, so it's slow.Decreasing

As pointed out by @usr, in most variants of this problem (whether querying a database, API, or some other entity) it is preferable to reduce the number of calls as much as possible, since returning a few extra rows is almost always cheaper than issuing a separate call. The following PageSizeConversion algorithm will always find a single call that returns the fewest records possible (which is exactly how it performs the search). There may be some extra records returned at the beginning (headWaste) or end (tailWaste) of the data set, required to fit the data set within a single page. The algorithm is implemented in JavaScript here, but it is easy to port to any language.

function PageSizeConversion(offset, limit) {
  var window, leftShift;
  // Try page sizes from `limit` upward; the first size that lets one page
  // cover the whole range [offset, offset + limit) wins.
  for (window = limit; window <= offset + limit; window++) {
    // Allow the page to start up to (window - limit) records before `offset`,
    // so that the start of the range can line up with a page boundary.
    for (leftShift = 0; leftShift <= window - limit; leftShift++) {
      if ((offset - leftShift) % window == 0) {
        this.pageSize = window;
        this.page = (offset - leftShift) / this.pageSize;

        this.headWaste = leftShift; // extra records before the requested range
        this.tailWaste = ((this.page + 1) * this.pageSize) - (offset + limit); // extra records after it
        return;
      }
    }
  }
}

var testData = [
  {"offset": 0,"limit": 10,"expectedPage": 0,"expectedSize": 10,"expectedHeadWaste": 0,"expectedTailWaste": 0},
  {"offset": 2,"limit": 1,"expectedPage": 2,"expectedSize": 1,"expectedHeadWaste": 0,"expectedTailWaste": 0},
  {"offset": 2,"limit": 2,"expectedPage": 1,"expectedSize": 2,"expectedHeadWaste": 0,"expectedTailWaste": 0},
  {"offset": 5,"limit": 3,"expectedPage": 1,"expectedSize": 4,"expectedHeadWaste": 1,"expectedTailWaste": 0},
  {"offset": 3,"limit": 5,"expectedPage": 0,"expectedSize": 8,"expectedHeadWaste": 3,"expectedTailWaste": 0},
  {"offset": 7,"limit": 3,"expectedPage": 1,"expectedSize": 5,"expectedHeadWaste": 2,"expectedTailWaste": 0},
  {"offset": 1030,"limit": 135,"expectedPage": 7,"expectedSize": 146,"expectedHeadWaste": 8,"expectedTailWaste": 3},
];

describe("PageSizeConversion Tests", function() {
  testData.forEach(function(testItem) {
    it("should return correct conversion for offset " + testItem.offset + " limit " + testItem.limit, function() {
      conversion = new PageSizeConversion(testItem.offset, testItem.limit);
      expect(conversion.page).toEqual(testItem.expectedPage);
      expect(conversion.pageSize).toEqual(testItem.expectedSize);
      expect(conversion.headWaste).toEqual(testItem.expectedHeadWaste);
      expect(conversion.tailWaste).toEqual(testItem.expectedTailWaste);
    });
  });
});

// load jasmine htmlReporter
(function() {
  var env = jasmine.getEnv();
  env.addReporter(new jasmine.HtmlReporter());
  env.execute();
}());
<script src="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine.js"></script>
<script src="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine-html.js"></script>
<link href="https://cdn.jsdelivr.net/jasmine/1.3.1/jasmine.css" rel="stylesheet" />
<title>Jasmine Spec Runner</title>
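For reference, a hypothetical getRecords wrapper (not part of the original answer) that wires this conversion into the question's getFromDatabase(page, size) accessor could look like this, assuming that accessor returns the page's records as an array:

function getRecords(offset, limit) {
  const c = new PageSizeConversion(offset, limit);
  const records = getFromDatabase(c.page, c.pageSize);
  // Drop the headWaste leading records and the tailWaste trailing records.
  return records.slice(c.headWaste, c.headWaste + limit);
}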

This is perhaps not exactly what @Martijn is looking for, since it occasionally produces a result with a lot of waste. But in most cases it seems like a good solution to the general problem.

Bufford answered 12/1, 2018 at 20:34 Comment(0)

Principles of Offset/Limit To Page/Page-Size Pagination (Precise, No Waste)

Here page-size == limit, so page is a whole number if (offset % limit) == 0 and a float otherwise.

To run, type node convertOffsetToPage.js <offset> <limit> at the command line (offset and limit are read from the command-line arguments, defaulting to 0 and 1).

function convertOffsetToPage(offset, limit) {
    const precision = 1000000;
    const pageSize = limit;
    // 1-based page number; a whole number only when offset is a multiple of limit.
    let page = (offset + limit) / limit;
    page = Math.round(page * precision) / precision; // round away floating-point noise
    return { page, pageSize };
}

function dbServiceSimulation(page, pageSize, items) {
    const start = Math.round((page - 1) * pageSize);
    const end = Math.round(page * pageSize);
    return items.slice(start, end);
}

function getDataItems(itemCount) {
    return Array.from(Array(itemCount), (_, x) => x);
}

const dataItems = getDataItems(1000000);

console.log('\ndata items: ', dataItems);

let offset = parseInt(process.argv[2], 10) || 0;
let limit = parseInt(process.argv[3], 10) || 1;

console.log('\ninput offset: ', offset);
console.log('\ninput limit: ', limit);

const { page, pageSize } = convertOffsetToPage(offset, limit);

console.log('\npage = ', page);

console.log('\npageSize = ', pageSize);

const result = dbServiceSimulation(page, pageSize, dataItems);

console.log('\nresult after core Service call');

console.log('\nresult: ', result)

console.log('\n');
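For example, running node convertOffsetToPage.js 20 10 prints page = 3 and pageSize = 10, and the simulated service call returns items 20 through 29. With an offset that is not a multiple of the limit, such as 25 10, the computed page is the fractional value 3.5.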

Purkey answered 4/5, 2020 at 20:9 Comment(0)

I also ran into this problem. My solution (in Java) is shown in short below; PageSizeOffsetLimit is a simple holder for the resulting page, size, offset and limit. The trivial cases are handled first, and then comes the tricky part :). It tries to find the optimal page size and page number, and calculates the new offset from the beginning of the page; the original limit is preserved. For example, offset 1030 and limit 135 map to page 7 with size 146 and an in-page offset of 8, the same single call found by the PageSizeConversion answer above:

public static PageSizeOffsetLimit toPageSize(int offset, int limit) {
    // Trivial cases: the whole range fits in page 0, or starts exactly at page 1.
    if (offset < limit) {
        return new PageSizeOffsetLimit(0, offset + limit, offset, limit);
    }
    if (offset == limit) {
        return new PageSizeOffsetLimit(1, limit, 0, limit);
    }

    // Search page sizes from limit up to (but not including) 2 * limit for a
    // page whose in-page offset still leaves room for `limit` records.
    for (int size = limit; size < 2 * limit; size++) {
        int newOffset = offset % size;
        if ((size - limit) >= newOffset) {
            return new PageSizeOffsetLimit(offset / size, size, newOffset, limit);
        }
    }
    // Within this bound a suitable page does not always exist (e.g. offset 5, limit 2).
    throw new RuntimeException(String.format(
        "Cannot determine page and size from offset and limit (offset: %s, limit: %s)",
        offset,
        limit));
}

Have fun :)

Belanger answered 6/5, 2021 at 18:31 Comment(0)

This seems to be a more efficient version of the previously proposed algorithm. The idea is to take the head waste directly as offset % window and use the tailWaste value for each window as the exit condition, rather than looping over leftShift: for a given window, tailWaste equals (window - limit) - (offset % window), so tailWaste >= 0 is exactly the condition the leftShift loop was searching for:

static convert(offset: number, limit: number) {
  if (offset < limit) {
    return {
      page: 0,
      pageSize: offset + limit,
      headWaste: offset,
      tailWaste: 0
    }
  }

  let window, page = 0, headWaste = 0, tailWaste = 0
  for (window = limit; window <= offset + limit; window++) {
    page = Math.floor(offset / window)
    headWaste = offset % window
    tailWaste = ((page + 1) * window) - (offset + limit);
    if (tailWaste >= 0) {
      return {
        page,
        pageSize: window,
        headWaste,
        tailWaste
      }
    }
  }
}
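For example, convert(1030, 135) returns page 7, pageSize 146, headWaste 8 and tailWaste 3, matching the corresponding test case in the PageSizeConversion answer above.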
Alert answered 17/6 at 9:44 Comment(0)
