Fastest way to search for a row in a large Google Sheet using/in Google Apps Script
N

3

6

GAS is quite powerful and you could write a full fledged web-app using a Google Sheet as the DB back-end. There are many reasons not to do this but I figure in some cases it is okay.

I think the biggest issue will be performance issues when looking for rows based on some criteria in a sheet with a lot of rows. I know there are many ways to "query" a sheet but I can't find reliable information on which is the fastest.

One of the complexities is that many people can edit a sheet which means there are a variable number of situations you'd have to account for. For the sake of simplicity, I want to assume the sheet:

  • Is locked down so only one person can see it
  • The first column has the row number (=row())

The most basic query is finding a row where a specific column equals some value.

Which method would be the fastest?

Noah answered 24/6, 2019 at 13:31 Comment(2)
I'm voting to close this question as off-topic because this is not a question.Cancellation
@Cancellation Why is it off-topic? It's a programming question about GAS. And it is a question -- which method is the fastest. Granted I already have the answer but I see merit in others being able to see the answer. Can you please help me understand your concern so I know for next time?Noah
N
14

I have a sheet with ~19k rows and ~38 columns, filled with all sorts of unsorted real-world data. That is almost 700k rows so I figured it would be a good sheet to time a few methods and see which is the fastest.

  • method 1: get sheet as a 2D array then go through each row
  • method 2: get sheet as a 2D array, sort it, then using a binary search algorithm to find the row
  • method 3: make a UrlFetch call to Google visualization query and don't provide last row
  • method 4: make a UrlFetch call to Google visualization query and provide last row

Here are the my query functions.

function method1(spreadsheetID, sheetName, columnIndex, query)
{
    // get the sheet values excluding header, 
    var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);

    // loop through each row
    for(var i = 0, numRows = rowValues.length; i < numRows; ++i)
    {
        // return it if found
        if(rowValues[i][columnIndex] == query) return rowValues[i]
    }

    return false;
}

function method2(spreadsheetID, sheetName, columnIndex, query)
{
    // get the sheet values excluding header
    var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);

    // sort it
    rowValues.sort(function(a, b){
        if(a[columnIndex] < b[columnIndex]) return -1;
        if(a[columnIndex] > b[columnIndex]) return 1;
        return 0;
    });

    // search using binary search
    var foundRow = matrixBinarySearch(rowValues, columnIndex, query, 0, rowValues.length - 1);

    // return if found
    if(foundRow != -1)
    {
        return rowValues[foundRow];
    }

    return false;
}

function method3(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
    // SQL like query
    myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";

    // the query URL
    // don't provide last row in range selection
    var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + ":" + queryColumnLetterEnd + '&tq=' + encodeURIComponent(myQuery);

    // fetch the data
    var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();

    // remove some crap from the return string
    return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}

function method4(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
    // find the last row in the sheet
    var lastRow = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getLastRow();

    // SQL like query
    myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";

    // the query URL
    var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + "1:" + queryColumnLetterEnd + lastRow + '&tq=' + encodeURIComponent(myQuery);

    // fetch the data
    var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();

    // remove some crap from the return string
    return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}

My binary search algorithm:

function matrixBinarySearch(matrix, columnIndex, query, firstIndex, lastIndex)
{
    // find the value using binary search
    // https://www.w3resource.com/javascript-exercises/javascript-array-exercise-18.php

    // first make sure the query string is valid
    // if it is less than the smallest value
    // or larger than the largest value
    // it is not valid
    if(query < matrix[firstIndex][columnIndex] || query > matrix[lastIndex][columnIndex]) return -1;

    // if its the first row
    if(query == matrix[firstIndex][columnIndex]) return firstIndex;

    // if its the last row
    if(query == matrix[lastIndex][columnIndex]) return lastIndex;

    // now start doing binary search
    var middleIndex = Math.floor((lastIndex + firstIndex)/2);

    while(matrix[middleIndex][columnIndex] != query && firstIndex < lastIndex)
    {
        if(query < matrix[middleIndex][columnIndex])
        {
            lastIndex = middleIndex - 1;
        }
        else if(query > matrix[middleIndex][columnIndex])
        {
            firstIndex = middleIndex + 1;
        }

        middleIndex = Math.floor((lastIndex + firstIndex)/2);
    }

    return matrix[middleIndex][columnIndex] == query ? middleIndex : -1;
}

This is the function I used to test them all:

// each time this function is called it will try one method
// the first time it is called it will try method1
// then method2, then method3, then method4
// after it does method4 it will start back at method1
// we will use script properties to save which method is next
// we also want to use the same query string for each batch so we'll save that in script properties too
function testIt()
{
    // get the sheet where we're staving run times
    var runTimesSheet = SpreadsheetApp.openById("...").getSheetByName("times");

    // we want to see true speed tests and don't want server side caching so we a copy of our data sheet
    // make a copy of our data sheet and get its ID
    var tempSheetID = SpreadsheetApp.openById("...").copy("temp sheet").getId();

    // get script properties
    var scriptProperties = PropertiesService.getScriptProperties();

    // the counter
    var searchCounter = Number(scriptProperties.getProperty("searchCounter"));

    // index of search list we want to query for
    var searchListIndex = Number(scriptProperties.getProperty("searchListIndex"));

    // if we're at 0 then we need to get the index of the query string
    if(searchCounter == 0)
    {
        searchListIndex = Math.floor(Math.random() * searchList.length);
        scriptProperties.setProperty("searchListIndex", searchListIndex);
    }

    // query string
    var query = searchList[searchListIndex];

    // save relevant data
    var timerRow = ["method" + (searchCounter + 1), searchListIndex, query, 0, "", "", "", ""];

    // run the appropriate method
    switch(searchCounter)
    {
        case 0:
            // start time
            var start = (new Date()).getTime();

            // run the query
            var ret = method1(tempSheetID, "Extract", 1, query);

            // end time
            timerRow[3] = ((new Date()).getTime() - start) / 1000;

            // if we found the row save its values in the timer output so we can confirm it was found
            if(ret)
            {
                timerRow[4] = ret[0];
                timerRow[5] = ret[1];
                timerRow[6] = ret[2]; 
                timerRow[7] = ret[3];
            }
            break;
        case 1:
            var start = (new Date()).getTime();
            var ret = method2(tempSheetID, "Extract", 1, query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret)
            {
                timerRow[4] = ret[0];
                timerRow[5] = ret[1];
                timerRow[6] = ret[2]; 
                timerRow[7] = ret[3];
            }
            break;
        case 2:
            var start = (new Date()).getTime();
            var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret.table.rows.length)
            {
                timerRow[4] = ret.table.rows[0].c[0].v;
                timerRow[5] = ret.table.rows[0].c[1].v;
                timerRow[6] = ret.table.rows[0].c[2].v;
                timerRow[7] = ret.table.rows[0].c[3].v;
            }
            break;
        case 3:
            var start = (new Date()).getTime();
            var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret.table.rows.length)
            {
                timerRow[4] = ret.table.rows[0].c[0].v;
                timerRow[5] = ret.table.rows[0].c[1].v;
                timerRow[6] = ret.table.rows[0].c[2].v;
                timerRow[7] = ret.table.rows[0].c[3].v;
            }
            break;
    }

    // delete the temp file
    DriveApp.getFileById(tempSheetID).setTrashed(true);

    // save run times
    runTimesSheet.appendRow(timerRow);

    // start back at 0 if we're the end
    if(++searchCounter == 4) searchCounter = 0;

    // save the search counter
    scriptProperties.setProperty("searchCounter", searchCounter);
}

I have a global variable searchList that is an array of various query strings -- some are in the sheet, some are not.

I ran testit on a trigger to run every minute. After 152 iterations I had 38 batches. Looking at the result, this is what I see for each method:

| Method  | Minimum Seconds | Maximum Seconds | Average Seconds |
|---------|-----------------|-----------------|-----------------|
| method1 |            8.24 |           36.94 |           11.86 |
| method2 |            9.93 |           23.38 |           14.09 |
| method3 |            1.92 |            5.48 |            3.06 |
| method4 |            2.20 |           11.14 |            3.36 |

So it appears that, at least for my data-set, is using Google visualization query is the fastest.

Noah answered 24/6, 2019 at 13:34 Comment(9)
if I get time I want to do a few other test-cases like getting all rows matching a single query and getting all rows matching multiple queries.Noah
It would be interesting to know that If the data is already sorted, by how much would method2 be faster.Stob
@Stob That is next test on list.Noah
Also in my environment, the search using the Query Language was the fastest of all. When the values are searched from Spreadsheet, the cost for getting all values from Spreadsheet is the large bottleneck. So the search using the Query Language and TheMaster's proposal are suitable for this situation. And also, if you want to be simple script, I think that TextFinder added recently can be used. stackoverflow.com/a/56663884Trimetallic
@Trimetallic I will add a TextFinder test too.Noah
Hi @IMTheNachoMan, did you get to test TextFinder?Farley
@grreeenn No. Never got around to it.Noah
For what it's worth, I found TextFinder was much faster than loading the whole sheet with .getValues() @grreeennDeclension
In my testing with small to medium sized sheets, the bottleneck was not getting the sheet values, but rendering as an html page. This is quickly solved by limiting the number of rows returned to the user's web page.Declension
P
1

function testIt() In case 3: you enter method3 incorrectly, you must enter method4

Panthea answered 28/6 at 0:0 Comment(1)
Good catch. Thank you!Noah
D
0

Consider using the CacheService. You can cache the results to avoid both having to fetch values from the spreadsheet and re-render the html/css.

For example, you could use something like this:

const cachedHtmlString = CacheService.getDocumentCache().get("yourKey")
if (cachedHtmlString) {
  console.log("fetching from cache")
  return HtmlService.createHtmlOutput(cachedHtmlString)
} else {
  // do your normal stuff here
  // and save the results to the cache before returning, eg
  CacheService.getDocumentCache().put("yourKey", newHtmlString)
}

Remember to use Cache.remove("yourKey") if you update your sheet's data so you aren't serving old info to users.

IMTheNachoMan's answer is very helpful, especially if the Google Visualization API (GViz) suits your needs. Unfortunately, GViz requires you to parse the resulting json/csv/html and re-render it. In my testing, rendering the data into html/css is also a significant bottleneck, not just fetching the data from the sheet. GViz seems to be the best way to fetch the data from the sheet, but caching may help improve performance even more.

Declension answered 27/8 at 3:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.