How do I request paginated BigQuery query results using pageTokens with the Google Client lib for Java?
Asked Answered
N

2

9

I want to run BigQuery queries with thousands of rows of total results, but I only want to retrieve a page of 100 results at a time (using the maxResults and pageToken parameters).

The BigQuery API supports the use of pageToken parameters on collection.list methods. However, I am running asynchronous queries and retrieving the results using the getQueryResult method, and it doesn't seem to support the pageToken parameter. Is it possible to use pageTokens with getQueryResults?

Nafis answered 11/2, 2013 at 7:36 Comment(1)
How we handle it in javascript.I am trying to do it in javascript and does not get any thing in documentation.Any help will be highly appreciatedHege
N
14

Update: There's new documentation about how to page through list results here.

I am self-answering this question, because a developer asked me this privately and I want to share the answer on Stack Overflow.

The pageToken parameter is available to use when requesting paginated results from the Tabledata.list method. Result sets are paginated automatically when, for example, the result data is over 100k rows or 10 MB of results. You can also request result pagination by setting the maxResults parameter explicitly. Each page of results will return a pageToken parameter, which can then be used to retrieve the next page of results.

Every query results in a new BigQuery table. If you don't name the table explicitly, it only lasts for 24 hours. However, even unnamed "anonymous" tables have an identifier. In either case, after inserting a query job, retrieve the name of newly created table. Then use the tabledata.list method (and a combination of the maxResults/pageToken parameters) to request results in paginated form. Loop and continue to call tabledata.list using the previously retrieved pageToken until the pageTokens are no longer is returned (meaning that you have reached the last page.

Using the Google API Client library for Java, the code for inserting a query job, polling for query completion, and then retrieving page after page of query results might look something like this:

// Create a new BigQuery client authorized via OAuth 2.0 protocol
// See: https://developers.google.com/bigquery/docs/authorization#installed-applications
Bigquery bigquery = createAuthorizedClient();

// Start a Query Job
String querySql = "SELECT TOP(word, 500), COUNT(*) FROM publicdata:samples.shakespeare";
JobReference jobId = startQuery(bigquery, PROJECT_ID, querySql);

// Poll for Query Results, return result output
TableReference completedJob = checkQueryResults(bigquery, PROJECT_ID, jobId);

// Return and display the results of the Query Job
displayQueryResults(bigquery, completedJob);

/**
 * Inserts a Query Job for a particular query
 */
public static JobReference startQuery(Bigquery bigquery, String projectId,
                                      String querySql) throws IOException {
  System.out.format("\nInserting Query Job: %s\n", querySql);

  Job job = new Job();
  JobConfiguration config = new JobConfiguration();
  JobConfigurationQuery queryConfig = new JobConfigurationQuery();
  config.setQuery(queryConfig);

  job.setConfiguration(config);
  queryConfig.setQuery(querySql);

  Insert insert = bigquery.jobs().insert(projectId, job);
  insert.setProjectId(projectId);
  JobReference jobId = insert.execute().getJobReference();

  System.out.format("\nJob ID of Query Job is: %s\n", jobId.getJobId());

  return jobId;
}

/**
 * Polls the status of a BigQuery job, returns TableReference to results if "DONE"
 */
private static TableReference checkQueryResults(Bigquery bigquery, String projectId, JobReference jobId)
    throws IOException, InterruptedException {
  // Variables to keep track of total query time
  long startTime = System.currentTimeMillis();
  long elapsedTime;

  while (true) {
    Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId()).execute();
    elapsedTime = System.currentTimeMillis() - startTime;
    System.out.format("Job status (%dms) %s: %s\n", elapsedTime,
        jobId.getJobId(), pollJob.getStatus().getState());
    if (pollJob.getStatus().getState().equals("DONE")) {
      return pollJob.getConfiguration().getQuery().getDestinationTable();
    }
    // Pause execution for one second before polling job status again, to
    // reduce unnecessary calls to the BigQUery API and lower overall
    // application bandwidth.
    Thread.sleep(1000);
  }
}

/**
 * Page through the result set
 */
private static void displayQueryResults(Bigquery bigquery,
                                        TableReference completedJob) throws IOException {

    long maxResults = 20;
    String pageToken = null;
    int page = 1;

  // Default to not looping
    boolean moreResults = false;

    do {
    TableDataList queryResult = bigquery.tabledata().list(
            completedJob.getProjectId(),
            completedJob.getDatasetId(),
            completedJob.getTableId())
                .setMaxResults(maxResults)
                .setPageToken(pageToken)
         .execute();
    List<TableRow> rows = queryResult.getRows();
    System.out.print("\nQuery Results, Page #" + page + ":\n------------\n");
    for (TableRow row : rows) {
      for (TableCell field : row.getF()) {
      System.out.printf("%-50s", field.getV());
       }
      System.out.println();
    }
    if (queryResult.getPageToken() != null) {
      pageToken = queryResult.getPageToken();
      moreResults = true;
      page++;
    } else {
      moreResults = false;
    }
  } while (moreResults);
}
Nafis answered 11/2, 2013 at 7:37 Comment(3)
Why does my job not give me a destination table?Abelmosk
it appear that API changed again and Google didn't update it since 0.3 while the new version is about 1.2 how to i get the next page in the new version?Withers
why does page token does not change through out the job ?Tetter
P
0

For anyone who stumbles across this problem in 2024 and is using the Java Client, there is a much shorter way to manage exporting results to an anonymous temporary table (i.e., the Destination Table in QueryJobConfiguration.setDestinationTable) but it also requires querying with a Job instead of just using one of the BigQuery.query methods

this is groovy code - you will need to have a BigQuery service object named bigquery somewhere in the compilation unit...

   /**
     * Uses a Job to query so we can access the anonymous Destination Table that is where the results
     * are cached.  This enables walking through paginated results in the service layer
     * @param query sql query
     * @param limit max results per page
     * @return the result of the query, using the anonymous Destination Table as the source
     */
    protected TableResult _queryPaginate(String query, Integer limit){
        // Use standard SQL syntax for queries.
        // See: https://cloud.google.com/bigquery/sql-reference/
        QueryJobConfiguration queryConfig =
                QueryJobConfiguration.newBuilder(query)
                        .setUseLegacySql(false)
                        .build()
        // Create a job ID so that we can safely retry.
        JobId jobId = JobId.of(UUID.randomUUID().toString());
        Job queryJob = bigQuery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build())

        // Wait for the query to complete.
        queryJob = queryJob.waitFor()
        log.info "queryJob complete:${queryJob.getJobId()}"
        // Check for errors
        if (queryJob == null) {
            throw new RuntimeException("Job no longer exists")
        } else if (queryJob.getStatus().getError() != null) {
            // You can also look at queryJob.getStatus().getExecutionErrors() for all
            // errors, not just the latest one.
            throw new RuntimeException(queryJob.getStatus().getError().toString())
        }
        def queryJobConfig = queryJob.getConfiguration() as QueryJobConfiguration
        def resultsTable = queryJobConfig.getDestinationTable()

        log.info "list from resultsTable:$resultsTable"
        //first get the results so we can get the schema
        def results = queryJob.getQueryResults()
        //then list the results from the anonymous table for pagination
        bigQuery.listTableData(resultsTable,results.getSchema(), BigQuery.TableDataListOption.pageSize(limit))
    }

Patinous answered 26/3 at 1:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.