How can I update more than 500 docs in Firestore using Batch?

10

33

I'm trying to update a timestamp field with the Firestore server timestamp in a collection with more than 500 docs.

const batch = db.batch();
const serverTimestamp = admin.firestore.FieldValue.serverTimestamp();

db
  .collection('My Collection')
  .get()
  .then((docs) => {
    docs.forEach((doc) => {
      batch.set(doc.ref, {
        timestamp: serverTimestamp,
      }, {
        merge: true,
      });
    });
    return batch.commit();
  })
  .then(() => res.send('All docs updated'))
  .catch(console.error);

This throws an error

{ Error: 3 INVALID_ARGUMENT: cannot write more than 500 entities in a single call
    at Object.exports.createStatusError (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\common.js:87:15)
    at Object.onReceiveStatus (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:1188:28)
    at InterceptingListener._callNext (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:564:42)
    at InterceptingListener.onReceiveStatus (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:614:8)
    at callback (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:841:24)
  code: 3,
  metadata: Metadata { _internal_repr: {} },
  details: 'cannot write more than 500 entities in a single call' }

Is there a way to write a recursive method that creates a new batch object for every 500 docs and keeps going until all the docs are updated?

From the docs I know that a recursive delete is possible, as described here:

https://firebase.google.com/docs/firestore/manage-data/delete-data#collections

But for updating, I'm not sure how to know when to stop, since the docs are not being deleted and the same query would keep returning them.

Deciare answered 4/9, 2018 at 11:32 Comment(2)
Why don't you iterate through the 500 docs, update them, and use the last doc key to construct startAt for a new query?Kant
You can limit and then batch recursively, faced same issue and this was my solution: https://mcmap.net/q/452283/-firestore-cloud-function-to-recursively-update-subcollection-collectiongroupIlk
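For reference, a minimal sketch of the cursor-based approach suggested in the first comment, using query cursors from the Admin SDK. The function name updateAllInPages is hypothetical and not from the original question; it assumes the same db, admin, and serverTimestamp as above.

// Hypothetical sketch: page through the collection 500 docs at a time,
// commit one batch per page, and use the last doc of each page as the cursor.
async function updateAllInPages(db, serverTimestamp) {
  let lastDoc = null;
  while (true) {
    let query = db.collection('My Collection')
      .orderBy(admin.firestore.FieldPath.documentId())
      .limit(500);
    if (lastDoc) query = query.startAfter(lastDoc);

    const snapshot = await query.get();
    if (snapshot.empty) break; // no more documents, so stop

    const batch = db.batch();
    snapshot.docs.forEach((doc) => {
      batch.set(doc.ref, { timestamp: serverTimestamp }, { merge: true });
    });
    await batch.commit();

    lastDoc = snapshot.docs[snapshot.docs.length - 1]; // cursor for the next page
  }
}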
69

I also ran into the problem of updating more than 500 documents inside a Firestore collection, and I would like to share how I solved it.

I use Cloud Functions to update my collection inside Firestore, but this should also work in client-side code.

The solution counts every operation made to the batch; once the limit is reached, a new batch is created and pushed to the batchArray.

After all updates are queued, the code loops through the batchArray and commits every batch inside the array.

It is important to count every operation (set(), update(), delete()) made to the batch, because they all count toward the 500-operation limit.

const documentSnapshotArray = await firestore.collection('my-collection').get();

const batchArray = [];
batchArray.push(firestore.batch());
let operationCounter = 0;
let batchIndex = 0;

documentSnapshotArray.forEach(documentSnapshot => {
    const documentData = documentSnapshot.data();

    // update document data here...

    batchArray[batchIndex].update(documentSnapshot.ref, documentData);
    operationCounter++;

    if (operationCounter === 499) {
      batchArray.push(firestore.batch());
      batchIndex++;
      operationCounter = 0;
    }
});

batchArray.forEach(async batch => await batch.commit());

return;
Tessellation answered 15/5, 2019 at 8:52 Comment(23)
How do you ensure that all the batches are executed successfully, given that only the operations within a single batch are atomic? It would lead to data inconsistency if some batches executed and some didn't.Loomis
@Loomis Yes, you are right. I have left out the error handling part. I will add this part to the answer soon. I have updated my database to a new data model, which was an idempotent operation in my case, so I could repeat the code until every batch succeeded.Tessellation
So there are a couple of things you can do. You can check the retry option when creating the cloud function. This will make sure your cloud function re-executes on any exception. But you will have to decide which failures you consider transient, else it will turn into an endless loop. Also, some kind of state has to be maintained between cloud function executions so that the batches executed earlier aren't executed again. Maybe you can write to the Realtime Database/Firestore on every successful batch operation and carry on from there in the next retry when some batch didn't succeed.Loomis
Or you could write the job details (update details) to, let's say, /queue/pendingUpdates/ and write a cloud function which runs on a schedule (say every 5 mins) and performs the updates. Once the operation is successful, you can delete/mark the job as completed. Else it retries automatically in the next interval. This is a lot easier than the first one. Your thoughts?Loomis
I do not know your use case. Do you often write more than 500 documents?Tessellation
Consider this scenario: user details are denormalized into an audit trail collection. When a user makes any change, an entry is made to the audit trail. When the user updates their profile photo, username, phone number or email, it has to be updated in all documents holding the denormalized user data, which can easily exceed 500 documents.Loomis
I do not know your use case. Do you often write more than 500 documents? Maybe you could structure your data differently? Your solution with the state written to the database is ok, but these writes could also fail and mess up your data. I would consider a solution with a query for not-yet-updated documents; as soon as a document is updated it no longer matches the query. You could repeat this until the query is empty. But this depends on your use case. If you know what the updated data should look like, you could also use transactions.Tessellation
I prefer not to denormalize data in a noSQL database. I only have one or a few documents per user and all other users get the data from these few documents. This way you can scale your app properly if you have a lot of users. With denormalized data your app will be very inefficient.Tessellation
The reason data is denormalized is because the number of times the reads happen > number of times writes happen. In your case, you will have to fetch the user details again (2 reads instead of 1 per user). NoSQL encourages denormalization of data as well. Is there any reason you haven't denormalized the data? what happens when your user base grows or users start sharing the same document etc?Loomis
Let us continue this discussion in chat.Tessellation
@Sebe Have you tested this in a real-life scenario? Does this create a new batch object whenever the batch write reaches 500? ThanksSorrell
@Mihae Kheel Yes, the loop creates a new batch after it reaches 500 operations, but it is important to count every operation. Also you need some form of error handling.Tessellation
@SebastianVischer it seems the code and logic works fine when I used it. Thank you very muchSorrell
The answer is great, but I had 29000 documents in a collection to be updated and this failed. Unfortunately, as there is no exception handling, I was getting the errors after 2-3 minutes and they were difficult to find. Error with code 16 something. So I tweaked the logic a bit to have some gap between the batch commits (guess that worked). Will try to add that as another answer. Thanks.Relume
@Relume I have never tried with this many documents. Maybe there is some kind of limit for commits. I like your solution of committing the batch after you reach 500 operations. In my opinion the simpler solution.Tessellation
@Loomis What about Promise.all(batchArray.map(batch => batch.commit())).then().catch(); ?Namhoi
The code can return before the commits complete. The batchArray.forEach() line could be: await Promise.all(batchArray.map(batch => batch.commit()));Povertystricken
@Povertystricken wouldn't the inner callback method also need "async/await" keywords, such as await Promise.all(batchArray.map(async (batch) => { await batch.commit(); }));Reidreidar
this answer explains it in greater detail #37577185Reidreidar
@Reidreidar No, batch.commit() returns a promise and Promise.all() waits on the array of promises. Adding async/await to the inner callback like you did would have Promise.all() wait on an array of undefined values. Either way though, all the batch.commit() calls resolve. So, in this case, either one might work, but the original is correct.Povertystricken
@Povertystricken oh thanks, that makes sense. But in the url that I provided, the example does use await inside the callback function, for a value which will be used in the next line. This callback function just automatically returns Promise<void>, right?Reidreidar
@Reidreidar Ah, you're right. Adding async/await to the inner callback would just add another layer of promises. Thanks for bringing this up. It had me understand it better.Povertystricken
Utility method for TypeScript: gist.github.com/wcoder/9bb44ffe709397f657864f6a404cf7ccPekan
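As the comments above point out, the final forEach can return before the commits complete. A small fix, replacing the last two lines of the answer's snippet:

// Wait for every batch commit to finish before returning.
await Promise.all(batchArray.map((batch) => batch.commit()));
return;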
29

I liked this simple solution:

const users = await db.collection('users').get()

const batches = _.chunk(users.docs, 500).map(userDocs => {
    const batch = db.batch()
    userDocs.forEach(doc => {
        batch.set(doc.ref, { field: 'myNewValue' }, { merge: true })
    })
    return batch.commit()
})

await Promise.all(batches)

Just remember to add import * as _ from "lodash" at the top. Based on this answer.

Dear answered 26/4, 2020 at 21:33 Comment(3)
"using typescript" ... I don't see any typescriptWendy
This should be part of the official documentation. Or at least something similar without a dependency on lodash. Works like a charm! :)Stumpage
@MattFletcher lodash is written in vanilla JS; if you want type support, install @types/lodashHarv
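If you would rather not depend on lodash, as one of the comments suggests, a plain helper can stand in for _.chunk. This is just a sketch, not part of the original answer:

// Minimal stand-in for _.chunk: split an array into pieces of at most `size` elements.
function chunk(array, size) {
  const result = [];
  for (let i = 0; i < array.length; i += size) {
    result.push(array.slice(i, i + size));
  }
  return result;
}

// Then use chunk(users.docs, 500) in place of _.chunk(users.docs, 500) in the answer above.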
8

You can use the built-in BulkWriter, which by default throttles writes using the 500/50/5 ramp-up rule.

Example:

let bulkWriter = firestore.bulkWriter();

bulkWriter.create(documentRef, {foo: 'bar'});
bulkWriter.update(documentRef2, {foo: 'bar'});
bulkWriter.delete(documentRef3);
await bulkWriter.close().then(() => {
  console.log('Executed all writes');
});
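For the update-every-document case from the question, a slightly fuller BulkWriter sketch. It assumes the Node Admin SDK; the collection name and the isDeleted field are just placeholders, and the onWriteError retry policy is optional:

const bulkWriter = firestore.bulkWriter();

// Decide how failed writes are retried: return true to retry, false to give up.
bulkWriter.onWriteError((error) => {
  console.error('Write failed:', error.message);
  return error.failedAttempts < 3;
});

const snapshot = await firestore.collection('my-collection').get();
snapshot.docs.forEach((doc) => {
  bulkWriter.update(doc.ref, { isDeleted: false });
});

// close() flushes all pending writes and resolves once they have completed.
await bulkWriter.close();
console.log('Executed all writes');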
Lapel answered 8/11, 2021 at 19:27 Comment(0)
6

Since March 2023, Firestore no longer limits the number of writes that can be passed to a Commit operation or performed in a transaction (source).
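A minimal sketch under that assumption (a recent firebase-admin version; other per-request limits, such as total request size, still apply):

// One batch, one commit, no 500-write chunking needed on current versions.
const snapshot = await db.collection('My Collection').get();
const batch = db.batch();
snapshot.docs.forEach((doc) => {
  batch.set(doc.ref, { timestamp: admin.firestore.FieldValue.serverTimestamp() }, { merge: true });
});
await batch.commit();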

Highminded answered 17/11, 2023 at 14:3 Comment(0)
4

As mentioned above, @Sebastian's answer is good and I upvoted it too, but I ran into an issue while updating 25,000+ documents in one go. The tweaked logic is below.

console.log(`Updating documents...`);
let collectionRef = db.collection('cities');
try {
  let batch = db.batch();
  const documentSnapshotArray = await collectionRef.get();
  const records = documentSnapshotArray.docs;
  const index = documentSnapshotArray.size;
  console.log(`TOTAL SIZE=====${index}`);
  for (let i=0; i < index; i++) {
    const docRef = records[i].ref;
    // YOUR UPDATES
    batch.update(docRef, {isDeleted: false});
    if ((i + 1) % 499 === 0) {
      await batch.commit();
      batch = db.batch();
    }
  }
  // For committing final batch
  if (index % 499 !== 0) {
    await batch.commit();
  }
  console.log('write completed');
} catch (error) {
  console.error(`updateWorkers() errored out : ${error.stack}`);
  reject(error);
}
Relume answered 19/3, 2021 at 15:24 Comment(0)
1

The explanations in the previous answers and comments already cover the issue.

I'm sharing the final code that I built and that worked for me, since I needed something that works in a more decoupled way than most of the solutions presented above.

import { FireDb } from "@services/firebase"; // = firebase.firestore();

type TDocRef = FirebaseFirestore.DocumentReference;
type TDocData = FirebaseFirestore.DocumentData;

let fireBatches = [FireDb.batch()];
let batchSizes = [0];
let batchIdxToUse = 0;

export default class FirebaseUtil {
  static addBatchOperation(
    operation: "create",
    ref: TDocRef,
    data: TDocData
  ): void;
  static addBatchOperation(
    operation: "update",
    ref: TDocRef,
    data: TDocData,
    precondition?: FirebaseFirestore.Precondition
  ): void;
  static addBatchOperation(
    operation: "set",
    ref: TDocRef,
    data: TDocData,
    setOpts?: FirebaseFirestore.SetOptions
  ): void;
  static addBatchOperation(
    operation: "create" | "update" | "set",
    ref: TDocRef,
    data: TDocData,
    opts?: FirebaseFirestore.Precondition | FirebaseFirestore.SetOptions
  ): void {
    // Lines below make sure we stay below the limit of 500 writes per
    // batch
    if (batchSizes[batchIdxToUse] === 500) {
      fireBatches.push(FireDb.batch());
      batchSizes.push(0);
      batchIdxToUse++;
    }
    batchSizes[batchIdxToUse]++;

    const batchArgs: [TDocRef, TDocData] = [ref, data];
    if (opts) batchArgs.push(opts);

    switch (operation) {
      // Specific case for "set" is required because of some weird TS
      // glitch that doesn't allow me to use the arg "operation" to
      // call the function
      case "set":
        fireBatches[batchIdxToUse].set(...batchArgs);
        break;
      default:
        fireBatches[batchIdxToUse][operation](...batchArgs);
        break;
    }
  }

  public static async runBatchOperations() {
    // The lines below clear the globally available batches so we
    // don't run them twice if we call this function more than once
    const currentBatches = [...fireBatches];
    fireBatches = [FireDb.batch()];
    batchSizes = [0];
    batchIdxToUse = 0;

    await Promise.all(currentBatches.map((batch) => batch.commit()));
  }
}

Kendrakendrah answered 7/10, 2021 at 17:43 Comment(0)
1

Based on all the above answers, I put together the following code, which you can drop into a module on a JavaScript back end and front end to use Firestore batch writes without worrying about the 500-write limit.

Back-end (Node.js)

// The Firebase Admin SDK to access Firestore.
const admin = require("firebase-admin");
admin.initializeApp();

// Firestore does not accept more than 500 writes in a transaction or batch write.
const MAX_TRANSACTION_WRITES = 499;

const isFirestoreDeadlineError = (err) => {
  console.log({ err });
  const errString = err.toString();
  return (
    errString.includes("Error: 13 INTERNAL: Received RST_STREAM") ||
    errString.includes("Error: 4 DEADLINE_EXCEEDED: Deadline exceeded")
  );
};

const db = admin.firestore();

// How many transactions/batchWrites out of 500 so far.
// I wrote the following functions to easily use batchWrites without worrying about the 500 limit.
let writeCounts = 0;
let batchIndex = 0;
let batchArray = [db.batch()];

// Commit every batch in the batch array.
const makeCommitBatch = async () => {
  console.log("makeCommitBatch");
  await Promise.all(batchArray.map((bch) => bch.commit()));
};

// Commit the batch writes; if you get a Firestore deadline error, retry every 4 seconds until it resolves.
const commitBatch = async () => {
  try {
    await makeCommitBatch();
  } catch (err) {
    console.log({ err });
    if (isFirestoreDeadlineError(err)) {
      const theInterval = setInterval(async () => {
        try {
          await makeCommitBatch();
          clearInterval(theInterval);
        } catch (err) {
          console.log({ err });
          if (!isFirestoreDeadlineError(err)) {
            clearInterval(theInterval);
            throw err;
          }
        }
      }, 4000);
    }
  }
};

// If the batch reaches 499 writes, start a new batch object and reset the counter.
const checkRestartBatchWriteCounts = () => {
  writeCounts += 1;
  if (writeCounts >= MAX_TRANSACTION_WRITES) {
    batchIndex++;
    batchArray.push(db.batch());
    writeCounts = 0;
  }
};

const batchSet = (docRef, docData) => {
  batchArray[batchIndex].set(docRef, docData);
  checkRestartBatchWriteCounts();
};

const batchUpdate = (docRef, docData) => {
  batchArray[batchIndex].update(docRef, docData);
  checkRestartBatchWriteCounts();
};

const batchDelete = (docRef) => {
  batchArray[batchIndex].delete(docRef);
  checkRestartBatchWriteCounts();
};

module.exports = {
  admin,
  db,
  MAX_TRANSACTION_WRITES,
  checkRestartBatchWriteCounts,
  commitBatch,
  isFirestoreDeadlineError,
  batchSet,
  batchUpdate,
  batchDelete,
};

Front-end

// Firestore does not accept more than 500 writes in a transaction or batch write.
const MAX_TRANSACTION_WRITES = 499;

const isFirestoreDeadlineError = (err) => {
  return (
    err.message.includes("DEADLINE_EXCEEDED") ||
    err.message.includes("Received RST_STREAM")
  );
};

class Firebase {
  constructor(fireConfig, instanceName) {
    let app = fbApp;
    if (instanceName) {
      app = app.initializeApp(fireConfig, instanceName);
    } else {
      app.initializeApp(fireConfig);
    }
    this.name = app.name;
    this.db = app.firestore();
    this.firestore = app.firestore;
    // How many transactions/batchWrites out of 500 so far.
    // I wrote the following functions to easily use batchWrites without worrying about the 500 limit.
    this.writeCounts = 0;
    this.batch = this.db.batch();
    this.isCommitting = false;
  }

  async makeCommitBatch() {
    console.log("makeCommitBatch");
    if (!this.isCommitting) {
      this.isCommitting = true;
      await this.batch.commit();
      this.writeCounts = 0;
      this.batch = this.db.batch();
      this.isCommitting = false;
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.isCommitting = true;
          await this.batch.commit();
          this.writeCounts = 0;
          this.batch = this.db.batch();
          this.isCommitting = false;
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }

  async commitBatch() {
    try {
      await this.makeCommitBatch();
    } catch (err) {
      console.log({ err });
      if (isFirestoreDeadlineError(err)) {
        const theInterval = setInterval(async () => {
          try {
            await this.makeCommitBatch();
            clearInterval(theInterval);
          } catch (err) {
            console.log({ err });
            if (!isFirestoreDeadlineError(err)) {
              clearInterval(theInterval);
              throw err;
            }
          }
        }, 4000);
      }
    }
  }

  async checkRestartBatchWriteCounts() {
    this.writeCounts += 1;
    if (this.writeCounts >= MAX_TRANSACTION_WRITES) {
      await this.commitBatch();
    }
  }

  async batchSet(docRef, docData) {
    if (!this.isCommitting) {
      this.batch.set(docRef, docData);
      await this.checkRestartBatchWriteCounts();
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.batch.set(docRef, docData);
          await this.checkRestartBatchWriteCounts();
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }

  async batchUpdate(docRef, docData) {
    if (!this.isCommitting) {
      this.batch.update(docRef, docData);
      await this.checkRestartBatchWriteCounts();
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.batch.update(docRef, docData);
          await this.checkRestartBatchWriteCounts();
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }

  async batchDelete(docRef) {
    if (!this.isCommitting) {
      this.batch.delete(docRef);
      await this.checkRestartBatchWriteCounts();
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.batch.delete(docRef);
          await this.checkRestartBatchWriteCounts();
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }
}
Paymaster answered 28/4, 2022 at 22:12 Comment(0)
1

No citations or documentation; I came up with this code myself. It worked for me, it looks clean, and it is simple to read and use. If someone likes it, they can use it too.

Better to add an automated test, because the code uses the private field _ops, which can change after a package upgrade. For example, in old versions it was called _mutations.

async function commitBatch(batch) {
  const MAX_OPERATIONS_PER_COMMIT = 500;

  while (batch._ops.length > MAX_OPERATIONS_PER_COMMIT) {
    const batchPart = admin.firestore().batch();

    batchPart._ops = batch._ops.splice(0, MAX_OPERATIONS_PER_COMMIT - 1);

    await batchPart.commit();
  }

  await batch.commit();
}

Usage:

const batch = admin.firestore().batch();

batch.delete(someRef);
batch.update(someRef);

...

await commitBatch(batch);
Sothena answered 10/11, 2022 at 7:49 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Precambrian
1

I like this implementation: https://github.com/qualdesk/firestore-big-batch

Here's a blog post about it (not mine): https://www.qualdesk.com/blog/2021/the-solution-to-firestore-batched-write-limit/

It's a drop-in replacement for Firestore's batch. Instead of this:

const batch = db.batch();

...do this:

const batch = new BigBatch({ db });

Here's my variation of it, which is updated to be type compatible with the latest firebase-admin and TypeScript. I also added a setGroup option, which ensures that a group of operations are part of the same batch.

// Inspired by: https://github.com/qualdesk/firestore-big-batch

import type {
  DocumentReference,
  Firestore,
  SetOptions,
  WriteBatch,
} from 'firebase-admin/firestore';

const MAX_OPERATIONS_PER_FIRESTORE_BATCH = 499;

export class BigBatch {
  private db: Firestore;
  private currentBatch: WriteBatch;
  private batchArray: Array<WriteBatch>;
  private operationCounter: number;

  constructor({ db }: { db: Firestore }) {
    this.db = db;
    this.currentBatch = db.batch();
    this.batchArray = [this.currentBatch];
    this.operationCounter = 0;
  }

  private startNewBatch() {
    this.currentBatch = this.db.batch();
    this.batchArray.push(this.currentBatch);
    this.operationCounter = 0;
  }

  private checkLimit() {
    if (this.operationCounter < MAX_OPERATIONS_PER_FIRESTORE_BATCH)
      return;

    this.startNewBatch();
  }

  private ensureGroupOperation(operations: unknown[]) {
    if (operations.length > MAX_OPERATIONS_PER_FIRESTORE_BATCH)
      throw new Error(
        `Group can only accept ${MAX_OPERATIONS_PER_FIRESTORE_BATCH} operations.`,
      );

    if (
      this.operationCounter + operations.length >
      MAX_OPERATIONS_PER_FIRESTORE_BATCH
    )
      this.startNewBatch();
  }

  /**
   * Add a single set operation to the batch.
   */
  set(
    ref: DocumentReference,
    data: object,
    options: SetOptions = {},
  ) {
    this.currentBatch.set(ref, data, options);
    this.operationCounter++;
    this.checkLimit();
  }

  /**
   * Add a group of set operations to the batch. This method ensures that everything in a group will be included in the same batch.
   * @param group Array of objects with ref, data, and options
   */
  setGroup(
    operations: {
      ref: DocumentReference;
      data: object;
      options?: SetOptions;
    }[],
  ) {
    this.ensureGroupOperation(operations);
    operations.forEach(o =>
      this.currentBatch.set(o.ref, o.data, o.options ?? {}),
    );
    this.operationCounter += operations.length;
    this.checkLimit();
  }

  update(ref: DocumentReference, data: object) {
    this.currentBatch.update(ref, data);
    this.operationCounter++;
    this.checkLimit();
  }

  delete(ref: DocumentReference) {
    this.currentBatch.delete(ref);
    this.operationCounter++;
    this.checkLimit();
  }

  commit() {
    const promises = this.batchArray.map(batch => batch.commit());
    return Promise.all(promises);
  }
}
Chesnut answered 20/7, 2023 at 1:4 Comment(2)
I love this! However, I'm getting a type error that I don't understand. import { getFirestore } from "firebase-admin/firestore"; const fs = getFirestore(); const batch = new BigBatch({fs}); the type error is: "Argument of type '{ fs: FirebaseFirestore.Firestore; }' is not assignable to parameter of type '{ db: Firestore; }'".Upbeat
Ah, I see. In my case I need to do const batch = new BigBatch({db: fs});. Thanks for the BigBatch class!Upbeat
0

A simple solution: just fire the batch twice. My array is "resultsFinal". I commit one batch with a limit of 490 docs, and a second one for the rest of the array (resultsFinal.length). It works fine for me. How do you check it? Go to Firebase and delete your collection; Firebase tells you how many docs it deleted, and if that matches the length of your array, you are good to go.

async function quickstart(results) {
    // results is passed in as a parameter so the data is available inside quickstart
    const resultsFinal = results;
    // console.log(resultsFinal.length);
    let batch = firestore.batch();
    // Firestore's limit is 500 writes per batch, so stay below it
    for (let i = 0; i < 490; i++) {
        const doc = firestore.collection('testMore490').doc();
        const object = resultsFinal[i];
        batch.set(doc, object);
    }
    await batch.commit();

    // second batch for the remaining documents
    batch = firestore.batch();
    for (let i = 490; i < resultsFinal.length; i++) {
        const objectPartTwo = resultsFinal[i];
        const doc = firestore.collection('testMore490').doc();
        batch.set(doc, objectPartTwo);
    }
    await batch.commit();
}
Id answered 3/8, 2021 at 16:47 Comment(0)
