How to delete all datastore in Google App Engine?
D

30

124

Does anyone know how to delete all the data in the datastore in Google App Engine?

Danyluk answered 30/6, 2009 at 8:52 Comment(4)
db.delete(db.Query(keys_only=True)). Further details here https://mcmap.net/q/180069/-how-to-delete-all-datastore-in-google-app-engine.Gilroy
As pointed out by @systempuntoout below, GAE now has a Datastore Admin that lets you delete entities in bulk without any coding, among other things. That feature needs to be surfaced here rather than buried in the 3rd comment.Mucous
The Datastore Admin doesn't work (the page loads an iframe to a non-existent host), so we'd still need to use the db.delete method.Irradiation
To delete all data on the development server, run the following at the command prompt: /path/to/google_appengine/dev_appserver.py --clear_datastore yes myappname/ where myappname is the directory containing your app's app.yaml file. You need to cd to this directory path. Credit: Steven Almeroth and Melllvar for the answers below.Kendy
T
69

If you're talking about the live datastore, open the dashboard for your app (log in to App Engine), then go to Datastore --> Data Viewer, select all the rows for the table you want to delete and hit the delete button (you'll have to do this for all your tables). You can do the same programmatically through the remote_api (but I've never used it).

If you're talking about the development datastore, you'll just have to delete the following file: "./WEB-INF/appengine-generated/local_db.bin". The file will be generated for you again next time you run the development server and you'll have a clear db.

Make sure to clean your project afterwards.
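
A minimal sketch of the file-removal step for the development datastore, assuming the WEB-INF/appengine-generated/local_db.bin location given above. The function name `clear_local_datastore` and the `project_dir` parameter are hypothetical, and the dev server should be stopped before running it:

```python
from pathlib import Path

# Relative location of the dev datastore file, as given in this answer.
LOCAL_DB = Path("WEB-INF") / "appengine-generated" / "local_db.bin"


def clear_local_datastore(project_dir):
    """Delete the local dev datastore file; return True if a file was removed."""
    db_file = Path(project_dir) / LOCAL_DB
    if db_file.is_file():
        db_file.unlink()
        return True
    return False
```

Calling `clear_local_datastore(".")` from the project root would remove the file if it exists and do nothing otherwise.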

This is one of the little tricks that come in handy when you start playing with Google App Engine. You'll find yourself persisting objects into the datastore and then changing the JDO object model for your persistable entities, ending up with obsolete data that'll make your app crash all over the place.

Tribadism answered 30/6, 2009 at 8:55 Comment(5)
There's a -c parameter to the dev_appserver.py to delete from the development datastore.Hyracoid
@Hyracoid But that only applies to the Python app engine. Does anybody know how a shortcut for doing it in Java? (In the meantime, JohnIdol's suggestion works well.)Theodicy
Thanks @John: Where the exact path in MAC OSX?Misdemeanant
Where is the path in Windows?Kempf
@ShaneBest the path in windows is something like ./target/yourappid-1.0-SNAPSHOT/WEB-INF/appengine-generated/local_db.binSarcous
C
57

The best approach is the remote API method suggested by Nick; he's an App Engine engineer at Google, so trust him.

It's not that difficult to do, and the latest 1.2.5 SDK provides remote_shell_api.py off the shelf, so download the new SDK first. Then follow these steps:

  • Connect to the remote server from your command line: remote_shell_api.py yourapp /remote_api. The shell will ask for your login info and, if authorized, will start a Python shell for you. You need to set up a URL handler for /remote_api in your app.yaml.

  • Fetch the entities you'd like to delete; the code looks something like:

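    from google.appengine.ext import db
    from models import Entry

    # Keys-only query: only the keys are needed for deletion.
    query = Entry.all(keys_only=True)
    # The comments below report a limit of 500 entities per delete call.
    entries = query.fetch(500)
    db.delete(entries)
    # This bulk-deletes up to 500 entities at a time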

Update 2013-10-28:

  • remote_shell_api.py has been replaced by remote_api_shell.py, and you should connect with remote_api_shell.py -s your_app_id.appspot.com, according to the documentation.

  • There is a new experimental feature, Datastore Admin. After enabling it in the app settings, you can bulk delete as well as back up your datastore through the web UI.
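
For larger kinds the fetch-and-delete step has to be repeated. A rough sketch of the batching, assuming the 500-entities-per-call limit reported in the comments below applies; `chunked`, `delete_in_batches` and the `delete_fn` parameter are hypothetical names (in the remote shell, `delete_fn` would be `db.delete`):

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def delete_in_batches(keys, delete_fn, batch_size=500):
    """Call delete_fn once per batch of keys; return the number of keys passed on."""
    total = 0
    for batch in chunked(keys, batch_size):
        delete_fn(batch)
        total += len(batch)
    return total
```

In the remote shell this might be called as `delete_in_batches(Entry.all(keys_only=True), db.delete)`.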

Capello answered 4/9, 2009 at 9:37 Comment(8)
Actually, you don't need the fetch. Just db.delete(Entry.all()) will do it.Unimproved
You need to do this in 500 entity sets or else you'll get: BadRequestError: cannot delete more than 500 entities in a single callWholewheat
Just an FYI, for you to use the remote api you need to enable it in your application first using builtins:- remote_api: in your YAML file. more info is at developers.google.com/appengine/articles/remote_apiCossack
At least add the 'keys_only=True' when you call Entry.all(). There's no need to fetch the whole entry if you don't need to check the data. Else you're just wasting computing cycles.Gilroy
Thanks Evan, I've added keys_only=True as you suggested.Capello
+1 ... but: As of 2013, remote_shell_api.py doesn't exist. The current script name is remote_api_shell.py. Also, if you use ndb (which is what most people do these days), recommended way to use ndb.delete_multi(model.Entry.query().fetch(keys_only=True))Friar
@Juvenn, one more question: Is there any way to automate remote_api_shell, so i can implement a script (instead of the interactive shell).Friar
@Uri Have a look at the remote_api_shell.py that come from the SDK, it could be a good demo of how should your script communicate with remote datastore. Sorry for the late reply, and I have not played with appengine much recently.Capello
L
27

The fastest and most efficient way to handle bulk deletes on the Datastore is to use the new mapper API announced at the latest Google I/O.

If your language of choice is Python, you just have to register your mapper in a mapreduce.yaml file and define a function like this:

from mapreduce import operation as op

def process(entity):
    yield op.db.Delete(entity)

In Java, have a look at this article, which suggests a function like this:

@Override
public void map(Key key, Entity value, Context context) {
    log.info("Adding key to deletion pool: " + key);
    DatastoreMutationPool mutationPool = this.getAppEngineContext(context)
            .getMutationPool();
    mutationPool.delete(value.getKey());
}

EDIT:
Since SDK 1.3.8, there's a Datastore admin feature for this purpose

Lenard answered 1/9, 2010 at 18:57 Comment(0)
A
27

You can clear the development server datastore when you run the server:

/path/to/dev_appserver.py --clear_datastore=yes myapp

You can also abbreviate --clear_datastore with -c.
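
If you start the dev server from a script, the command can be assembled programmatically. A small sketch, assuming only the `--clear_datastore=yes` flag shown above and the `--port` flag used in another answer on this page; `clear_datastore_command` is a hypothetical helper name:

```python
def clear_datastore_command(dev_appserver, app_dir, port=None):
    """Build the dev_appserver.py argument list that starts with an empty datastore."""
    cmd = [dev_appserver, "--clear_datastore=yes"]
    if port is not None:
        cmd.append("--port=%d" % port)
    cmd.append(app_dir)
    return cmd
```

The resulting list can be passed to `subprocess.call`, for example `subprocess.call(clear_datastore_command("/path/to/dev_appserver.py", "myapp"))`.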

Admonish answered 3/9, 2010 at 9:54 Comment(3)
Not sure if it's a recent thing, but the actual syntax is now /path/to/google_appengine/dev_appserver.py --clear_datastore yes myappname/ (note the 'yes')Allinclusive
This is the most useful way of repeatedly deleting the datastore during development. With options becoming obsolete fast, it's worth highlighting that this flag is still in place in July 2018 and works for the dev_appserver installed via the gcloud CLI.Tips
In version 270.0.0 of the Google Cloud SDK "--clear_datastore=yes" still works with the equal signColwen
L
15

If you have a significant amount of data, you need to use a script to delete it. You can use remote_api to clear the datastore from the client side in a straightforward manner, though.

Lomeli answered 30/6, 2009 at 11:20 Comment(0)
C
11

Here you go: Go to Datastore Admin, and then select the Entity type you want to delete and click Delete. Mapreduce will take care of deleting!

Champ answered 9/12, 2011 at 11:58 Comment(1)
As of May'22, I no longer see this option:Hoffer
B
10

There are several ways you can use to remove entries from App Engine's Datastore:

enter image description here

  1. First, think whether you really need to remove entries. This is expensive and it might be cheaper to not remove them.

  2. You can delete all entries by hand using the Datastore Admin.

  3. You can use the Remote API and remove entries interactively.

  4. You can remove the entries programmatically using a couple lines of code.

  5. You can remove them in bulk using Task Queues and Cursors.

  6. Or you can use Mapreduce to get something more robust and fancier.

Each one of these methods is explained in the following blog post: http://www.shiftedup.com/2015/03/28/how-to-bulk-delete-entries-in-app-engine-datastore
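
As a rough illustration of option 5 (not taken from the blog post), cursor-based bulk deletion boils down to a loop like the one below. `fetch_page` and `delete_keys` are hypothetical callables standing in for a keys-only query with a cursor and a batch delete; in a real app each iteration would typically run in its own task:

```python
def delete_with_cursor(fetch_page, delete_keys, page_size=500):
    """Delete everything page by page.

    fetch_page(cursor, limit) must return (keys, next_cursor), where
    next_cursor is None once there is nothing left to read.
    """
    cursor = None
    deleted = 0
    while True:
        keys, cursor = fetch_page(cursor, page_size)
        if keys:
            delete_keys(keys)
            deleted += len(keys)
        if cursor is None:
            return deleted
```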

Hope it helps!

Bragi answered 28/3, 2015 at 13:47 Comment(0)
R
6

The zero-setup way to do this is to send an execute-arbitrary-code HTTP request to the admin service that your running app already, automatically, has:

import urllib
import urllib2

urllib2.urlopen('http://localhost:8080/_ah/admin/interactive/execute',
    data = urllib.urlencode({'code' : 'from google.appengine.ext import db\n' +
                                      'db.delete(db.Query())'}))
Rebut answered 24/1, 2011 at 17:41 Comment(1)
This only works for the development server. Is there a production equivalent?Aldebaran
A
3

You can do it using the web interface. Log in to your account and navigate using the links on the left-hand side. Under Datastore management you have options to modify and delete data. Use the respective options.

Annal answered 30/6, 2009 at 9:10 Comment(0)
R
3

Source

I got this from http://code.google.com/appengine/articles/remote_api.html.

Create the Interactive Console

First, you need to define an interactive App Engine console. So, create a file called appengine_console.py and enter this:

#!/usr/bin/python
import code
import getpass
import sys

# These are for my OSX installation. Change them to match your google_appengine paths.
sys.path.append("/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine")
sys.path.append("/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/yaml/lib")

from google.appengine.ext.remote_api import remote_api_stub
from google.appengine.ext import db

def auth_func():
  return raw_input('Username:'), getpass.getpass('Password:')

if len(sys.argv) < 2:
  print "Usage: %s app_id [host]" % (sys.argv[0],)
  sys.exit(1)  # without this, sys.argv[1] below raises IndexError
app_id = sys.argv[1]
if len(sys.argv) > 2:
  host = sys.argv[2]
else:
  host = '%s.appspot.com' % app_id

remote_api_stub.ConfigureRemoteDatastore(app_id, '/remote_api', auth_func, host)

code.interact('App Engine interactive console for %s' % (app_id,), None, locals())



Create the Mapper base class

Once that's in place, create this Mapper class. I just created a new file called utils.py and put this in it:

from google.appengine.ext import db

class Mapper(object):
  # Subclasses should replace this with a model class (eg, model.Person).
  KIND = None

  # Subclasses can replace this with a list of (property, value) tuples to filter by.
  FILTERS = []

  def map(self, entity):
    """Updates a single entity.

    Implementers should return a tuple containing two iterables (to_update, to_delete).
    """
    return ([], [])

  def get_query(self):
    """Returns a query over the specified kind, with any appropriate filters applied."""
    q = self.KIND.all()
    for prop, value in self.FILTERS:
      q.filter("%s =" % prop, value)
    q.order("__key__")
    return q

  def run(self, batch_size=100):
    """Executes the map procedure over all matching entities."""
    q = self.get_query()
    entities = q.fetch(batch_size)
    while entities:
      to_put = []
      to_delete = []
      for entity in entities:
        map_updates, map_deletes = self.map(entity)
        to_put.extend(map_updates)
        to_delete.extend(map_deletes)
      if to_put:
        db.put(to_put)
      if to_delete:
        db.delete(to_delete)
      q = self.get_query()
      q.filter("__key__ >", entities[-1].key())
      entities = q.fetch(batch_size)

Mapper is supposed to be just an abstract class that allows you to iterate over every entity of a given kind, be it to extract their data, or to modify them and store the updated entities back to the datastore.
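
The batching in run() relies on ordering by key and restarting each query just after the last key seen. A minimal, datastore-free sketch of that pattern, where a sorted Python list stands in for the `__key__`-ordered query and `iter_batches` is a hypothetical name:

```python
import bisect


def iter_batches(sorted_keys, batch_size=100):
    """Yield batches of keys, resuming each time after the last key already seen."""
    start = 0
    while start < len(sorted_keys):
        batch = sorted_keys[start:start + batch_size]
        yield batch
        # Equivalent of q.filter("__key__ >", last_key): skip past the last key.
        start = bisect.bisect_right(sorted_keys, batch[-1])
```

Because each batch starts strictly after the previous one ends, deleting or updating the entities of a batch does not disturb the position of the next query.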

Run with it!

Now, start your appengine interactive console:

$python appengine_console.py <app_id_here>

That should start the interactive console. In it, create a subclass of Mapper:

from utils import Mapper
# import your model class here 
class MyModelDeleter(Mapper):
    KIND = <model_name_here>

    def map(self, entity):
        return ([], [entity])

And, finally, run it (from your interactive console):

mapper = MyModelDeleter()
mapper.run()

That's it!

Reno answered 1/11, 2009 at 20:13 Comment(0)
A
3

I've created an add-in panel that can be used with your deployed App Engine apps. It lists the kinds that are present in the datastore in a dropdown, and you can click a button to schedule "tasks" that delete all entities of a specific kind or simply everything. You can download it here:
http://code.google.com/p/jobfeed/wiki/Nuke

Aleece answered 30/3, 2010 at 22:5 Comment(0)
E
3

For Python, 1.3.8 includes an experimental admin built-in for this. They say: "enable the following builtin in your app.yaml file:"

builtins:
- datastore_admin: on

"Datastore delete is currently available only with the Python runtime. Java applications, however, can still take advantage of this feature by creating a non-default Python application version that enables Datastore Admin in the app.yaml. Native support for Java will be included in an upcoming release."

Escapism answered 18/1, 2011 at 16:54 Comment(1)
Adding the configuration in app.yaml threw an error. Instead we can enable it from the 'Applications Setting' Page in 'Administration' section. There's a button to enable itFerrule
M
3

Open "Datastore Admin" for your application and enable Admin. Then all of your entities will be listed with check boxes. You can simply select the unwanted entities and delete them.

Marceau answered 15/11, 2011 at 19:3 Comment(0)
G
3

This is what you're looking for...

db.delete(Entry.all(keys_only=True))

Running a keys-only query is much faster than a full fetch, and your quota will take a smaller hit because keys-only queries are considered small ops.

Here's a link to an answer from Nick Johnson describing it further.

Below is an end-to-end REST API solution to truncating a table...

I set up a REST API to handle database transactions where routes are mapped directly to the proper model/action. It can be called by entering the right URL (example.com/inventory/truncate) and logging in.

Here's the route:

Route('/inventory/truncate', DataHandler, defaults={'_model':'Inventory', '_action':'truncate'})

Here's the handler:

class DataHandler(webapp2.RequestHandler):
  @basic_auth
  def delete(self, **defaults):
    model = defaults.get('_model')
    action = defaults.get('_action')
    module = __import__('api.models', fromlist=[model])
    model_instance = getattr(module, model)()
    result = getattr(model_instance, action)()

It starts by loading the model dynamically (ie Inventory found under api.models), then calls the correct method (Inventory.truncate()) as specified in the action parameter.

The @basic_auth is a decorator/wrapper that provides authentication for sensitive operations (ie POST/DELETE). There's also an oAuth decorator available if you're concerned about security.

Finally, the action is called:

def truncate(self):
  db.delete(Inventory.all(keys_only=True))

It looks like magic but it's actually very straightforward. The best part is, delete() can be re-used to handle deleting one-or-many results by adding another action to the model.

Gilroy answered 1/6, 2012 at 19:54 Comment(0)
C
3

You can delete the whole datastore by deleting all kinds one by one from the Google App Engine dashboard. Please follow these steps:

  1. Login to https://console.cloud.google.com/datastore/settings
  2. Click Open Datastore Admin. (Enable it if not enabled.)
  3. Select all entities and press Delete. (This step runs a MapReduce job that deletes all selected kinds.)

For more information, see this image: http://storage.googleapis.com/bnifsc/Screenshot%20from%202015-01-31%2023%3A58%3A41.png

Censer answered 31/1, 2015 at 18:41 Comment(1)
FYI permission is denied on the image you attached. Also, is this for “Firestore in Datastore mode”, or just the old “Datastore” product which has now been migrated to Firestore?Baileybailie
A
2

If you have a lot of data, using the web interface could be time consuming. The App Engine Launcher utility lets you delete everything in one go with the 'Clear datastore on launch' checkbox. This utility is now available for both Windows and Mac (Python framework).

Attempt answered 29/11, 2009 at 17:5 Comment(0)
L
2

For the development server, instead of running the server through the google app engine launcher, you can run it from the terminal like:

dev_appserver.py --port=[portnumber] --clear_datastore=yes [nameofapplication]

Example: my application "reader" runs on port 15080. After modifying the code, to restart the server I just run "dev_appserver.py --port=15080 --clear_datastore=yes reader".

It works well for me.

Ladyship answered 12/7, 2013 at 14:51 Comment(0)
A
2

Adding answer about recent developments.

Google recently added a Datastore Admin feature. You can back up, delete or copy your entities to another app using this console.

https://developers.google.com/appengine/docs/adminconsole/datastoreadmin#Deleting_Entities_in_Bulk

Alto answered 2/10, 2013 at 9:1 Comment(0)
D
1

I often don't want to delete the whole datastore, so I pull a clean copy of /war/WEB-INF/local_db.bin out of source control. It may just be me, but it seems that even with Dev Mode stopped I have to physically remove the file before pulling it. This is on Windows using the Subversion plugin for Eclipse.

Dafodil answered 14/11, 2011 at 4:51 Comment(0)
A
1

As of 2022, there are, to the best of my knowledge, two ways to delete a kind from a (largeish) datastore. Google recommends using a Dataflow template. The template basically pulls each entity one by one, subject to a GQL query, and then deletes it. Interestingly, if you are deleting a large number of rows (> 10m), you will run into datastore trouble: it will fail to provide enough capacity, and your operations against the datastore will start timing out. However, only the kind you are mass-deleting from will be affected.

If you have fewer than 10m rows, you can just use this Go script:

package main

import (
    "cloud.google.com/go/datastore"
    "context"
    "fmt"
    "google.golang.org/api/option"
    "log"
    "strings"
    "sync"
    "time"
)

const (
    batchSize       = 10000 // number of keys to get in a single batch
    deleteBatchSize = 500   // number of keys to delete in a single batch
    projectID       = "name-of-your-GCP-project"
    serviceAccount  = "path-to-sa-file"
    table           = "kind-to-delete"
)

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

func deleteBatch(table string) int {

    ctx := context.Background()
    client, err := datastore.NewClient(ctx, projectID, option.WithCredentialsFile(serviceAccount))
    if err != nil {
        log.Fatalf("Failed to open client: %v", err)
    }
    defer client.Close()
        
    query := datastore.NewQuery(table).KeysOnly().Limit(batchSize)

    keys, err := client.GetAll(ctx, query, nil)
    if err != nil {
        fmt.Printf("%s Failed to get %d keys : %v\n", table, batchSize, err)
        return -1
    }

    var wg sync.WaitGroup
    for i := 0; i < len(keys); i += deleteBatchSize {
        wg.Add(1)
        go func(i int) {
            batch := keys[i : i+min(len(keys)-i, deleteBatchSize)]
            if err := client.DeleteMulti(ctx, batch); err != nil {
                // not a big problem, we'll get them next time ;)
                fmt.Printf("%s Failed to delete multi: %v", table, err)
            }
            wg.Done()
        }(i)
    }

    wg.Wait()
    return len(keys)
}

func main() {

    var globalStartTime = time.Now()

    fmt.Printf("Deleting \033[1m%s\033[0m\n", table)
    for {
        startTime := time.Now()
        count := deleteBatch(table)
        if count >= 0 {
            rate := float64(count) / time.Since(startTime).Seconds()
            fmt.Printf("Deleted %d keys from %s in %.2fs, rate %.2f keys/s\n", count, table, time.Since(startTime).Seconds(), rate)
            if count == 0 {
                fmt.Printf("%s is now clear.\n", table)
                break
            }
        } else {
            fmt.Printf("Retrying after short cooldown\n")
            time.Sleep(10 * time.Second)
        }
    }

    fmt.Printf("Total time taken %s.\n", time.Since(globalStartTime))
}
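
For readers not using Go, the control flow of main (delete a batch, retry after a cooldown on failure, stop when a batch comes back empty) can be sketched in Python. `delete_batch` is a hypothetical callable returning the number of keys deleted, or -1 on failure, mirroring deleteBatch above; the `max_failures` guard is an addition that is not present in the Go script:

```python
import time


def drain(delete_batch, cooldown=10.0, max_failures=5, sleep=time.sleep):
    """Call delete_batch() until it returns 0; return the total number deleted."""
    total = 0
    failures = 0
    while True:
        count = delete_batch()
        if count == 0:
            return total
        if count < 0:
            failures += 1
            if failures > max_failures:
                raise RuntimeError("too many consecutive failures")
            sleep(cooldown)  # back off before retrying
            continue
        failures = 0
        total += count
```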
Ananias answered 1/7, 2022 at 22:16 Comment(0)
C
0

I was so frustrated about existing solutions for deleting all data in the live datastore that I created a small GAE app that can delete quite some amount of data within its 30 seconds.

How to install etc: https://github.com/xamde/xydra

Corrales answered 10/9, 2010 at 0:43 Comment(0)
T
0

PHP variation:

import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.DatastoreServiceFactory;

define('DATASTORE_SERVICE', DatastoreServiceFactory::getDatastoreService());

function get_all($kind) {
    $query = new Query($kind);
    $prepared = DATASTORE_SERVICE->prepare($query);
    return $prepared->asIterable();
}

function delete_all($kind, $amount = 0) {
    if ($entities = get_all($kind)) {
        $r = $t = 0;
        $delete = array();
        foreach ($entities as $entity) {
            if ($r < 500) {
                $delete[] = $entity->getKey();
            } else {
                DATASTORE_SERVICE->delete($delete);
                $delete = array();
                $r = -1;
            }
            $r++; $t++;
            if ($amount && $amount < $t) break;
        }
        if ($delete) {
            DATASTORE_SERVICE->delete($delete);
        }
    }
}

Yes, it will take time, and 30 seconds is the limit. I'm thinking of adding a sample AJAX app to automate going beyond 30 seconds.
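
One way to live with the 30-second request limit is to stop before the deadline and report whether work remains, so that the caller can issue another request. A sketch in Python, with `delete_until_deadline`, `next_batch` and `delete_fn` as hypothetical names and a 25-second budget chosen arbitrarily:

```python
import time


def delete_until_deadline(next_batch, delete_fn, budget_seconds=25.0,
                          clock=time.monotonic):
    """Delete batches until none remain or the time budget is used up.

    next_batch() returns a list of keys (empty when done).
    Returns (deleted_count, finished).
    """
    deadline = clock() + budget_seconds
    deleted = 0
    while clock() < deadline:
        keys = next_batch()
        if not keys:
            return deleted, True
        delete_fn(keys)
        deleted += len(keys)
    return deleted, False
```

A client (for example the AJAX page mentioned above) would keep calling the endpoint until `finished` comes back true.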

Trustee answered 9/2, 2011 at 12:9 Comment(1)
This isn't even valid php. import? Defining a constant as an object instance?Inequity
K
0
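from google.appengine.ext import db

# Delete every entity of every model class that has been imported.
for amodel in db.Model.__subclasses__():
    dela = []
    print amodel
    try:
        mq = amodel.all()  # all() is a class method; no instance is needed
        print mq.count()
        for mw in mq:
            dela.append(mw)
        db.delete(dela)
    except Exception:
        # Errors for a kind are silently skipped, as in the original snippet.
        pass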
Kilogram answered 14/9, 2011 at 10:25 Comment(0)
G
0

If you're using ndb, the method that worked for me for clearing the datastore:

ndb.delete_multi(ndb.Query(default_options=ndb.QueryOptions(keys_only=True)))
Gigantes answered 20/5, 2014 at 9:31 Comment(1)
I don't think this will work. Appengine complains about Sorry, unexpected error: The kind "__Stat_Kind__" is reserved. This seems like appengine has some internal statistics entity that can be exposed by this method (possible bug on their end?)Alysa
T
0

For any datastore that's on app engine, rather than local, you can use the new Datastore API. Here's a primer for how to get started.

I wrote a script that deletes all non-built in entities. The API is changing pretty rapidly, so for reference, I cloned it at commit 990ab5c7f2063e8147bcc56ee222836fd3d6e15b

from gcloud import datastore
from gcloud.datastore import SCOPE
from gcloud.datastore.connection import Connection
from gcloud.datastore import query

from oauth2client import client

def get_connection():
  client_email = '[email protected]'
  private_key_string = open('/path/to/yourfile.p12', 'rb').read()

  svc_account_credentials = client.SignedJwtAssertionCredentials(
    service_account_name=client_email,
    private_key=private_key_string,
    scope=SCOPE)

  return Connection(credentials=svc_account_credentials)


def connect_to_dataset(dataset_id):
  connection = get_connection()
  datastore.set_default_connection(connection)
  datastore.set_default_dataset_id(dataset_id)

if __name__ == "__main__":
  connect_to_dataset(DATASET_NAME)
  gae_entity_query = query.Query()
  gae_entity_query.keys_only()
  for entity in gae_entity_query.fetch():
    if entity.kind[0] != '_':
      print entity.kind
      entity.key.delete()
Trichocyst answered 13/1, 2015 at 19:44 Comment(0)
B
0
  • Continuing svpino's idea, it is wise to reuse records marked as deleted (his idea was not to remove unused records but to mark them as "deleted"). A bit of cache/memcache to hold the working copy, writing only the difference of states (before and after the desired task) to the datastore, will make it better. For big tasks, it is possible to write intermediate difference chunks to the datastore to avoid data loss if memcache disappears. To make it loss-proof, it is possible to check the integrity/existence of the memcached results and restart the task (or the required part) to repeat the missing computations. When the data difference is written to the datastore, the required computations are discarded from the queue.

  • Another idea, similar to MapReduce, is to shard an entity kind into several different entity kinds, so that they are collected together and visible to the final user as a single entity kind. Entries are only marked as "deleted". When the number of "deleted" entries per shard exceeds some limit, the "alive" entries are distributed between the other shards, and this shard is closed forever and then deleted manually from the dev console (at less cost, I guess). Update: there seems to be no drop-table operation in the console, only record-by-record deletion at the regular price.

  • It is possible to delete a large set of records by query in chunks without GAE failing (at least it works locally), with the possibility to continue in the next attempt when time runs out:


    qdelete.getFetchPlan().setFetchSize(100);

    while (true)
    {
        long result = qdelete.deletePersistentAll(candidates);
        LOG.log(Level.INFO, String.format("deleted: %d", result));
        if (result <= 0)
            break;
    }
  • Also, sometimes it is useful to add an additional field to the primary table instead of putting candidates (related records) into a separate table. And yes, the field may be an unindexed/serialized array with little computation cost.
Botsford answered 28/5, 2015 at 15:40 Comment(0)
R
0

For everyone who needs a quick solution for the dev server (as of the time of writing, Feb. 2016):

  1. Stop the dev server.
  2. Delete the target directory.
  3. Rebuild the project.

This will wipe all data from the datastore.

Renaerenaissance answered 18/2, 2016 at 19:44 Comment(0)
I
0

For java

DatastoreService db = DatastoreServiceFactory.getDatastoreService();
List<Key> keys = new ArrayList<Key>();
for(Entity e : db.prepare(new Query().setKeysOnly()).asIterable())
    keys.add(e.getKey());
db.delete(keys);

Works well in Development Server

Ingrained answered 29/2, 2016 at 18:52 Comment(1)
FYI this will run into the Firestore limitation of not being able to do multi operations on more than 500 entities at a time. So this solution is not complete without some kind of pagination.Baileybailie
A
0

You have 2 simple ways,

#1: To save cost, delete the entire project

#2: using ts-datastore-orm:

https://www.npmjs.com/package/ts-datastore-orm

await Entity.truncate();

truncate can delete around 1K rows per second.

Ajax answered 29/2, 2020 at 2:38 Comment(0)
R
0

Here's how I did this naively from a vanilla Google Cloud Shell (no GAE) with python3:

from google.cloud import datastore
client = datastore.Client()
query = client.query()  # kindless query: matches entities of every kind
query.keys_only()
for counter, entity in enumerate(query.fetch()):
    if entity.kind.startswith('_'):  # skip reserved kinds
        continue
    print(f"{counter}: {entity.key}")
    client.delete(entity.key)

This takes a very long time even with a relatively small number of keys, but it works.
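
Most of the time goes into issuing one delete call per key. A possible speed-up, sketched under the assumption that `client` is a `google.cloud.datastore.Client` and that the 500-keys-per-call limit mentioned elsewhere on this page applies, is to collect keys and hand them to `delete_multi` in batches. `delete_all_kinds` is a hypothetical name, and I have not measured the gain:

```python
def delete_all_kinds(client, batch_size=500):
    """Delete every entity of every non-reserved kind; return the number deleted."""
    query = client.query()  # kindless query over all kinds
    query.keys_only()
    batch, deleted = [], 0
    for entity in query.fetch():
        if entity.key.kind.startswith("_"):  # skip reserved kinds
            continue
        batch.append(entity.key)
        if len(batch) == batch_size:
            client.delete_multi(batch)
            deleted += len(batch)
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        client.delete_multi(batch)
        deleted += len(batch)
    return deleted
```

With the real library this would be invoked as `delete_all_kinds(datastore.Client())`.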

More info about the Python client library: https://googleapis.dev/python/datastore/latest/client.html

Rutaceous answered 23/10, 2020 at 22:55 Comment(0)
