Deletion from Amazon DynamoDB
S

8

21

Is there an efficient way to delete all the items from an Amazon DynamoDB table at once? I have gone through the AWS docs, but they only show how to delete a single item.

Semiramis answered 12/4, 2013 at 5:37 Comment(5)
Can you just delete the table? Otherwise, this might help: #9154764 - Kocher
Thanks alfredaday! But deleting and re-creating the same table every time would add overhead in my app. - Semiramis
In addition, table creation isn't instantaneous. Be certain not to write to the new table until its status (read via describeTable) is "ACTIVE". - Daiseydaisi
Can you describe more about your use case? What's the event in your application that requires truncating the table? Edit the question with your answers, as they're relevant to whatever answer you'll receive. - Wager
Hey Steven! I am actually using DynamoDB for testing too, so after a test case runs the table is loaded with unwanted records, which I need to delete before the application can use it. That is why deleting the whole table is a bad idea. - Semiramis
W
4

You will want to use BatchWriteItem if you can't drop the table. If all your entries are within a single HashKey, you can use the Query API to retrieve the records, and then delete them 25 items at a time. If not, you'll probably have to Scan.
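As a rough illustration of the "25 items at a time" limit, here is a minimal Python sketch that splits a list of already retrieved keys into BatchWriteItem payloads. The function name build_delete_batches, the table name and the sample keys are made up for this example; only the RequestItems / DeleteRequest / Key shape comes from the BatchWriteItem API.

```python
from typing import Any, Dict, Iterator, List

MAX_BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def build_delete_batches(table_name: str, keys: List[Dict[str, Any]],
                         batch_size: int = MAX_BATCH_SIZE) -> Iterator[Dict[str, list]]:
    """Yield RequestItems dictionaries, each holding up to batch_size DeleteRequests."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 25")
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {table_name: [{"DeleteRequest": {"Key": key}} for key in chunk]}


if __name__ == "__main__":
    # Hypothetical keys in the low-level DynamoDB JSON format.
    sample_keys = [{"id": {"S": str(n)}} for n in range(60)]
    for request_items in build_delete_batches("test_table", sample_keys):
        # With boto3, each payload would be sent as:
        #   client.batch_write_item(RequestItems=request_items)
        print(len(request_items["test_table"]))
```

The keys themselves would come from a Query (single hash key) or a Scan, as described above.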

Alternatively, you could provide a simple wrapper around AmazonDynamoDBClient (from the official SDK) that collects a Set of Hash/Range keys that exist in your table. Then you wouldn't need to Query or Scan for the items you inserted after the test, since you would already have the Set built. That would look something like this:

public class KeyCollectingAmazonDynamoDB implements AmazonDynamoDB
{
    private final AmazonDynamoDB delegate;
    // Keys of every item written through this wrapper
    private final Set<Key> contents;

    public KeyCollectingAmazonDynamoDB( AmazonDynamoDB delegate )
    {
        this.delegate = delegate;
        this.contents = new HashSet<>();
    }

    @Override
    public PutItemResult putItem( PutItemRequest putItemRequest )
            throws AmazonServiceException, AmazonClientException
    {
        contents.add( extractKey( putItemRequest.getItem() ) );
        return delegate.putItem( putItemRequest );
    }

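    private Key extractKey( Map<String, AttributeValue> item )
    {
        // TODO Replace the placeholder names with your table's hash/range key attribute names
        AttributeValue hashKey = item.get( "hashKeyName" );
        AttributeValue rangeKey = item.get( "rangeKeyName" );
        return new Key( hashKey, rangeKey );
    }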

    @Override
    public DeleteItemResult deleteItem( DeleteItemRequest deleteItemRequest )
            throws AmazonServiceException, AmazonClientException
    {
        contents.remove( deleteItemRequest.getKey() );
        return delegate.deleteItem( deleteItemRequest );
    }

    @Override
    public BatchWriteItemResult batchWriteItem( BatchWriteItemRequest batchWriteItemRequest )
            throws AmazonServiceException, AmazonClientException
    {
        // Similar extraction, but in bulk.
        for ( Map.Entry<String, List<WriteRequest>> entry : batchWriteItemRequest.getRequestItems().entrySet() )
        {
            String tableName = entry.getKey();
            List<WriteRequest> writeRequests = entry.getValue();
            for ( WriteRequest writeRequest : writeRequests )
            {
                PutRequest putRequest = writeRequest.getPutRequest();
                if ( putRequest != null )
                {
                    // Add to Set just like putItem
                }
                DeleteRequest deleteRequest = writeRequest.getDeleteRequest();
                if ( deleteRequest != null )
                {
                    // Remove from Set just like deleteItem
                }
            }
        }

        // Write through to DynamoDB
        return delegate.batchWriteItem( batchWriteItemRequest );
    }

    // remaining methods elided, since they're direct delegation
}

Key is a class within the DynamoDB SDK that accepts zero, one, or two AttributeValue objects in the constructor to represent a hash key or a hash/range key. Assuming its equals and hashCode methods work, you can use it within the Set I described. If they don't, you'll have to write your own Key class.

This should get you a maintained Set for use within your tests. It's not specific to a table, so you might need to add another layer of collection if you're using multiple tables. That would change Set<Key> to something like Map<TableName, Set<Key>>. You would need to look at the getTableName() property to pick the correct Set to update.
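The per-table bookkeeping could look roughly like the following Python sketch. It is not tied to any SDK: the class name KeyTracker, its method names, and the choice to store each key as a (hash key, range key) tuple are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

# A key is stored as a hashable tuple: (hash_key_value, range_key_value or None).
Key = Tuple[object, Optional[object]]


class KeyTracker:
    """Remembers which keys were written to which table during a test run."""

    def __init__(self) -> None:
        self._keys_by_table: Dict[str, Set[Key]] = defaultdict(set)

    def record_put(self, table_name: str, hash_key, range_key=None) -> None:
        # Called wherever the wrapper forwards a put to DynamoDB.
        self._keys_by_table[table_name].add((hash_key, range_key))

    def record_delete(self, table_name: str, hash_key, range_key=None) -> None:
        # Called wherever the wrapper forwards a delete to DynamoDB.
        self._keys_by_table[table_name].discard((hash_key, range_key))

    def keys_for(self, table_name: str) -> Set[Key]:
        # Return a copy so callers can iterate over it while deleting.
        return set(self._keys_by_table.get(table_name, set()))
```

After a test, keys_for("some_table") gives the keys that still need to be deleted from that table.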

Once your test finishes, grabbing the contents of the table and deleting should be straightforward.

One final suggestion: use a different table for testing than you do for your application. Create an identical schema, but give the table a different name. You probably even want a different IAM user to prevent your test code from accessing your production table. If you have questions about that, feel free to open a separate question for that scenario.

Wager answered 13/4, 2013 at 16:9 Comment(1)
This approach might be good for deleting the test data, but it is not ideal for real data, especially when it scales. It's like keeping and maintaining another datastore of keys on top of each DynamoDB table. - Ski
J
13

Do the following steps:

  1. Make a DeleteTable request.
  2. The response contains the TableDescription.
  3. Use that TableDescription to create the table again.

That's what I do in my application.
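A minimal Python sketch of these steps is shown below. It assumes that client behaves like boto3.client("dynamodb") and that the table uses provisioned capacity with no secondary indexes, streams, or TTL settings; the function name recreate_table is made up for this example. Because deletion and creation are both asynchronous, the sketch waits with the table_not_exists and table_exists waiters.

```python
def recreate_table(client, table_name):
    """Drop table_name and create it again with the same key schema and throughput.

    Assumption: `client` behaves like boto3.client("dynamodb"), and the table is a
    provisioned-capacity table without secondary indexes, streams, or TTL settings.
    """
    description = client.describe_table(TableName=table_name)["Table"]
    throughput = description["ProvisionedThroughput"]

    client.delete_table(TableName=table_name)
    # Deletion is asynchronous; block until the table is really gone.
    client.get_waiter("table_not_exists").wait(TableName=table_name)

    client.create_table(
        TableName=table_name,
        AttributeDefinitions=description["AttributeDefinitions"],
        KeySchema=description["KeySchema"],
        # Copy only the two settable fields; the description carries extra read-only ones.
        ProvisionedThroughput={
            "ReadCapacityUnits": throughput["ReadCapacityUnits"],
            "WriteCapacityUnits": throughput["WriteCapacityUnits"],
        },
    )
    # Creation is asynchronous too; wait until the table is ACTIVE before writing.
    client.get_waiter("table_exists").wait(TableName=table_name)
```

With boto3 installed and credentials configured, it would be called as recreate_table(boto3.client("dynamodb"), "myTable").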

Jaymie answered 23/9, 2015 at 12:5 Comment(2)
DeleteTableRequest deleteTableRequest = new DeleteTableRequest().withTableName("myTable"); DeleteTableResult result = client.deleteTable(deleteTableRequest); - Caducous
How would this work if table names must be unique and the delete table request is not immediate? AWS puts the table in a "deleting" state, and it actually takes a while to do the deletion. - Cassey
A
6

DynamoDBMapper will do the job in a few lines:

AWSCredentials credentials = new PropertiesCredentials(credentialFile);
AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials);
DynamoDBMapper mapper = new DynamoDBMapper(client);
// An empty scan expression matches every item in the table mapped by LogData
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
PaginatedScanList<LogData> result = mapper.scan(LogData.class, scanExpression);
for (LogData data : result) {
    mapper.delete(data);
}
Alida answered 9/1, 2014 at 12:51 Comment(1)
Scan is quite expensive; however, using Query is not as clean in terms of API calls. - Helms
G
6

As ihtsham says, the most efficient way is to delete and re-create the table. However, if that is not practical (e.g. due to complex configuration of the table, such as Lambda triggers), here are some AWS CLI commands to delete all records. They require the jq program for JSON processing.

Deleting records one-by-one (slow!), assuming your table is called my_table, your partition key is called partition_key, and your sort key (if any) is called sort_key:

aws dynamodb scan --table-name my_table | \
  jq -c '.Items[] | { partition_key, sort_key }' | \
  tr '\n' '\0' | \
  xargs -0 -n1 -t aws dynamodb delete-item --table-name my_table --key

Deleting records in batches of up to 25 records:

aws dynamodb scan --table-name my_table | \
  jq -c '[.Items | keys[] as $i | { index: $i, value: .[$i]}] | group_by(.index / 25 | floor)[] | { "my_table": [.[].value | { "DeleteRequest": { "Key": { partition_key, sort_key }}}] }' | \
  tr '\n' '\0' | \
  xargs -0 -n1 -t aws dynamodb batch-write-item --request-items

If you start seeing non-empty UnprocessedItems responses, your write capacity has been exceeded. You can account for this by reducing the batch size. For me, each batch takes about a second to submit, so with a write capacity of 5 per second, I set the batch size to 5.
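Instead of shrinking the batch size, another option is to resubmit whatever comes back in UnprocessedItems after a short pause. The following Python sketch shows that idea; it assumes client behaves like boto3.client("dynamodb"), and the function name, the retry count, and the delay values are arbitrary choices for illustration.

```python
import time


def batch_write_with_retry(client, request_items, max_attempts=5, base_delay=0.5,
                           sleep=time.sleep):
    """Submit request_items and resubmit UnprocessedItems with exponential backoff.

    Returns whatever is still unprocessed after max_attempts (an empty dict on success).
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return {}
        if attempt < max_attempts - 1:
            # Back off before retrying: 0.5 s, 1 s, 2 s, ... with the default base_delay.
            sleep(base_delay * (2 ** attempt))
    return pending
```

The sleep parameter exists only so the pause can be replaced in tests; in normal use the default time.sleep is fine.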

Guild answered 14/11, 2016 at 16:14 Comment(0)
K
5

Just for the record, here is a quick solution in Python 3 using Boto3 and scan(). It queues one delete per item through batch_writer(). (Credentials need to be set.)

import boto3


def delete_all_items(table_name):
    # Deletes all items from a DynamoDB table.
    # You need to confirm your intention by pressing Enter.
    client = boto3.client('dynamodb')
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(table_name)
    # Look up the names of the key attributes (partition key and optional sort key).
    response = client.describe_table(TableName=table_name)
    keys = [k['AttributeName'] for k in response['Table']['KeySchema']]
    # A single scan() call returns at most 1 MB of data, so keep scanning
    # for as long as the response carries a LastEvaluatedKey.
    response = table.scan()
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response['Items'])
    number_of_items = len(items)
    if number_of_items == 0:  # no items to delete
        print("Table '{}' is empty.".format(table_name))
        return
    print("You are about to delete all ({}) items from table '{}'."
          .format(number_of_items, table_name))
    input("Press Enter to continue...")
    with table.batch_writer() as batch:
        for item in items:
            key_dict = {k: item[k] for k in keys}
            print("Deleting " + str(item) + "...")
            batch.delete_item(Key=key_dict)

delete_all_items("test_table")

Obviously, this shouldn't be used for tables with a lot of items (100+). For those, the delete/recreate approach is cheaper and more efficient.

Koziara answered 21/1, 2016 at 13:33 Comment(0)
U
0

You can recreate a DynamoDB table using the AWS Java SDK:

// Init DynamoDB client
AmazonDynamoDB dynamoDB = AmazonDynamoDBClientBuilder.standard().build();

// Get table definition
TableDescription tableDescription = dynamoDB.describeTable("my-table").getTable();

// Delete table
dynamoDB.deleteTable("my-table");

// Deletion is asynchronous: wait until the table is gone before reusing its name
dynamoDB.waiters().tableNotExists()
        .run(new WaiterParameters<>(new DescribeTableRequest("my-table")));

// Create table (secondary indexes, if any, are not copied here)
CreateTableRequest createTableRequest = new CreateTableRequest()
        .withTableName(tableDescription.getTableName())
        .withAttributeDefinitions(tableDescription.getAttributeDefinitions())
        .withProvisionedThroughput(new ProvisionedThroughput()
                .withReadCapacityUnits(tableDescription.getProvisionedThroughput().getReadCapacityUnits())
                .withWriteCapacityUnits(tableDescription.getProvisionedThroughput().getWriteCapacityUnits())
        )
        .withKeySchema(tableDescription.getKeySchema());

dynamoDB.createTable(createTableRequest);
Uncharted answered 23/8, 2018 at 16:34 Comment(0)
B
0

I use the following JavaScript code to do it:

const AWS = require('aws-sdk');
const db = new AWS.DynamoDB();                  // low-level client, used for describeTable
const ddb = new AWS.DynamoDB.DocumentClient();  // document client, used for scan and batchWrite

async function truncate(table, keys) {

    // Use the table's provisioned read capacity as the page size for each scan
    const limit = (await db.describeTable({
        TableName: table
    }).promise()).Table.ProvisionedThroughput.ReadCapacityUnits;

    let total = 0;
    let lastEvaluatedKey = null;
    do {
        const qp = {
            TableName: table,
            Limit: limit,
            // Only fetch the key attributes; the list must be comma-separated
            ProjectionExpression: keys.join(', '),
        };
        if (lastEvaluatedKey) {
            qp.ExclusiveStartKey = lastEvaluatedKey;
        }

        const qr = await ddb.scan(qp).promise();

        lastEvaluatedKey = qr.LastEvaluatedKey;

        const dp = {
            RequestItems: {
            },
        };

        dp.RequestItems[table] = [];

        if (qr.Items) {
            for (const i of qr.Items) {
                const dr = {
                    DeleteRequest: {
                        Key: {
                        }
                    }
                };

                keys.forEach(k => {
                    dr.DeleteRequest.Key[k] = i[k];
                });

                dp.RequestItems[table].push(dr);

                // batchWrite accepts at most 25 requests per call
                if (dp.RequestItems[table].length % 25 == 0) {
                    await ddb.batchWrite(dp).promise();
                    total += dp.RequestItems[table].length;
                    dp.RequestItems[table] = [];
                }
            }
            if (dp.RequestItems[table].length > 0) {
                await ddb.batchWrite(dp).promise();
                total += dp.RequestItems[table].length;
                dp.RequestItems[table] = [];
            }
        }

        console.log(`Deleted ${total}`);

        // Pause for a second between pages to stay within the provisioned capacity
        await new Promise(resolve => setTimeout(resolve, 1000));

    } while (lastEvaluatedKey);
}

(async () => {
    await truncate('table_name', ['id']);
})();
Bandung answered 19/1, 2019 at 23:48 Comment(0)
P
0

In this case, you may delete the table and create a new one.

Example:

from __future__ import print_function # Python 2/3 compatibility
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-west-2', endpoint_url="http://localhost:8000")

table = dynamodb.Table('Movies')

table.delete()
Pamilapammi answered 16/1, 2020 at 11:57 Comment(0)