Delete large data with same partition key from DynamoDB

I have a DynamoDB table structured like this:

A   B    C    D
1   id1  foo  hi
1   id2  var  hello

A is the partition key and B is the sort key.

Let's say I only have the partition key and don't know the sort keys, and I'd like to delete all entries that have the same partition key.

So I am thinking about loading entries via a query with a fixed page size (e.g. 1000) and deleting them in batches until no entries with that partition key are left in DynamoDB.

Is it possible to delete entries without loading them first?

Federative answered 6/4, 2018 at 1:59 Comment(3)
The same question with a code example: https://mcmap.net/q/539337/-how-to-delete-records-in-amazon-dynamodb-based-on-a-hashkey – Mojave
Is there a way to delete items with only the hash key (without the range key)? – Federative
No. That is surely a missing feature. Hopefully in the future. – Mojave

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DeleteItem.html

DeleteItem

Deletes a single item in a table by primary key.

For the primary key, you must provide all of the attributes. For example, with a simple primary key, you only need to provide a value for the partition key. For a composite primary key, you must provide values for both the partition key and the sort key.

In order to delete an item, you must provide the whole primary key (partition key + sort key). So in your case, you would need to query on the partition key, get all of the primary keys, then use those to delete each item. You can also use BatchWriteItem:

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html

BatchWriteItem

The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB.

DeleteRequest - Perform a DeleteItem operation on the specified item. The item to be deleted is identified by a Key subelement: Key - A map of primary key attribute values that uniquely identify the item. Each entry in this map consists of an attribute name and an attribute value. For each primary key, you must provide all of the key attributes. For example, with a simple primary key, you only need to provide a value for the partition key. For a composite primary key, you must provide values for both the partition key and the sort key.
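To make the single-item case concrete, here is a minimal sketch using the AWS SDK v2 DocumentClient, assuming the question's schema (partition key A, sort key B); the table name myTable is hypothetical:

const AWS = require('aws-sdk')
const docClient = new AWS.DynamoDB.DocumentClient()

async function deleteOneItem() {
  // DeleteItem requires the full composite primary key:
  // both the partition key (A) and the sort key (B).
  await docClient.delete({
    TableName: 'myTable', // hypothetical table name
    Key: { A: 1, B: 'id1' },
  }).promise()
}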

Scornik answered 6/4, 2018 at 8:41 Comment(2)
So the answer is no: for a composite primary key, there is no way to delete items with only the hash (partition) key. – Federative
@Federative this problem is usually solved at the design level by doing a begins_with="id" on the range key. – Gavrilla

No, but you can Query all the items for the partition and then issue an individual DeleteRequest for each item, which you can batch into multiple BatchWrite calls of up to 25 items each.

JS code (AWS SDK v2 DocumentClient):

const AWS = require('aws-sdk')
const docClient = new AWS.DynamoDB.DocumentClient()

async function deleteItems(tableName, partitionId) {
  const queryParams = {
    TableName: tableName,
    KeyConditionExpression: 'partitionId = :partitionId',
    ExpressionAttributeValues: { ':partitionId': partitionId },
  }

  // A single Query returns at most 1 MB of data, so page through the
  // partition with LastEvaluatedKey until every item has been read.
  let lastEvaluatedKey
  do {
    const queryResults = await docClient
      .query({ ...queryParams, ExclusiveStartKey: lastEvaluatedKey })
      .promise()
    lastEvaluatedKey = queryResults.LastEvaluatedKey

    if (queryResults.Items && queryResults.Items.length > 0) {
      // BatchWriteItem accepts at most 25 put/delete requests per call.
      const batchCalls = chunks(queryResults.Items, 25).map(async (chunk) => {
        const deleteRequests = chunk.map((item) => ({
          DeleteRequest: {
            // Each delete needs the full composite primary key.
            Key: {
              partitionId: item.partitionId,
              sortId: item.sortId,
            },
          },
        }))

        // A hardened version would also retry any UnprocessedItems
        // returned by batchWrite.
        await docClient
          .batchWrite({ RequestItems: { [tableName]: deleteRequests } })
          .promise()
      })

      await Promise.all(batchCalls)
    }
  } while (lastEvaluatedKey)
}

// https://mcmap.net/q/53160/-split-array-into-chunks
function chunks(inputArray, perChunk) {
  return inputArray.reduce((all, one, i) => {
    const ch = Math.floor(i / perChunk)
    all[ch] = [].concat(all[ch] || [], one)
    return all
  }, [])
}
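A hypothetical invocation, assuming a table whose key attributes really are named partitionId and sortId as in the sketch above:

deleteItems('myTable', 1).catch(console.error)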
Nappe answered 17/10, 2020 at 16:39 Comment(0)

For production databases and critical Amazon DynamoDB tables, the recommendation is to use batch-write-item to purge huge amounts of data.

batch-write-item (with DeleteRequest) is 10 to 15 times faster than delete-item.

# timestamp is a DynamoDB reserved word, so it has to be aliased via #ts
aws dynamodb scan --table-name "test_table_name" \
  --projection-expression "primary_key, #ts" \
  --filter-expression "#ts < :oldest_date" \
  --expression-attribute-names '{"#ts": "timestamp"}' \
  --expression-attribute-values '{":oldest_date": {"S": "2020-02-01"}}' \
  --max-items 25 --total-segments "$TOTAL_SEGMENT" --segment "$SEGMENT_NUMBER" > $SCAN_OUTPUT_FILE

cat $SCAN_OUTPUT_FILE | jq -r ".Items[] | tojson" | awk '{ print "{\"DeleteRequest\": {\"Key\": " $0 " }}," }' | sed '$ s/.$//' | sed '1 i { "test_table_name": [' | sed '$ a ] }' > $INPUT_FILE

aws dynamodb batch-write-item --request-items file://$INPUT_FILE

More information: https://medium.com/analytics-vidhya/how-to-delete-huge-data-from-dynamodb-table-f3be586c011c

Sapanwood answered 3/7, 2020 at 6:0 Comment(0)
