I am seeking advice on what's the best way to design this -
Use Case
I want to put multiple files into S3. Once all files are successfully saved, I want to trigger a lambda function to do some other work.
Naive Approach
The way I am approaching this is by saving a record in Dynamo that contains a unique identifier and the total number of records I will be uploading along with the keys that should exist in S3.
A basic implementation would be to take my existing lambda function which is invoked anytime my S3 bucket is written into, and have it check manually whether all the other files been saved.
The Lambda function would know (look in Dynamo to determine what we're looking for) and query S3 to see if the other files are in. If so, use SNS to trigger my other lambda that will do the other work.
Edit: Another approach is have my client program that puts the files in S3 be responsible for directly invoking the other lambda function, since technically it knows when all the files have been uploaded. The issue with this approach is that I do not want this to be the responsibility of the client program... I want the client program to not care. As soon as it has uploaded the files, it should be able to just exit out.
Thoughts
I don't think this is a good idea. Mainly because Lambda functions should be lightweight, and polling the database from within the Lambda function to get the S3 keys of all the uploaded files and then checking in S3 if they are there - doing this each time seems ghetto and very repetitive.
What's the better approach? I was thinking something like using SWF but am not sure if that's overkill for my solution or if it will even let me do what I want. The documentation doesn't show real "examples" either. It's just a discussion without much of a step by step guide (perhaps I'm looking in the wrong spot).
Edit In response to mbaird's suggestions below-
Option 1 (SNS) This is what I will go with. It's simple and doesn't really violate the Single Responsibility Principal. That is, the client uploads the files and sends a notification (via SNS) that its work is done.
Option 2 (Dynamo streams) So this is essentially another "implementation" of Option 1. The client makes a service call, which in this case, results in a table update vs. a SNS notification (Option 1). This update would trigger the Lambda function, as opposed to notification. Not a bad solution, but I prefer using SNS for communication rather than relying on a database's capability (in this case Dynamo streams) to call a Lambda function.
In any case, I'm using AWS technologies and have coupling with their offering (Lambda functions, SNS, etc.) but I feel relying on something like Dynamo streams is making it an even tighter coupling. Not really a huge concern for my use case but still feels dirty ;D
Option 3 with S3 triggers My concern here is the possibility of race conditions. For example, if multiple files are being uploaded by the client simultaneously (think of several async uploads fired off at once with varying file sizes), what if two files happen to finish uploading at around the same time, and two or more Lambda functions (or whatever implementations we use) query Dynamo and gets back N as the completed uploads (instead of N and N+1)? Now even though the final result should be N+2, each one would add 1 to N. Nooooooooooo!
So Option 1 wins.