I have written a Lambda function, triggered by an S3 bucket, that unzips a zip file and processes a text document inside it. Because of Lambda's memory limits, I need to move this processing to something like AWS Batch. Correct me if I am wrong, but I think my workflow should look something like this.
I believe I need to write a Lambda that puts the location of the S3 object on Amazon SQS, where an AWS Batch job can read the location and do all the unzipping and data processing there, since more memory is available.
Here is my current Lambda. It takes the event triggered by the S3 bucket, checks whether the object is a zip file, and then pushes that object's S3 key to SQS. Should I tell AWS Batch to start reading the queue here in my Lambda? I am totally new to AWS in general and not sure where to go from here.
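import java.io.IOException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.log4j.Logger; // assuming log4j 1.x
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.event.S3EventNotification.S3EventNotificationRecord;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;
import com.amazonaws.services.sqs.model.CreateQueueRequest;
import com.amazonaws.services.sqs.model.SendMessageRequest;

// BigData and DomainOfConstants are my own helper classes (not shown).
public class dockerEventHandler implements RequestHandler<S3Event, String> {
    private static BigData app = new BigData();
    private static DomainOfConstants CONST = new DomainOfConstants();
    // Log under this class's name rather than S3EventProcessorUnzip
    private static Logger log = Logger.getLogger(dockerEventHandler.class);
    private static AmazonSQS SQS;
    private static CreateQueueRequest createQueueRequest;
    private static Matcher matcher;
    private static String srcBucket, srcKey, extension, myQueueUrl;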
    @Override
    public String handleRequest(S3Event s3Event, Context context) {
        try {
            for (S3EventNotificationRecord record : s3Event.getRecords()) {
                srcBucket = record.getS3().getBucket().getName();
                // Keys in S3 events are URL-encoded, with spaces sent as '+'
                srcKey = record.getS3().getObject().getKey().replace('+', ' ');
                srcKey = URLDecoder.decode(srcKey, "UTF-8");
                matcher = Pattern.compile(".*\\.([^\\.]*)").matcher(srcKey);
                if (!matcher.matches()) {
                    log.info(CONST.getNoConnectionMessage() + srcKey);
                    continue; // skip this record but keep processing the rest
                }
                extension = matcher.group(1).toLowerCase();
                if (!"zip".equals(extension)) {
                    log.info("Skipping non-zip file " + srcKey + " with extension " + extension);
                    continue; // skip this record but keep processing the rest
                }
                log.info("Sending object location to queue: " + srcBucket + "/" + srcKey);
                // Pass in only the reference of where the object is located
                createQue(CONST.getQueueName(), srcKey);
            }
        } catch (IOException e) {
            log.error(e);
        }
        return "Ok";
    }
    /*
     * Set up the connection to Amazon SQS.
     * TODO - Find updated API for the SQS connection to eliminate the deprecation
     * warning (AmazonSQSClientBuilder looks like the replacement).
     */
    @SuppressWarnings("deprecation")
    public static void sQSConnection() {
        app.setAwsCredentials(CONST.getAccessKey(), CONST.getSecretKey());
        try {
            SQS = new AmazonSQSClient(app.getAwsCredentials());
            Region usEast1 = Region.getRegion(Regions.US_EAST_1);
            SQS.setRegion(usEast1);
        } catch (Exception e) {
            log.error(e);
        }
    }
    // Create the queue (or look it up if it already exists) and send the message
    public static void createQue(String queName, String message) {
        if (SQS == null) {
            sQSConnection(); // the client must be initialized before first use
        }
        createQueueRequest = new CreateQueueRequest(queName);
        myQueueUrl = SQS.createQueue(createQueueRequest).getQueueUrl();
        sendMessage(myQueueUrl, message);
    }

    // Send a reference to the S3 object's location to the queue
    public static void sendMessage(String queueUrl, String s3KeyName) {
        SQS.sendMessage(new SendMessageRequest(queueUrl, s3KeyName));
    }
    // Fire AWS Batch to pull from the queue
    private static void initializeBatch() {
        // TODO
    }
}
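Right now only the key goes onto the queue, but whatever reads the message on the Batch side will presumably need the bucket as well. A minimal sketch of one option, assuming the message body is a plain `s3://bucket/key` string; `S3Location`, `toMessage`, and `fromMessage` are hypothetical names for illustration and not part of the AWS SDK:

```java
import java.util.Objects;

/** Hypothetical value type: an S3 object reference serialized as an s3:// URI. */
final class S3Location {
    private static final String SCHEME = "s3://";

    final String bucket;
    final String key;

    S3Location(String bucket, String key) {
        this.bucket = Objects.requireNonNull(bucket, "bucket");
        this.key = Objects.requireNonNull(key, "key");
    }

    /** Builds the message body the Lambda would send. */
    String toMessage() {
        return SCHEME + bucket + "/" + key;
    }

    /** Parses a message body produced by toMessage() on the consumer side. */
    static S3Location fromMessage(String message) {
        if (message == null || !message.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not an s3:// URI: " + message);
        }
        String rest = message.substring(SCHEME.length());
        // Bucket names cannot contain '/', so the first slash separates bucket from key.
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("Missing bucket or key: " + message);
        }
        return new S3Location(rest.substring(0, slash), rest.substring(slash + 1));
    }
}
```

With something like this, the Lambda would send `new S3Location(srcBucket, srcKey).toMessage()` instead of the bare key, and the container would call `S3Location.fromMessage(body)` before downloading the object.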
I have set up Docker and understand Docker images. I believe my Docker image should contain all the code to read the queue, unzip, process, and kick the file off to RDS, all in one Docker image/container.
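For the unzip step inside the container, a rough sketch of streaming the archive instead of loading it into memory, using only `java.util.zip` from the standard library. It assumes the entries are UTF-8 text and that the input stream comes from wherever the S3 object is read; `ZipLineProcessor` and `processLines` are hypothetical names, and the handler stands in for my existing processing code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipLineProcessor {
    /**
     * Streams every file entry in the zip and hands each text line to the handler.
     * Only one line is held in memory at a time. Returns the number of lines seen.
     */
    static long processLines(InputStream zipped, Consumer<String> handler) throws IOException {
        long lines = 0;
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Do not close this reader: closing it would close the whole zip stream.
                // ZipInputStream reports end-of-stream at the end of each entry.
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(zip, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    handler.accept(line);
                    lines++;
                }
            }
        }
        return lines;
    }
}
```

The idea is that memory use stays roughly constant regardless of archive size, which is the property the 1 GB files need; whether the per-line handler fits the real processing code is an assumption.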
I am looking for someone who has done something similar and could share it to help. Something along the lines of:
Mr. S3: Hey Lambda, I have a file.
Mr. Lambda: Okay S3, I see you. Hey AWS Batch, could you unzip and do stuff to this?
Mr. Batch: Gotcha, Mr. Lambda, I'll take care of that and put it in RDS or some database after.
I have not written the class/Docker image yet, but I have all the code to process, unzip, and kick off to RDS done. Lambda is just limited on memory because some of the files are 1 GB or bigger.