Is it possible to run an Airflow task only when a specific event occurs, such as a file being dropped into a specific S3 bucket? Something similar to AWS Lambda events.
There is S3KeySensor, but I don't know whether it does what I want (run a task only when an event occurs).
Here is an example to make the question clearer.
I have a sensor defined as follows:
# older Airflow: from airflow.sensors.s3_key_sensor import S3KeySensor
# newer Airflow: from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

sensor = S3KeySensor(
    task_id='run_on_every_file_drop',
    bucket_key='file-to-watch-*',   # wildcard pattern matched against object keys
    wildcard_match=True,
    bucket_name='my-sensor-bucket',
    timeout=18*60*60,               # fail the sensor task after 18 hours
    poke_interval=120,              # check the bucket every 2 minutes
    dag=dag,
)
With the above sensor, Airflow's behavior for the sensor task is as follows:
- It runs the task if there is already an object name matching the wildcard in the S3 bucket my-sensor-bucket, even before the DAG is switched ON in the Airflow admin UI (I don't want the task to run because of pre-existing S3 objects).
- After running once, the sensor task will not run again when a new S3 file object is dropped (I want the sensor task and the subsequent tasks in the DAG to run every single time a new S3 file object is dropped in the bucket my-sensor-bucket).
- If I configure the scheduler, tasks run based on a schedule, not based on events. So the scheduler does not seem to be an option in this case.
I'm trying to understand whether tasks in Airflow can only be run on a schedule (like cron jobs) or via sensors (only once, when the sensing criteria are met), or whether it can be set up as an event-based pipeline (something similar to AWS Lambda).