Why do we need airflow hooks?

3

14

Doc says:

Hooks are interfaces to external platforms and databases like Hive, S3, MySQL, Postgres, HDFS, and Pig. Hooks implement a common interface when possible, and act as a building block for operators. Ref

But why do we need them?

I want to select data from one Postgres DB and store it in another. Can I use, for example, the psycopg2 driver inside a Python script that is run by a PythonOperator, or does Airflow need to know, for some reason, what exactly I'm doing inside the script, so that I have to use PostgresHook instead of just the psycopg2 driver?

Krick answered 22/6, 2020 at 11:42 Comment(1)
Could you add the tags and mention the framework and language you are using? - Hagfish
7

You should just use PostgresHook. Instead of using psycopg2 like so:

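import psycopg2

# host, dbname, user and password have to be supplied (or hardcoded) by your script
conn = psycopg2.connect(host=host, dbname=dbname, user=user, password=password)
cur = conn.cursor()
cur.execute(query)
data = cur.fetchall()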

You can just type:

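# Airflow 2+ import path; Airflow 1.10 used airflow.hooks.postgres_hook
from airflow.providers.postgres.hooks.postgres import PostgresHook
postgres = PostgresHook(postgres_conn_id='connection_id')
data = postgres.get_pandas_df(query)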

The credentials then live in an Airflow connection rather than in your script, and Airflow can store them encrypted.

So using hooks is cleaner, safer and easier.
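For the use case in the question (read from one Postgres DB, write to another), a minimal sketch could look like the following. It assumes two Airflow connections with the hypothetical IDs `source_pg` and `target_pg`, and hypothetical tables `users` and `users_copy`; none of these names come from the question. `get_records` and `insert_rows` are methods the SQL hooks inherit from `DbApiHook`. The import is done inside the task function so the helper can be exercised without Airflow installed.

```python
def copy_rows(source_hook, target_hook, select_sql, target_table, target_fields=None):
    """Read rows through one hook and insert them through another.

    Both arguments are expected to behave like Airflow DbApiHook objects,
    i.e. to provide get_records() and insert_rows().
    """
    rows = source_hook.get_records(select_sql)
    target_hook.insert_rows(table=target_table, rows=rows, target_fields=target_fields)
    return len(rows)


def copy_users_task():
    """Callable intended for a PythonOperator (python_callable=copy_users_task)."""
    # Imported lazily: only needed when the task actually runs inside Airflow.
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    # 'source_pg' and 'target_pg' are hypothetical connection IDs defined in Airflow.
    source = PostgresHook(postgres_conn_id="source_pg")
    target = PostgresHook(postgres_conn_id="target_pg")
    return copy_rows(
        source,
        target,
        select_sql="SELECT id, name FROM users",  # hypothetical source table
        target_table="users_copy",                # hypothetical target table
        target_fields=["id", "name"],
    )
```

Note that `get_records` pulls the whole result set into memory, so this only suits tables of modest size; for large tables you would want to read and write in batches.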

Suggestion answered 23/6, 2020 at 8:52 Comment(3)
"Easier" is a matter of opinion and experience. psycopg2 has better documentation, a more stable API and will be already familiar to many people. "Cleaner" is also debatable: your examples are not equivalent as you omit the setup for the Airflow hook. Maybe you could elaborate on why hooks are safer? - Haga
@Haga the Postgres hook in Airflow uses the psycopg2 driver... - Etam
> All problems in computer science can be solved by another level of indirection (by Butler Lampson) - Jugum
4

While it is possible to just hardcode the connections in your script and run it, the power of hooks is that they let you manage and edit the connection details from within the Airflow UI instead.

Have a look at "Automate AWS Tasks Thanks to Airflow Hooks" to learn a bit more about how to use hooks.

Koerlin answered 22/6, 2020 at 12:31 Comment(1)
Link is broken, please update it. - Motherless
0

You don't 'need' them; they are provided for you by Airflow as a convenience. They also allow you to manage connection information in a single place.

Airflow also supplies a bunch of them out of the box so that you don't need to write code for basic operations. If you want to use the s3/postgres/mysql clients directly, you can still do this with hooks by using a hook's get_conn method.
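As a rough sketch of the get_conn route: for the SQL hooks, `get_conn()` returns the underlying DB-API connection (a psycopg2 connection in the case of PostgresHook), so you can keep writing plain driver code while Airflow still owns the credentials. The helper name `fetch_all_via_hook` below is made up for illustration and is not part of Airflow.

```python
from contextlib import closing


def fetch_all_via_hook(hook, query, parameters=None):
    """Run a query on the raw driver connection obtained from a hook.

    `hook` is anything with a get_conn() method that returns a DB-API
    connection, e.g. PostgresHook(postgres_conn_id="my_conn").
    """
    conn = hook.get_conn()
    # Close the cursor and the connection even if the query fails.
    with closing(conn), closing(conn.cursor()) as cur:
        if parameters is None:
            cur.execute(query)
        else:
            cur.execute(query, parameters)
        return cur.fetchall()
```

With a real PostgresHook the placeholders in `query` would follow psycopg2's `%s` style, since that is the driver underneath.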

There is no real reason not to use hooks, especially because it is extremely easy to create your own which can extend the provided ones.

Etam answered 22/9, 2023 at 9:59 Comment(0)
