Workload Identity & Service Accounts for Composer 2 / GKE Autopilot Cluster PodOperator tasks
N

1

5

I'm trying to run GKEStartPodOperator/KubernetesPodOperator tasks in a Composer 2 environment, which makes use of a GKE cluster in autopilot mode. We have an existing Composer 1 environment with a GKE cluster not in autopilot mode. Our tasks that authenticate with Google Cloud Platform services (BigQuery, GCS, etc), fail with 401 unauthorized in the Composer 2 environment, but succeed in the Composer 1 environment.

In the log files, I can tell that the tasks in both environments get their credentials via requests to the metadata server. The key difference is tasks in Composer 1 request the service account assigned to the node the task runs in, but the tasks in Composer 2 request what seems to be a workload identity pool like [project-name].svc.id.goog.

The logs from Composer 1 are:

[2021-10-22 12:38:01,349] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,351] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,352] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,359] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,374] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,392] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,395] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,398] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,412] {pod_launcher.py:148} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-22 12:38:01,414] {pod_launcher.py:148} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-22 12:38:01,415] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-22 12:38:01,437] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-22 12:38:01,452] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 226
[2021-10-22 12:38:01,454] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-id][email protected]/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-22 12:38:01,463] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-id][email protected]/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 1049
[2021-10-22 12:38:01,468] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-22 12:38:02,028] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-nam]/jobs?prettyPrint=false HTTP/1.1" 200 None

The logs from Composer 2 are:

[2021-10-21 13:56:06,619] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,621] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,624] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,634] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,641] {pod_launcher.py:149} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-21 13:56:06,714] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-21 13:56:06,720] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 121
[2021-10-21 13:56:06,721] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-21 13:56:06,831] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 765
[2021-10-21 13:56:06,833] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-21 13:56:06,866] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-name]/jobs?prettyPrint=false HTTP/1.1" 401 None

Based on Workload Identity documentation, I would guess I need to bind a specific service account to the node/node-pool running the pod, but I'm not sure how to do that with Composer 2 GKE Autopilot since nodes are managed for me. Composer 2 does not currently have documentation available on using KubernetesPodOperator or GKEStartPodOperator.

In summary, my question is: How should I configure my Composer 2 environment PodOperator tasks to utilize a specific service account to authenticate with GCP services?

Naivete answered 22/10, 2021 at 13:32 Comment(0)
N
7

I received some guidance from operations engineers, and now have a KubernetesPodOperator task successfully authenticating with GCP services via a service account. I'll share the steps and helpful bits of info below.

First, follow the steps for Authenticating to Google Cloud using Workload Identity. I thought Composer 2 configured the kubernetes <> google cloud service account binding and annotation for me, but that wasn't the case. I had to create the namespace, kubernetes service account, binding for the ksa and gsa, and the annotation for the KSA just like the instructions say.

Second, I had to update my KubernetesPodOperator instance with the parameters namespace and service_account_name set to the namespace and kubernetes service account I made in the first step.

An upload of the DAG and a task execution later, and I can confirm that these two steps enabled my task to request the bound Google Service Account, and from there the google client library authentication succeeded in my test against BigQuery.

Naivete answered 22/10, 2021 at 17:51 Comment(1)
did you used "from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator" or "airflow.providers.cncf.kubernetes.operators.kubernetes_pod"? and what was your airflow version? I use airflow 2.2.5 and "airflow.contrib.operators.kubernetes_pod_operator" does it matter?Twedy

© 2022 - 2024 — McMap. All rights reserved.