Google Cloud Dataflow jobs failing with error 'Failed to retrieve staged files: failed to retrieve worker in 3 attempts: bad MD5...'
SDK: Apache Beam SDK for Go 0.5.0

We are running Apache Beam Go SDK jobs on Google Cloud Dataflow. They had been working fine until recently, when they intermittently stopped working (no changes were made to code or configuration). The error that occurs is:

Failed to retrieve staged files: failed to retrieve worker in 3 attempts: bad MD5 for /var/opt/google/staged/worker: ..., want ; bad MD5 for /var/opt/google/staged/worker: ..., want ;

(Note: the error message appears to be missing the second hash value after each "want".)

My best guess is that something is wrong with the worker binary: the boot process seems to compare MD5 hashes of the staged worker, and one of the values is missing. I don't know exactly what it is comparing against, though.

Does anybody know what could be causing this issue?

Kelson answered 17/12, 2018 at 22:7 Comment(2)
Some additional notes: the error references the path /var/opt/google/staged/worker, but when I SSH into the VM the only path I can see is /var/opt/google/dataflow/staged/worker. The binary seems to match the expected size, but I'm not sure why the paths differ. – Kelson
I installed Go 1.12 and ran the Beam Go SDK Quickstart; it worked. Your error could be an issue with SDK version 0.5.0 that has since been fixed, unless your program has specific code that is causing it. By the way, the Beam SDK for Go is not in the list of programming languages supported by Dataflow; only Java, Python and REST are supported. – Unswerving
The fix to this issue was to rebuild the worker_harness_container_image with the latest changes. I had tried this before, but I did not have the latest release when I built it locally. After I pulled the latest from the Beam repo, rebuilt the image (per the notes at https://github.com/apache/beam/blob/master/sdks/CONTAINERS.md), and reran the job, it worked again.

Kelson answered 19/12, 2018 at 3:51 Comment(0)
I'm seeing the same thing. If I look into the Stackdriver logging I see this:

Handler for GET /v1.27/images/apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515/json returned error: No such image: apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515

However, I can pull the image just fine locally. Any ideas why Dataflow cannot pull it?

Petulancy answered 10/1, 2019 at 19:56 Comment(0)
