Way to visualize Beam pipeline run with DirectRunner
Asked Answered
R

2

5

In GCP we can see the pipeline execution graph. Is the same possible when running locally via DirectRunner?

Roughhew answered 12/6, 2022 at 14:9 Comment(0)
P
8

You can use pipeline_graph and the InteractiveRunner to get a graphviz representation of your pipeline locally.

An example for the word count pipeline used in the Beam documentation:

import apache_beam as beam
from apache_beam.runners.interactive.display import pipeline_graph
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
import re

pipeline = beam.Pipeline(InteractiveRunner())
lines = pipeline | beam.Create([f"some_file_{i}.txt" for i in range(10)])

# Count the occurrences of each word.
counts = (
    lines
    | 'Split' >> (
        beam.FlatMap(
            lambda x: re.findall(r'[A-Za-z\']+', x)).with_output_types(str))
    | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
    | 'GroupAndSum' >> beam.CombinePerKey(sum))

# Format the counts into a PCollection of strings.
def format_result(word_count):
    (word, count) = word_count
    return f'{word}: {count}'

output = counts | 'Format' >> beam.Map(format_result)

# Write the output using a "Write" transform that has side effects.
# pylint: disable=expression-not-assigned
output | beam.io.WriteToText("some_file.txt")

print(pipeline_graph.PipelineGraph(pipeline).get_dot())

This prints

digraph G {
node [color=blue, fontcolor=blue, shape=box];
"Create";
lines [shape=circle];
"Split";
pcoll4978 [label="", shape=circle];
"PairWithOne";
pcoll8859 [label="", shape=circle];
"GroupAndSum";
counts [shape=circle];
"Format";
output [shape=circle];
"WriteToText";
pcoll6409 [label="", shape=circle];
"Create" -> lines;
lines -> "Split";
"Split" -> pcoll4978;
pcoll4978 -> "PairWithOne";
"PairWithOne" -> pcoll8859;
pcoll8859 -> "GroupAndSum";
"GroupAndSum" -> counts;
counts -> "Format";
"Format" -> output;
output -> "WriteToText";
"WriteToText" -> pcoll6409;
}

Putting this into https://edotor.net results in:

beam pipeline

You can work with GraphViz in Python to produce a prettier output if needed (graphviz for example).

Perren answered 12/6, 2022 at 18:37 Comment(0)
P
2

You can also use Python's RenderRunner, e.g.

python -m apache_beam.examples.wordcount --output out.txt \
    --runner=apache_beam.runners.render.RenderRunner \
    --render_output=pipeline.svg

This also has an interactive mode, triggered by passing --port=N (where 0 can be used to pick an unused port) which vends the graph as a local web service. This allows one to expand/collapse composites for easier exploration. Any --render_output arguments that are passed will get re-rendered as you edit the graph. (It uses graphviz under the hood, so can render any of those supported formats.)

Rendered Graph

For rendering non-Python pipelines, one can start this up as a local portable "runner."

python -m apache_beam.runners.render

and then "submit" this job from your other SDK over the provided jobs API endpoint via a portable runner to view it.

Pellucid answered 1/9, 2023 at 16:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.