Apache Beam Coder for org.json.JSONObject

I am writing a data pipeline in Apache Beam that reads from Pub/Sub, deserializes each message into a JSONObject, and passes it on to later pipeline stages. When I try to submit my code, I get the following error:

An exception occured while executing the Java class. Unable to return a default Coder for Convert to JSON and obfuscate PII data/ParMultiDo(JSONifyAndObfuscate).output [PCollection]. Correct one of the following root causes:
[ERROR] No Coder has been manually specified; you may do so using .setCoder().
[ERROR] Inferring a Coder from the CoderRegistry failed: Unable to provide a Coder for org.json.JSONObject.
[ERROR] Building a Coder using a registered CoderProvider failed.
[ERROR] See suppressed exceptions for detailed failures.
[ERROR] Using the default output Coder from the producing PTransform failed: PTransform.getOutputCoder called.

Basically, the error says that Beam cannot find a Coder for org.json.JSONObject. I have no idea where to get such a coder or how to build one. Any ideas?

Thanks!

Seraphina asked 10/12, 2019 at 22:12

The best starting point for understanding coders is the "Data Encoding and Type Safety" section of the Beam Programming Guide. The short version is that a Coder specifies how a given type is encoded to and decoded from bytes at certain points in a Beam pipeline (usually at stage boundaries). Unfortunately, Beam has no default coder for JSONObject, so you have two options here:

  1. Avoid putting JSONObjects in PCollections. Instead of passing JSONObjects through your pipeline, extract the data you need from the JSON and either pass it around as basic data types or wrap it in a class of your own. Java's basic data types all have default coders, and coders can easily be generated for classes that are just structs of those types. As a side benefit, this is how Beam pipelines are expected to be built, so the pipeline is likely to perform better if you stick with basic data and well-known coders where possible.

  2. If JSONObjects are necessary, create a custom coder for them. The programming guide explains how to register a custom coder as the default coder for a type. For the implementation itself, the easiest approach with JSONObject is to encode it as a JSON string with JSONObject.toString() and decode it again with the JSONObject(String) constructor. For details, see the programming guide section above and the Coder documentation.
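For anyone looking for a concrete version of option 2, here is a minimal sketch rather than a definitive implementation. The class name JsonObjectCoder is made up for illustration, and delegating the byte handling to Beam's StringUtf8Coder is one possible choice (it takes care of length-prefixing the string), not something the original answer prescribes. It assumes the Beam Java SDK and org.json are on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.coders.CustomCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.json.JSONObject;

/**
 * Hypothetical coder that stores a JSONObject as its JSON text.
 * The actual byte encoding is delegated to StringUtf8Coder.
 */
class JsonObjectCoder extends CustomCoder<JSONObject> {
  private static final StringUtf8Coder STRING_CODER = StringUtf8Coder.of();
  private static final JsonObjectCoder INSTANCE = new JsonObjectCoder();

  private JsonObjectCoder() {}

  public static JsonObjectCoder of() {
    return INSTANCE;
  }

  @Override
  public void encode(JSONObject value, OutputStream outStream) throws IOException {
    if (value == null) {
      throw new CoderException("cannot encode a null JSONObject");
    }
    // Serialize to JSON text, then let StringUtf8Coder write the bytes.
    STRING_CODER.encode(value.toString(), outStream);
  }

  @Override
  public JSONObject decode(InputStream inStream) throws IOException {
    // Read the JSON text back and parse it with the String constructor.
    return new JSONObject(STRING_CODER.decode(inStream));
  }
}
```

To use it, either call .setCoder(JsonObjectCoder.of()) on the PCollection produced by the failing step, or register it once with pipeline.getCoderRegistry().registerCoderForClass(JSONObject.class, JsonObjectCoder.of()). Because the sketch does not override verifyDeterministic, CustomCoder treats it as non-deterministic, so it should not be used for GroupByKey keys. That seems like the safer default here, since JSONObject.toString() does not promise a stable key order.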

Unsegregated answered 11/12, 2019 at 1:1 Comment(3)
Thanks! I created a custom coder class by extending the Coder class and implementing both the "encode" and "decode" methods, basically converting the JSONObject to and from a String. - Seraphina
@Seraphina could you provide a JSON coder example? - Ardennes
I used this small piece of code for test purposes: goonlinetools.com/snapshot/code/#9pf99jx17na8bkr72up2ro - Bobseine
