How to use Apache Avro to serialize a JSON document and then write it into Cassandra?

I have been reading a lot about Apache Avro lately, and I am inclined to use it instead of JSON. Currently, we serialize the JSON document using Jackson and then write that serialized JSON document into Cassandra for each row key/user id. A REST service then reads the whole JSON document by row key, deserializes it, and uses it further.

We write into Cassandra like this:

user-id   column-name   serialize-json-document-value

Below is an example of the JSON document that we write into Cassandra. This JSON document is for a particular row key/user id.

{
  "lv" : [ {
    "v" : {
      "site-id" : 0,
      "categories" : {
        "321" : {
          "price_score" : "0.2",
          "confidence_score" : "0.5"
        },
        "123" : {
          "price_score" : "0.4",
          "confidence_score" : "0.2"
        }
      },
      "price-score" : 0.5,
      "confidence-score" : 0.2
    }
  } ],
  "lmd" : 1379214255197
}

Now we are thinking of using Apache Avro so that we can compact this JSON document by serializing it with Avro and then storing it in Cassandra. I have a couple of questions about this:

  1. Is it possible to serialize the above JSON document using Apache Avro and then write it into Cassandra? If yes, how can I do that? Can anyone provide a simple example?
  2. We also need to deserialize it when reading it back from Cassandra in our REST service. Is this possible as well?

Below is my simple code, which serializes the JSON document and prints it to the console.

public static void main(String[] args) {

    final long lmd = System.currentTimeMillis();

    Map<String, Object> props = new HashMap<String, Object>();
    props.put("site-id", 0);
    props.put("price-score", 0.5);
    props.put("confidence-score", 0.2);

    Map<String, Category> categories = new HashMap<String, Category>();
    categories.put("123", new Category("0.4", "0.2"));
    categories.put("321", new Category("0.2", "0.5"));
    props.put("categories", categories);

    AttributeValue av = new AttributeValue();
    av.setProperties(props);

    Attribute attr = new Attribute();
    attr.instantiateNewListValue();
    attr.getListValue().add(av);
    attr.setLastModifiedDate(lmd);

    // serialize it
    try {
        String jsonStr = JsonMapperFactory.get().writeValueAsString(attr);

        // then write into Cassandra
        System.out.println(jsonStr);
    } catch (JsonGenerationException e) {
        e.printStackTrace();
    } catch (JsonMappingException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The serialized JSON document will look something like this:

{"lv":[{"v":{"site-id":0,"categories":{"321":{"price_score":"0.2","confidence_score":"0.5"},"123":{"price_score":"0.4","confidence_score":"0.2"}},"price-score":0.5,"confidence-score":0.2}}],"lmd":1379214255197}

The AttributeValue and Attribute classes use Jackson annotations.

One important note: the properties inside the above JSON document change depending on the column name. We have different properties for different column names; some column names have two properties, some have five. So each JSON document carries the properties and values that our metadata defines for it.

I hope the question is clear enough. Can anyone provide a simple example of how to achieve this using Apache Avro? I am just starting with Avro, so I am running into a lot of problems.

Hendiadys answered 15/9, 2013 at 3:26 Comment(1)
I have the same question. Have you resolved it, and if so, how? Thanks! - Spondaic

Avro requires a schema, so you MUST design one before using it, and usage differs a lot from free-form JSON.
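
To make the schema requirement concrete, here is a minimal sketch of how the question's document might be modeled and round-tripped through Avro's binary encoding with the generic API. The schema, the class name AvroRoundTrip, and the flattening of the "v" wrapper are assumptions made for illustration, not something from the original post. Avro names cannot contain hyphens, so "site-id" is renamed to site_id here. The sketch assumes the org.apache.avro:avro library is on the classpath.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class AvroRoundTrip {

    // Hypothetical schema; single quotes are swapped for double quotes below.
    static final Schema SCHEMA = new Schema.Parser().parse((
          "{'type':'record','name':'Attribute','fields':["
        + " {'name':'lmd','type':'long'},"
        + " {'name':'lv','type':{'type':'array','items':{"
        + "  'type':'record','name':'AttributeValue','fields':["
        + "   {'name':'site_id','type':'int'},"
        + "   {'name':'price_score','type':'double'},"
        + "   {'name':'confidence_score','type':'double'},"
        + "   {'name':'categories','type':{'type':'map','values':{"
        + "    'type':'record','name':'Category','fields':["
        + "     {'name':'price_score','type':'string'},"
        + "     {'name':'confidence_score','type':'string'}]}}}]}}}]}"
        ).replace('\'', '"'));

    // Builds a record holding part of the sample data from the question.
    static GenericRecord sample() {
        Schema valueSchema = SCHEMA.getField("lv").schema().getElementType();
        Schema categorySchema = valueSchema.getField("categories").schema().getValueType();

        GenericRecord category = new GenericData.Record(categorySchema);
        category.put("price_score", "0.4");
        category.put("confidence_score", "0.2");
        Map<String, GenericRecord> categories = new HashMap<>();
        categories.put("123", category);

        GenericRecord value = new GenericData.Record(valueSchema);
        value.put("site_id", 0);
        value.put("price_score", 0.5);
        value.put("confidence_score", 0.2);
        value.put("categories", categories);

        GenericRecord attribute = new GenericData.Record(SCHEMA);
        attribute.put("lmd", 1379214255197L);
        attribute.put("lv", Collections.singletonList(value));
        return attribute;
    }

    // Encodes the record with Avro's compact binary encoding.
    static byte[] serialize(GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decodes the bytes; the reader needs the same (or a compatible) schema.
    static GenericRecord deserialize(byte[] bytes, Schema schema) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(sample());
        GenericRecord decoded = deserialize(bytes, SCHEMA);
        System.out.println(bytes.length + " bytes, lmd=" + decoded.get("lmd"));
    }
}
```

The byte[] returned by serialize is what would be stored as the column value in Cassandra (for example in a blob column). Because the binary encoding does not carry field names, the reader must have access to the schema, which is the design cost referred to above.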

But instead of Avro, you might want to consider Smile, a one-to-one binary serialization of JSON designed for use cases where you may want to go back and forth between JSON and binary data; for example, to use JSON for debugging, or when serving JavaScript clients.

Jackson has a Smile backend (see https://github.com/FasterXML/jackson-dataformat-smile), and it is literally a one-line change to use Smile instead of (or in addition to) JSON. Many projects use it (for example, Elasticsearch), and it is a mature and stable format; tooling support via Jackson is extensive for different datatypes.
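
As a rough illustration of that one-line change, the sketch below builds a second ObjectMapper on top of a SmileFactory and converts between JSON text and Smile bytes. The class name SmileRoundTrip and its helper methods are hypothetical, and the code assumes jackson-databind and jackson-dataformat-smile are on the classpath.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

import java.io.IOException;

public class SmileRoundTrip {

    // Plain JSON mapper, as already used in the question.
    static final ObjectMapper JSON = new ObjectMapper();
    // The "one-line change": same mapper API, Smile as the wire format.
    static final ObjectMapper SMILE = new ObjectMapper(new SmileFactory());

    // Parses JSON text and re-encodes the same tree as Smile bytes.
    static byte[] jsonToSmile(String json) throws IOException {
        JsonNode tree = JSON.readTree(json);
        return SMILE.writeValueAsBytes(tree);
    }

    // Decodes Smile bytes and renders the tree as JSON text again.
    static String smileToJson(byte[] smile) throws IOException {
        JsonNode tree = SMILE.readTree(smile);
        return JSON.writeValueAsString(tree);
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"lv\":[{\"v\":{\"site-id\":0,\"price-score\":0.5}}],\"lmd\":1379214255197}";
        byte[] smile = jsonToSmile(json);
        System.out.println(smile.length + " Smile bytes vs " + json.length() + " JSON chars");
        System.out.println(smileToJson(smile));
    }
}
```

Since the mapper API is unchanged, annotated classes such as Attribute could be passed to SMILE.writeValueAsBytes directly instead of going through a JsonNode tree.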

Defeat answered 17/9, 2013 at 3:57 Comment(0)

Since you already use Jackson, you could try the Jackson dataformat module that supports Avro-encoded data.
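
A minimal sketch of what that might look like is shown below, assuming jackson-dataformat-avro is on the classpath. The CategoryScore class is a hypothetical stand-in for the question's Category class, and JacksonAvroRoundTrip is likewise an illustrative name. The point to note is that the Avro schema has to be handed to the writer and the reader explicitly; here it is generated from the class with schemaFor.

```java
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;

import java.io.IOException;

// Hypothetical POJO standing in for the question's Category class.
class CategoryScore {
    public String priceScore;
    public String confidenceScore;
}

public class JacksonAvroRoundTrip {

    static final AvroMapper MAPPER = new AvroMapper();

    // Writes the POJO as Avro binary; the schema must be supplied to the writer.
    static byte[] toAvro(CategoryScore value, AvroSchema schema) throws IOException {
        return MAPPER.writer(schema).writeValueAsBytes(value);
    }

    // Reads the POJO back; the reader needs the schema as well.
    static CategoryScore fromAvro(byte[] bytes, AvroSchema schema) throws IOException {
        return MAPPER.readerFor(CategoryScore.class).with(schema).readValue(bytes);
    }

    public static void main(String[] args) throws IOException {
        // Derive an Avro schema from the class instead of writing it by hand.
        AvroSchema schema = MAPPER.schemaFor(CategoryScore.class);

        CategoryScore in = new CategoryScore();
        in.priceScore = "0.4";
        in.confidenceScore = "0.2";

        byte[] bytes = toAvro(in, schema);
        CategoryScore out = fromAvro(bytes, schema);
        System.out.println(bytes.length + " bytes, priceScore=" + out.priceScore);
    }
}
```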

Mcripley answered 25/6, 2015 at 13:29 Comment(1)
I believe one has to configure something. In my experience, I was not able to serialize and then deserialize JSON using this dataformat. It certainly gave me a better result, but not an out-of-the-box experience. - Averse

I tried ObjectMapper and Gson, but they did not work well for me in some cases, so I used DatumWriter and DatumReader instead:

public static <T extends GenericRecord> String convertAvroObjectToJsonString(T event) throws IOException {
    try {
        DatumWriter<T> writer = new SpecificDatumWriter<>(event.getSchema());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Encode the record as Avro's JSON encoding (text, not compact binary).
        JsonEncoder encoder = EncoderFactory.get()
                .jsonEncoder(event.getSchema(), out);
        writer.write(event, encoder);
        encoder.flush();
        // The JSON encoder writes UTF-8, so decode with the same charset.
        return out.toString("UTF-8");
    } catch (IOException e) {
        log.error("IOException occurred.", e);
        throw e;
    }
}

public static <T extends GenericRecord> T convertStringToAvro(String content, Schema schema) throws IOException {
    try {
        DatumReader<T> reader = new SpecificDatumReader<>(schema);
        // Decode Avro's JSON encoding back into a record of the given schema.
        JsonDecoder decoder = DecoderFactory.get()
                .jsonDecoder(schema, content);
        return reader.read(null, decoder);
    } catch (IOException e) {
        log.error("IOException occurred.", e);
        throw e;
    }
}

To get the schema for deserializing, I simply call MyObjectAvroClass.getClassSchema();

Padre answered 8/7, 2024 at 18:55 Comment(0)
