How to configure Flume to listen to HTTP requests to a web API

I have built a web API application, which is published on an IIS server. I am trying to configure Apache Flume to listen to that web API and save the responses to its HTTP requests in HDFS. This is the POST method that I need to listen to:

    [HttpPost]
    public IEnumerable<Data> obtenerValores(arguments arg)
    {
        Random rdm = new Random();

        int ano = arg.ano;
        int rdmInt;
        decimal rdmDecimal;

        int anoActual = DateTime.Now.Year;
        int mesActual = DateTime.Now.Month;

        List<Data> ano_mes_sales = new List<Data>();

        while (ano <= anoActual)
        {
            int mes = 1;
            while ((anoActual == ano && mes <= mesActual) || (ano < anoActual && mes <= 12))
            {
                rdmInt = rdm.Next();
                rdmDecimal = (decimal)rdm.NextDouble();
                Data anoMesSales = new Data(ano, mes,(rdmInt * rdmDecimal));
                ano_mes_sales.Add(anoMesSales);

                mes++;
            }
            ano++;
        }
        return ano_mes_sales;
    }

Flume is running on a CentOS virtual machine under VMware. This is my attempt to configure Flume to listen to that application:

    # Sources, channels, and sinks are defined per agent name, in this case 'a1'.
    a1.sources  = source1
    a1.channels = channel1
    a1.sinks    = sink1
    a1.sources.source1.interceptors = i1 i2
    a1.sources.source1.interceptors.i1.type = host
    a1.sources.source1.interceptors.i1.preserveExisting = false
    a1.sources.source1.interceptors.i1.hostHeader = host
    a1.sources.source1.interceptors.i2.type = timestamp

    # For each source, channel, and sink, set standard properties.
    a1.sources.source1.type     = org.apache.flume.source.http.HTTPSource
    a1.sources.source1.bind     = transacciones.misionempresarial.com/CSharpFlume
    a1.sources.source1.port     = 80

    # JSONHandler is the default for the HTTP source.
    a1.sources.source1.handler = org.apache.flume.source.http.JSONHandler
    a1.sources.source1.channels = channel1
    a1.channels.channel1.type   = memory
    a1.sinks.sink1.type         = hdfs
    a1.sinks.sink1.hdfs.path = /monthSales
    a1.sinks.sink1.hdfs.filePrefix = event-file-prefix-
    a1.sinks.sink1.hdfs.round = false
    a1.sinks.sink1.channel      = channel1

    # Other properties are specific to each type of source, channel, or sink.
    # In this case, we specify the capacity of the memory channel.
    a1.channels.channel1.capacity = 1000

I am using curl to post. Here is my attempt:

    curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '[{"ano":"2010"}]' http://transacciones.misionempresarial.com/CSharpFlume/api/SourceFlume/ObtenerValores

I only get this error:

    {"Message":"Error."}

My questions are: what is the right way to configure Flume to listen to HTTP requests to my web API, and what am I missing?

Feliks answered 3/10, 2017 at 14:30 Comment(1)
Have you looked at the Flume logs? It would be helpful to post them here. I haven't used the HTTP source for Flume, but I can suggest using the Kafka REST API (github.com/confluentinc/kafka-rest) and the Flume Kafka source instead of sending HTTP messages directly to Flume. If you're flexible enough to change your architecture even further, replace Kafka with Spark Streaming, which writes its output stream to HDFS. - Podagra

The standard Flume 'HTTPSource', and its default JSONHandler, will only process an event in a specific, Flume-centric format.

That format is documented in the user manual, and also in the comments at the beginning of the JSONHandler source code.

In summary, it expects to receive a list of JSON objects, each one containing headers (key/value pairs, mapped to the Flume Event headers) and body (a simple string, mapped to the Flume Event body).

To take your example, if you send:

    [{"headers": {}, "body": "{\"ano\":\"2010\"}"}]

I think you'd get what you were looking for.
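
As a rough illustration of that envelope, here is a minimal Python sketch that wraps arbitrary payloads in the list-of-events shape described above and posts them to a Flume HTTP source. The function names `to_flume_events` and `post_to_flume` are hypothetical helpers, and the URL in the usage note is an assumption, not something taken from the question.

```python
import json
import urllib.request


def to_flume_events(payloads, headers=None):
    """Wrap each payload as one event: string headers plus a string body."""
    return json.dumps([
        # The body must be a plain string, so the payload is serialized first.
        {"headers": dict(headers or {}), "body": json.dumps(payload)}
        for payload in payloads
    ])


def post_to_flume(url, payloads):
    """POST the wrapped events to a Flume HTTP source and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=to_flume_events(payloads).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Only prints the envelope; no network call is made here.
    print(to_flume_events([{"ano": "2010"}]))
```

Calling something like `post_to_flume("http://flume-host:5140", [{"ano": "2010"}])` would then send the wrapped event, where `flume-host` and `5140` stand in for wherever the Flume agent is actually listening.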

If you don't have the flexibility to change what you send, you may be able to use org.apache.flume.source.http.BLOBHandler, depending on what processing you are trying to do. Note that the manual has no documentation for this handler, only for org.apache.flume.sink.solr.morphline.BlobHandler, which is not the same thing, although there are some notes in FLUME-2718. Otherwise, you may need to provide your own implementation of Flume's HTTPSourceHandler interface.

Side note: the HTTP source's bind option requires a hostname or IP address. You may just be getting lucky, with your value being treated as the hostname and the path being ignored.
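
For the bind option, a source definition along these lines might be closer to what is needed. This is only a sketch: `0.0.0.0` (listen on all interfaces of the Flume VM) and port `5140` are assumed values, not taken from the question, and the rest of the agent configuration is unchanged.

```properties
# Assumed values: 0.0.0.0 binds to all interfaces on the Flume host;
# 5140 is an arbitrary free port chosen for illustration.
a1.sources.source1.type = org.apache.flume.source.http.HTTPSource
a1.sources.source1.bind = 0.0.0.0
a1.sources.source1.port = 5140
a1.sources.source1.handler = org.apache.flume.source.http.JSONHandler
a1.sources.source1.channels = channel1
```

With a setup like this, the POST would go to the Flume host and port rather than to the IIS URL, since the HTTP source accepts incoming requests itself and does not poll another web application.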

Liebknecht answered 13/5, 2019 at 15:25 Comment(0)
