Queue messages in RabbitMQ before writing to MongoDB?

The application sends logs from many machines to the Amazon cloud, where they are stored in a database.

> Let's assume: each machine produces 1 kB of logs every 10 seconds, and the number of machines ranges from 1000 to 5000 (i.e., up to ~500 messages/s and ~500 kB/s in aggregate).

My first approach was to queue logs in RabbitMQ and have a RabbitMQ consumer store them in a SQL database.

  1. Do I really need RabbitMQ when the consumer only performs a basic storage operation?

My second approach was to queue logs in RabbitMQ but store them in MongoDB.

  1. Does it make sense to queue messages before writing to MongoDB?
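
For illustration, a minimal sketch of the producer side of this second approach in Python, assuming the pika client library; the host name, queue name, and sample entry are placeholders:

```python
import json
import time
import pika

# Runs on each machine: publish one log entry to RabbitMQ.
connection = pika.BlockingConnection(pika.ConnectionParameters("rabbit-host"))
channel = connection.channel()
channel.queue_declare(queue="logs", durable=True)  # survive broker restarts

entry = {"host": "machine-42", "ts": time.time(), "msg": "disk usage at 80%"}
channel.basic_publish(
    exchange="",
    routing_key="logs",
    body=json.dumps(entry),
    properties=pika.BasicProperties(delivery_mode=2),  # persist message to disk
)
connection.close()
```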
Platino answered 11/4, 2015 at 8:30

Since you have multiple producer systems creating logs, you already have a distributed architecture.

There are many benefits to decoupling a utility / cross-cutting concern like logging from each of the systems, and instead using a queue:

  • By using an asynchronous approach, you will be able to buffer spikes of high message volume in Rabbit without impacting the throughput of the producer systems. Also, the centralized log-writing system may be able to batch the log inserts: bulk log writes require fewer database connections and can optimize I/O beyond what is possible when a large number of servers each write small numbers of logs directly (see the sketch after this list).
  • It centralizes the concern of log writing. This way, you do not need to maintain the code that writes logs on each producer, e.g. if the log format or the log storage changes (you already seem to have doubts about whether to store logs in NoSQL like Mongo or in SQL). This is especially useful if the producer machines use different tech stacks (e.g. Java, Node, .Net) or different versions of the JVM, etc. (You do, however, need to write to the queue from each system.)
  • It decouples the availability of the producing system from the logging service (e.g. if the service writing the log data to MongoDb is down, logs can be queued in Rabbit until the system becomes available again). Remember to stamp the message creation time on the originating server, however.
  • It frees up IO and CPU resources on the producer systems.
  • Rabbit can form the basis of a bus architecture. This will allow you to extend the number of consumers of log messages, e.g. for redundancy, or to implement metrics, without impacting the existing implementation at all.
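
To make the buffering and batch-insert points concrete, here is a hedged consumer sketch in Python, assuming pika and pymongo; the connection details, queue name, and batch size are assumptions, not part of the question:

```python
import json
import pika
from pymongo import MongoClient

# Placeholder broker and database locations.
channel = pika.BlockingConnection(pika.ConnectionParameters("rabbit-host")).channel()
channel.queue_declare(queue="logs", durable=True)
collection = MongoClient("mongodb://mongo-host:27017")["logging"]["entries"]

BATCH_SIZE = 500
batch, last_tag = [], None

# consume() yields (None, None, None) after 1 s of inactivity,
# which we treat as the signal to flush a partial batch.
for method, properties, body in channel.consume("logs", inactivity_timeout=1):
    if method is not None:
        batch.append(json.loads(body))
        last_tag = method.delivery_tag
    if batch and (len(batch) >= BATCH_SIZE or method is None):
        collection.insert_many(batch)  # one bulk write instead of many small ones
        channel.basic_ack(delivery_tag=last_tag, multiple=True)  # ack all up to here
        batch, last_tag = [], None
```

Acknowledging only after insert_many succeeds means a crash between read and write re-delivers the batch instead of losing it; this is at-least-once delivery, so the log store should tolerate the occasional duplicate.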
Humanize answered 11/4, 2015 at 8:42
That said, many loggers allow for multiple sinks at the producer system. Logging to the file system in addition to, or prior to, sending logs to a centralized database is a good idea, just in case something goes wrong when sending log data across the network; like the black box in the airline industry, if the file system survives a traumatic hardware failure, you still have some data to assist with post-mortems. – Humanize
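
As a sketch of the multiple-sinks idea, using Python's standard logging module; the RabbitMQ-publishing handler is hypothetical and only hinted at, since the exact handler depends on your stack:

```python
import logging
import logging.handlers

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)

# Sink 1: the local "black box": always keep a rotating file on disk.
file_sink = logging.handlers.RotatingFileHandler(
    "app.log", maxBytes=10_000_000, backupCount=5
)
logger.addHandler(file_sink)

# Sink 2 (hypothetical): a custom logging.Handler whose emit() publishes
# each record to RabbitMQ, e.g. by wrapping pika's basic_publish.
# logger.addHandler(RabbitMqHandler(url="amqp://rabbit-host"))

logger.info("this record goes to every attached sink")
```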

As stated by StuartLC, you need buffering, and you need to decouple the availability of the producing system from the logging service.

Here are the cons of using RabbitMQ:

  • RabbitMQ will be another point of failure to manage. If your logs are significant and/or have high throughput, you will have to run a RabbitMQ cluster.
  • You will have to manage local buffering anyway, because RabbitMQ can be unavailable or your producers can come under flow control (see the sketch after this list).
  • RabbitMQ does buffering, but a healthy RabbitMQ is an empty one.
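
As a sketch of what managing that local buffering can look like (assuming pika; the spool path, queue name, and helper function are hypothetical):

```python
import json
import pika

SPOOL_FILE = "spool.jsonl"  # hypothetical local append-only buffer

def publish_or_spool(record: dict) -> None:
    """Try the broker first; fall back to a local file if it is unreachable."""
    try:
        # One connection per call keeps the sketch simple; a real producer
        # would reuse a long-lived connection.
        connection = pika.BlockingConnection(pika.ConnectionParameters("rabbit-host"))
        channel = connection.channel()
        channel.queue_declare(queue="logs", durable=True)
        channel.basic_publish(
            exchange="",
            routing_key="logs",
            body=json.dumps(record),
            properties=pika.BasicProperties(delivery_mode=2),
        )
        connection.close()
    except pika.exceptions.AMQPError:
        # Broker unreachable: buffer locally and replay later.
        with open(SPOOL_FILE, "a") as spool:
            spool.write(json.dumps(record) + "\n")
```

A separate replay job can then drain the spool file into RabbitMQ once the broker is reachable again.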

You do not define what you mean by "log". Since you state 1 kB every 10 seconds, it sounds like metrics. Please correct me if I'm wrong.

Regarding log handling, I tend to favor local buffering with a stack dedicated to it (syslog, Flume, Logstash, ...), backed by a datastore with high throughput. MongoDB should fit the need; I'm a bit skeptical about an RDBMS.

In any case, you may be able to implement local buffering with a local RabbitMQ instance and federated queues.

Hepcat answered 11/4, 2015 at 19:29
