PySpark Streaming - How to set up custom logging?

I have a PySpark streaming application that runs on YARN in a Hadoop cluster. The streaming application reads from a Kafka topic every n seconds and makes a REST call.

I have a logging service in place that provides an easy way to collect and store data, send it to Logstash, and visualize it in Kibana. The data must conform to a template (JSON with specific keys) provided by this service.
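For illustration, a record conforming to such a template might look like the following; these key names are assumptions used in the sketches below, not the service's real schema:

```python
# Hypothetical template keys; the real ones come from the logging service.
log_record = {
    "timestamp": "2017-04-13T21:02:00Z",   # ISO 8601 event time
    "application": "kafka-rest-streaming", # name of the emitting app
    "level": "INFO",                       # log severity
    "message": "processed batch from Kafka",
}
```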

I want to send logs from the streaming application to Logstash using this service. For this, I need to do three things (sketched after the list below):

- Collect some data while the streaming app is reading from Kafka and making the REST call. 
- Format it according to the logging service template.
- Forward the log to the Logstash host.
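A minimal sketch of one way to cover these steps with Python's logging module, assuming the Logstash host accepts one JSON document per line over TCP; the host, port, and template keys here are assumptions, not part of the actual service:

```python
import json
import logging
import socket
from datetime import datetime, timezone


class LogstashHandler(logging.Handler):
    """Sends each log record to Logstash as one JSON line over TCP."""

    def __init__(self, host, port):
        super().__init__()
        self.address = (host, port)

    def emit(self, record):
        # Build the document using the hypothetical template keys from above;
        # substitute the keys your logging service actually mandates.
        doc = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "application": "kafka-rest-streaming",
            "level": record.levelname,
            "message": record.getMessage(),
        }
        try:
            with socket.create_connection(self.address, timeout=5) as sock:
                sock.sendall((json.dumps(doc) + "\n").encode("utf-8"))
        except OSError:
            self.handleError(record)


logger = logging.getLogger("streaming_app")
logger.setLevel(logging.INFO)
logger.addHandler(LogstashHandler("logstash.example.com", 5000))
logger.info("streaming application started")
```

Note that on YARN this handler only covers code running in the process where it was created; the comment exchange below gets at the driver-versus-executor distinction.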

Any guidance related to this would be very helpful.

Thanks!

Afreet asked 13/4, 2017 at 21:02
Which logging framework are you using? Is it Python's built-in logging module? And do you manage to log from the driver and all the executors? Another important question: what is the master of your Spark application? Is it local, YARN, Mesos, or standalone? – Nikola
@Nikola I am trying to use the logging module from Python. The master of my Spark application is YARN, and I want to run this job in cluster mode. – Afreet
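On the driver-versus-executor point raised in the comments: in YARN cluster mode the executors run as separate Python processes, so a handler added on the driver is not inherited there; each worker has to configure its own logging. A minimal sketch, reusing the hypothetical LogstashHandler from above:

```python
import logging


def process_partition(records):
    # Executor-side setup: the driver's logging config is not inherited,
    # so attach the handler once per worker process.
    logger = logging.getLogger("streaming_app.executor")
    if not logger.handlers:
        logger.addHandler(LogstashHandler("logstash.example.com", 5000))
        logger.setLevel(logging.INFO)
    for rec in records:
        logger.info("handling one Kafka record")
        # ... the REST call per record would go here ...


# Hypothetical wiring into a Kafka DStream:
# stream.foreachRDD(lambda rdd: rdd.foreachPartition(process_partition))
```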
