How to view the logs of a spark job after it has completed and the context is closed?

I am running pyspark, spark 1.3, standalone mode, client mode.

I am trying to investigate my spark job by looking at the jobs from the past and comparing them. I want to view their logs, the configuration settings under which the jobs were submitted, etc. But I'm running into trouble viewing the logs of jobs after the context is closed.

When I submit a job, of course I open a Spark context. While the job is running, I can open the Spark web UI using SSH tunneling, and I can access the forwarded port at localhost:<port no>. Then I can view the jobs currently running, and the ones that are completed, like this:

[Image: Spark web UI showing Running Applications and Completed Applications]

Then, if I wish to see the logs of a particular job, I can do so with SSH tunnel port forwarding again, this time to the port on the particular machine that serves the logs for that job.

Then, sometimes the job fails, but the context is still open. When this happens, I am still able to see the logs by the above method.

But, since I don't want to have all of these contexts open at once, when the job fails, I close the context. When I close the context, the job appears under "Completed Applications" in the image above. Now, when I try to view the logs by using ssh tunnel port forwarding, as before (localhost:<port no>), it gives me a page not found.

How do I view the logs of a job after the context is closed? And, what does this imply about the relationship between the spark context and where the logs are kept? Thank you.

Again, I am running pyspark, spark 1.3, standalone mode, client mode.
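
To make the scenario concrete, here is a minimal sketch of the lifecycle I mean (the master URL and app name are just placeholders, not my actual cluster):

from pyspark import SparkContext

# Placeholder master URL and app name; my actual standalone cluster differs.
sc = SparkContext("spark://master:7077", "my-job")

# ... the job runs; while it does, the application UI is served
# on the driver at port 4040 (what I reach through the SSH tunnel) ...

sc.stop()  # once the context is closed, that per-application UI goes away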

Hedvah answered 15/7, 2016 at 21:48 Comment(0)

Spark's event log and the history server exist for exactly this use case.

Enable the event log

If conf/spark-defaults.conf does not exist, create it from the template:

cp conf/spark-defaults.conf.template conf/spark-defaults.conf

Then add the following configuration to conf/spark-defaults.conf:

# Enable the event log
spark.eventLog.enabled true

# Where to store the event log (this directory must already exist)
spark.eventLog.dir file:///Users/rockieyang/git/spark/spark-events

# Tell the history server where to read the event log from
spark.history.fs.logDirectory file:///Users/rockieyang/git/spark/spark-events
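
Since the question uses pyspark, the same properties can also be set programmatically when the context is created, instead of (or in addition to) editing conf/spark-defaults.conf. A minimal sketch; the app name and event-log path are placeholders, and the event-log directory must already exist:

from pyspark import SparkConf, SparkContext

# Placeholder app name and event-log directory; adjust to your setup.
conf = (SparkConf()
        .setAppName("my-job")
        .set("spark.eventLog.enabled", "true")
        .set("spark.eventLog.dir", "file:///tmp/spark-events"))

sc = SparkContext(conf=conf)
# ... run the job ...
sc.stop()  # the finished application then appears in the history server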

History server

Start the history server:

sbin/start-history-server.sh 

Check the history; by default it is served on port 18080:

http://localhost:18080/

Morisco answered 16/7, 2016 at 3:37 Comment(4)
This is nice, thank you. So when the Spark context is closed, why can't I view the job in the web UI any more? Is it because traffic on that port is shut down when the context is closed?Hedvah
Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. When the SparkContext shuts down, the web UI no longer exists.Morisco
But I thought that as long as one application is still running, port 4040 should still be open?Hedvah
Spark works in a fairly stateless way. Once you close the context, the application is closed; it will be a new application when you open a context again.Morisco
