Is there a way to have the output from Dataproc Spark jobs sent to Google Cloud Logging? As explained in the Dataproc docs, the output from the job driver (the master for a Spark job) is available under Dataproc->Jobs in the console. There are two reasons I would like to have the logs in Cloud Logging as well:
- I'd like to see the logs from the executors. Often the master log will say "executor lost" with no further detail, and it would be very useful to have some more information about what the executor is up to.
- Cloud Logging has nice filtering and search; an example of the kind of filter I have in mind is below.
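For example, being able to search across all cluster logs with an advanced filter like the one below is exactly what I'm after (my-cluster is a placeholder, and I'm assuming the cloud_dataproc_cluster resource type I see in the Logs Viewer is the right one to filter on):

    resource.type="cloud_dataproc_cluster"
    resource.labels.cluster_name="my-cluster"
    "executor lost"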
Currently the only output from Dataproc that shows up in Cloud Logging consists of log items from yarn-yarn-nodemanager-* and container_*.stderr. Output from my application code is shown in Dataproc->Jobs but not in Cloud Logging, and even there it's only the output from the Spark master, not the executors.
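Since container_*.stderr does make it into Cloud Logging, one workaround I'm experimenting with (a sketch only; I haven't verified that executor stderr is actually routed into the same container_*.stderr stream) is to log to stderr from inside the executors:

    import logging
    import sys

    from pyspark import SparkContext

    def process_partition(records):
        # Configure logging once per partition; stderr from the executor
        # process should land in container_*.stderr on the worker, which is
        # one of the streams that already shows up in Cloud Logging.
        logging.basicConfig(stream=sys.stderr, level=logging.INFO)
        log = logging.getLogger("executor")
        for x in records:
            log.info("processing %s", x)
            yield x * 2

    sc = SparkContext()
    print(sc.parallelize(range(10)).mapPartitions(process_partition).collect())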
I have print(..) statements in my pyspark executors and I'm not able to see their output anywhere. I can see print output from the master, but any output from inside my map function seems to be lost. – jovitah
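To make the comment concrete, here's a minimal repro of the behavior it describes (double is just a stand-in function):

    from pyspark import SparkContext

    sc = SparkContext()

    def double(x):
        # Executor-side print: this runs on a worker, so the output goes to
        # the executor's stdout, not to the driver output in Dataproc->Jobs.
        print("executor: processing %d" % x)
        return x * 2

    result = sc.parallelize(range(10)).map(double).collect()

    # Driver-side print: this does show up under Dataproc->Jobs.
    print("driver: result = %s" % result)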