Where are the Spark logs on EMR?

I'm not able to locate error logs or messages from println calls in Scala while running jobs on Spark on EMR.

Where can I access these?

I'm submitting the Spark job, written in Scala, to EMR using script-runner.jar with --deploy-mode set to cluster and --master set to yarn. It runs the job fine.
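
(For reference, the script that step runs boils down to a spark-submit call along these lines; the class and JAR names below are just placeholders.)

# submitted on the cluster via script-runner.jar; names are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MySparkJob \
  s3://my-bucket/jars/my-spark-job.jar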

However, I do not see my println statements in the Amazon EMR UI where it lists stderr, stdout, etc. Furthermore, if my job errors, I don't see why it had an error. All I see is this in the stderr:

15/05/27 20:24:44 INFO yarn.Client: Application report from ResourceManager: 
 application identifier: application_1432754139536_0002
 appId: 2
 clientToAMToken: null
 appDiagnostics: 
 appMasterHost: ip-10-185-87-217.ec2.internal
 appQueue: default
 appMasterRpcPort: 0
 appStartTime: 1432758272973
 yarnAppState: FINISHED
 distributedFinalState: FAILED
 appTrackingUrl: http://10.150.67.62:9046/proxy/application_1432754139536_0002/A
 appUser: hadoop


Metastasis answered 27/5, 2015 at 23:38 Comment(0)

With a deploy mode of cluster on YARN, the Spark driver, and hence the user code, runs within the Application Master container. It sounds like you have EMR debugging enabled on the cluster, so the logs should also have been pushed to S3. In that S3 location, look under task-attempts/<applicationid>/<firstcontainer>/*.
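
As a rough sketch, assuming the cluster's Log URI is s3://my-emr-logs/ and the cluster id is j-XXXXXXXXXXXX (both placeholders), you could browse those logs from the AWS CLI:

# list the task-attempt logs pushed to S3 for this application
aws s3 ls --recursive s3://my-emr-logs/j-XXXXXXXXXXXX/task-attempts/application_1432754139536_0002/

# fetch one container's stderr and view it (EMR gzips the logs it ships to S3)
aws s3 cp s3://my-emr-logs/j-XXXXXXXXXXXX/task-attempts/application_1432754139536_0002/<container-dir>/stderr.gz - | gunzip | less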

Aspectual answered 29/5, 2015 at 5:50 Comment(3)
Yes, this is correct. Thank you. For other users' knowledge, you can see this "Log URI" in the Amazon EMR Web UI under your cluster's info/details.Metastasis
In my case, I cannot find "task-attempts" when clicking "Log URI" in the EMR Web UI; was it renamed or moved?Upandcoming
The path to the log location has changed. Please refer to the answer by @Vsh - https://mcmap.net/q/577378/-where-are-the-spark-logs-on-emr Shelter

I also spent a lot of time figuring this out. I found the logs in the following location: EMR UI Console -> Summary -> Log URI -> Containers -> application_xxx_xxx -> container_yyy_yy_yy -> stdout.gz.
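
If you prefer the CLI, something like this should fetch the same file (the bucket and cluster id are placeholders; the application and container ids follow the pattern above):

# list everything under the containers/ prefix of the cluster's Log URI
aws s3 ls --recursive s3://my-emr-logs/j-XXXXXXXXXXXX/containers/

# the driver's stdout (println output) lives in the application master's container directory
aws s3 cp s3://my-emr-logs/j-XXXXXXXXXXXX/containers/application_xxx_xxx/container_yyy_yy_yy/stdout.gz - | gunzip | less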

Agriculturist answered 18/7, 2020 at 2:5 Comment(1)
Best answer here!Eradis

If you SSH into the master node of your cluster then you should be able to find the stdout, stderr, syslog and controller logs under:

/mnt/var/log/hadoop/steps/<stepname>
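
For example (the key file and public DNS name are placeholders):

# SSH to the master node as the hadoop user
ssh -i ~/my-key.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

# then, on the master node, list the steps and read a step's stderr
ls /mnt/var/log/hadoop/steps/
less /mnt/var/log/hadoop/steps/<stepname>/stderr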
Paleolith answered 28/5, 2015 at 3:54 Comment(2)
Those are the Step logs, which do not contain the Spark application logs (such as OP's println statements).Bodiless
As stated in the above comment, this answer is incorrect.Hage

The event logs, the ones required by the Spark history server, can be found at:

hdfs:///var/log/spark/apps
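
From the master node you can verify they are there with:

# list the Spark event logs stored on the cluster's HDFS
hdfs dfs -ls hdfs:///var/log/spark/apps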
Jurywoman answered 29/1, 2019 at 12:16 Comment(0)

As you are using YARN, it is easy to get the logs with the yarn logs command.

Example usage:

yarn logs -applicationId <applicationId> -am 1 | grep "Your app log"

This will print the logs from the first container, which is usually the application master.
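
If you don't have the application id handy, you can look it up first, e.g.:

# find the application id, then pull its aggregated logs
yarn application -list -appStates FINISHED,FAILED
yarn logs -applicationId application_1432754139536_0002 -am 1 | grep "Your app log"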

Terpstra answered 3/5 at 19:16 Comment(0)

If you submit your job with emr-bootstrap, you can specify the log directory as an S3 bucket with --log-uri.
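
With the current AWS CLI the equivalent is set at cluster creation, roughly like this (names and instance sizes are placeholders):

# create a Spark cluster whose logs are shipped to the given S3 bucket
aws emr create-cluster --name "my-spark-cluster" \
  --release-label emr-5.30.0 \
  --applications Name=Spark \
  --log-uri s3://my-emr-logs/ \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles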

Glanti answered 28/5, 2015 at 1:20 Comment(2)
Thanks - I think this might be set when I create the cluster (not when submitting a job)? I'll try next time I create the cluster.Metastasis
The S3 logs are plain text; however, in the Spark History Server I can download JSON logs, which are perfect for indexing in Elasticsearch (the Download button under the Event Log column). Where are these stored?Delicate
