Difference between job, application, task, task attempt logs in Hadoop, Oozie

I'm running an Oozie job with multiple actions, and there's one part I cannot get to work. While troubleshooting, I've become overwhelmed by the number of logs.

In the YARN UI (yarn.resourcemanager.webapp.address in yarn-site.xml, normally on port 8088), there are the application_<app_id> logs.

In the Job History Server (yarn.log.server.url in yarn-site.xml, ours on port 19888), there are the job_<job_id> logs. (These job logs should also show up in Hue's Job Browser, right?)

In Hue's Oozie workflow editor, there are the task and task_attempt logs (I'm not sure whether they're the same; everything's a mixed-up soup to me already), which redirect to the Job Browser if you click around.

Can someone explain the difference between these things from a Hadoop/Oozie architectural standpoint?

P.S. I've also seen container_<container_id> in the logs. Please include it in your explanation, in relation to the things above.

Smallman answered 2/2, 2016 at 6:7

In YARN terms, the programs that run on a cluster are called applications. In MapReduce terms, they are called jobs. So if you are running MapReduce on YARN, a job and an application are the same thing: if you look closely, the job ID and the application ID share the same numeric part (cluster timestamp and sequence number) and differ only in the prefix, job_ versus application_.
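To make the ID relationship concrete, here is a minimal sketch that maps an application ID to the matching MapReduce job ID by swapping the prefix. The function name and the example ID are made up for illustration; only the `application_<timestamp>_<sequence>` and `job_<timestamp>_<sequence>` layouts are taken from what Hadoop prints.

```python
def application_to_job_id(application_id: str) -> str:
    """Return the MapReduce job ID that corresponds to a YARN application ID.

    Assumes the usual layout application_<clusterTimestamp>_<sequence>.
    """
    parts = application_id.split("_")
    if len(parts) != 3 or parts[0] != "application":
        raise ValueError(f"not an application ID: {application_id!r}")
    _, cluster_ts, seq = parts
    # Same cluster timestamp and sequence number, different prefix.
    return f"job_{cluster_ts}_{seq}"


# Hypothetical ID, for illustration only.
print(application_to_job_id("application_1454393000000_0042"))
```

This is why the same run shows up under application_... in the YARN UI and under job_... in the Job History Server.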

A MapReduce job consists of several tasks, each of which is either a map task or a reduce task. Each execution of a task is a task attempt. If an attempt fails, the task is launched again, possibly on another node, as a new attempt.
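The job, task, and attempt levels are all visible in a task attempt ID, which has the form attempt_<timestamp>_<sequence>_<m|r>_<task number>_<attempt number>. The sketch below splits such an ID into its parts; the helper and type names are hypothetical and the example ID is invented.

```python
from typing import NamedTuple


class TaskAttempt(NamedTuple):
    job_id: str          # the job this attempt belongs to
    task_id: str         # the task being attempted
    task_type: str       # "map" or "reduce"
    attempt_number: int  # 0 for the first attempt, 1 for the first retry, ...


def parse_attempt_id(attempt_id: str) -> TaskAttempt:
    """Split a MapReduce task attempt ID into job, task, and attempt parts."""
    parts = attempt_id.split("_")
    if len(parts) != 6 or parts[0] != "attempt":
        raise ValueError(f"not a task attempt ID: {attempt_id!r}")
    _, cluster_ts, seq, kind, task_no, attempt_no = parts
    if kind not in ("m", "r"):
        raise ValueError(f"unknown task type {kind!r}")
    return TaskAttempt(
        job_id=f"job_{cluster_ts}_{seq}",
        task_id=f"task_{cluster_ts}_{seq}_{kind}_{task_no}",
        task_type="map" if kind == "m" else "reduce",
        attempt_number=int(attempt_no),
    )


# Hypothetical ID: second attempt (number 1) of map task 000003.
print(parse_attempt_id("attempt_1454393000000_0042_m_000003_1"))
```

So a task and a task attempt are not the same thing: one task_... ID can have several attempt_... IDs under it, one per execution.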

Container is a YARN term: it is the unit of resource allocation. For example, a MapReduce task attempt runs in a single container.
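A container ID also embeds the application it belongs to, so a container_... line in a log can be traced back to its application. The following is a rough sketch, assuming the layout container_<timestamp>_<sequence>_<app attempt>_<container number>, with an optional epoch field such as e17 after the prefix on newer Hadoop releases. The function name and example ID are hypothetical.

```python
def container_to_application_id(container_id: str) -> str:
    """Return the YARN application ID embedded in a container ID."""
    parts = container_id.split("_")
    if parts[0] != "container":
        raise ValueError(f"not a container ID: {container_id!r}")
    fields = parts[1:]
    # Skip the optional epoch field (for example "e17") if it is present.
    if fields and fields[0].startswith("e"):
        fields = fields[1:]
    if len(fields) != 4:
        raise ValueError(f"unexpected container ID layout: {container_id!r}")
    cluster_ts, seq, _app_attempt, _container_no = fields
    return f"application_{cluster_ts}_{seq}"


# Hypothetical ID, for illustration only.
print(container_to_application_id("container_1454393000000_0042_01_000005"))
```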

Kovacev answered 2/2, 2016 at 12:2