I have created an AWS EMR cluster and notebook using default settings.
When I open the notebook, the kernel won't launch. I get the message "Workspace is not attached to cluster".
- The cluster is in a "Ready" state.
- None of the kernels work (Python, Spark, PySpark).
- The error occurs in both JupyterLab and Jupyter.
- In a different AWS account where I had never run EMR, I created a notebook and asked for a new cluster to be created for it. AWS launched the cluster, but the notebook gave the same error.
A clue
I looked at the log files created by a cluster where the notebook failed.
In the log file https://aws-logs-***.s3.amazonaws.com/elasticmapreduce/j-3SOK08VFSQDPO/node/i-04af0a3d2d6d96cac/daemons/emr-on-cluster-env/gateway.log.gz, I found the following:
Jupyter Enterprise Gateway 2.1.0 is available at http://127.0.0.1:9547
User 'root' is not authorized to start kernel 'Python 3'. Ensure KERNEL_USERNAME is set to an appropriate value and retry the request.
User 'root' is not authorized to start kernel 'PySpark'. Ensure KERNEL_USERNAME is set to an appropriate value and retry the request.
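For anyone hunting for the same clue, here is a minimal sketch of how these lines could be pulled out of a gateway log automatically. The regular expression is built only from the two messages quoted above, so it assumes other Enterprise Gateway versions word the error the same way. The names `NOT_AUTHORIZED` and `find_unauthorized` are made up for this example.

```python
import re

# Pattern based on the authorization failures quoted from gateway.log above.
# Assumption: the wording is the same in other gateway versions.
NOT_AUTHORIZED = re.compile(
    r"User '(?P<user>[^']+)' is not authorized to start kernel '(?P<kernel>[^']+)'"
)


def find_unauthorized(lines):
    """Return a (user, kernel) pair for every authorization failure in lines."""
    hits = []
    for line in lines:
        match = NOT_AUTHORIZED.search(line)
        if match:
            hits.append((match.group("user"), match.group("kernel")))
    return hits


# The three log lines from this post, used as sample input.
sample = [
    "Jupyter Enterprise Gateway 2.1.0 is available at http://127.0.0.1:9547",
    "User 'root' is not authorized to start kernel 'Python 3'. "
    "Ensure KERNEL_USERNAME is set to an appropriate value and retry the request.",
    "User 'root' is not authorized to start kernel 'PySpark'. "
    "Ensure KERNEL_USERNAME is set to an appropriate value and retry the request.",
]

print(find_unauthorized(sample))  # [('root', 'Python 3'), ('root', 'PySpark')]
```

To run it against a downloaded copy of the log, the lines can be read with `gzip.open(path, "rt")` from the standard library and passed to `find_unauthorized`.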
How I got the notebook kernel to work
Following the Stack Overflow post "Notebooks on EMR (AWS): Failed to start kernel", I switched from the AWS account root user to an IAM user. This worked with EMR 6.5.0.
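A rough sketch of how one might check which kind of identity a set of credentials belongs to. It relies on the documented ARN format, where the account root user appears as `arn:aws:iam::<account-id>:root`, and on the STS `get_caller_identity` call in boto3. The helper names are hypothetical, and this only reports on the credentials boto3 picks up, which may not be the identity used to sign in to the console. It does not explain why EMR rejects the root user.

```python
def is_root_arn(arn):
    """Return True if the ARN has the account root user form arn:<partition>:iam::<account-id>:root."""
    parts = arn.split(":")
    return len(parts) == 6 and parts[2] == "iam" and parts[5] == "root"


def current_caller_arn():
    """Return the ARN of the identity behind the active credentials.

    Needs boto3 and configured AWS credentials; boto3 is imported lazily so
    is_root_arn can be used without it.
    """
    import boto3

    return boto3.client("sts").get_caller_identity()["Arn"]


# Placeholder account ID and user name, for illustration only.
print(is_root_arn("arn:aws:iam::123456789012:root"))        # True
print(is_root_arn("arn:aws:iam::123456789012:user/alice"))  # False
```

With credentials configured, `is_root_arn(current_caller_arn())` would tell you whether those credentials belong to the root user.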
My question
What changed when I launched the cluster as an IAM user instead of the root user? How could I have figured out that using the root user was the problem?
EMR is a black box to me. Thanks in advance for helping me understand the inner workings of this amazing technology.