How to avoid KeyError: 'kernelspec' in Papermill?
I am running a papermill command from within Airflow (Docker). The script is stored on S3, and I run it using the Python client of papermill. It ends up in an error that is not at all understandable:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/ipython_genutils/ipstruct.py", line 132, in __getattr__
    result = self[key]
KeyError: 'kernelspec'

I tried looking into the docs, but in vain.

The code that I am using to run the papermill command is:

from datetime import timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from mypackage.workflow.transform.jupyter_notebook import run_jupyter_notebook


dag_id = "jupyter-test-dag"
default_args = {
    'owner': "aviral",
    'depends_on_past': False,
    'start_date': "2019-02-28T00:00:00",
    'email': "aviral@some_org.com",
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True
}

dag = DAG(
    dag_id,
    catchup=False,
    default_args=default_args,
    schedule_interval=None,
    max_active_runs=1
)


def print_context(ds, **kwargs):
    print(kwargs)
    print(ds)
    return 'Whatever you return gets printed in the logs'


def run_python_jupyter(**kwargs):
    run_jupyter_notebook(
        script_location=kwargs["script_location"]
    )


create_job_task = PythonOperator(
    task_id="create_job",
    python_callable=run_python_jupyter,
    dag=dag,
    op_kwargs={
            "script_location": "s3://some_bucket/python3_file_write.ipynb"
    }
)

globals()[dag_id] = dag

The function run_jupyter_notebook is:

import papermill as pm


def run_jupyter_notebook(**kwargs):
    """Runs a Jupyter notebook through papermill."""
    script_location = kwargs.get('script_location', '')
    if not script_location:
        raise ValueError("Script location was not provided.")
    # Write the executed copy next to the input, with an "_output" suffix.
    output_location = script_location.split('.ipynb')[0] + "_output.ipynb"
    pm.execute_notebook(script_location, output_location)

I expect the code to run without any error, as I have run it successfully on my local machine (using local filesystem paths instead of the S3 paths).

Tridimensional answered 7/5, 2019 at 13:54
I think this is related to the fact that the kernel name is written to the ipynb file. You probably saved the ipynb file with a different kernel than the one you are using when trying to execute it. – Cyndicyndia
Any luck here? Although not exactly the same, I do get a similar error when running in Airflow: jupyter_client.kernelspec.NoSuchKernel: No such kernel named python3 – Karee
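
To check which kernel names are actually available inside the container (relevant to both comments above), here is a quick sketch using jupyter_client, which papermill relies on; this assumes jupyter_client is importable in the Airflow image:

from jupyter_client.kernelspec import KernelSpecManager

# Maps each installed kernel name to its resource directory,
# e.g. {'python3': '/usr/local/share/jupyter/kernels/python3'}.
print(KernelSpecManager().find_kernel_specs())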

Jupyter adds metadata to your notebook. Your error is related to the fact that some of that metadata, under the kernelspec key, is missing.

Example of the kernelspec object in notebook metadata:

"kernelspec": {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
}

Thus, to solve your error you need to fix the notebook metadata by adding a correct kernelspec object. The simplest way to do this is to edit the notebook's JSON document and add a kernelspec object to the first-level metadata object.

"metadata": {
    "kernelspec": {
        "display_name": "Python 3",
        "language": "python",
        "name": "python3"
    },
    "language_info": {
        "codemirror_mode": {
            "name": "python",
            "version": 3
        }
    }
}
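
If you would rather patch the notebook programmatically than edit the JSON by hand, here is a minimal sketch using the nbformat package (the file path and the kernelspec values are assumptions; adjust them to your setup):

import nbformat

# Hypothetical local path; download the notebook from S3 first if needed.
path = "python3_file_write.ipynb"

nb = nbformat.read(path, as_version=4)

# Add a kernelspec only if one is missing; the values assume a standard
# Python 3 kernel registered under the name "python3".
nb.metadata.setdefault("kernelspec", {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
})

nbformat.write(nb, path)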

Your error might come from the fact that you are using a cleaner to get rid of notebook outputs, such as the nbstripout Python package, since such cleaners may also strip notebook metadata. If that is the case, take care when changing the nbstripout settings, following its documentation.
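
Independently of fixing the file, papermill also lets you name the kernel explicitly, which sidesteps the kernelspec lookup in the notebook metadata. A minimal sketch, assuming a kernel named python3 is installed in the Airflow container (the S3 paths are the ones from the question):

import papermill as pm

pm.execute_notebook(
    "s3://some_bucket/python3_file_write.ipynb",
    "s3://some_bucket/python3_file_write_output.ipynb",
    kernel_name="python3",  # use this kernel instead of the notebook's kernelspec
)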

Michale answered 9/10, 2020 at 9:14
