I have my Databricks Python code in GitHub. I set up a basic workflow to lint the Python code using `flake8`. This fails because the names that are implicitly available to my script when it runs on Databricks (like `spark`, `sc`, `dbutils`, `getArgument`, etc.) are not available when `flake8` lints it outside Databricks (in a GitHub Ubuntu VM).

How can I lint Databricks notebooks in GitHub using `flake8`?
For example, these are the errors I get:

    test.py:1:1: F821 undefined name 'dbutils'
    test.py:3:11: F821 undefined name 'getArgument'
    test.py:5:1: F821 undefined name 'dbutils'
    test.py:7:11: F821 undefined name 'spark'
My notebook in GitHub:

    dbutils.widgets.text("my_jdbcurl", "default my_jdbcurl")

    jdbcurl = getArgument("my_jdbcurl")

    dbutils.fs.ls(".")

    df_node = spark.read.format("jdbc")\
        .option("driver", "org.mariadb.jdbc.Driver")\
        .option("url", jdbcurl)\
        .option("dbtable", "my_table")\
        .option("user", "my_username")\
        .option("password", "my_pswd")\
        .load()
My `.github/workflows/lint.yml`:

    on:
      pull_request:
        branches: [ master ]
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v1
          with:
            python-version: 3.8
        - run: |
            python -m pip install --upgrade pip
            pip install -r requirements.txt
        - name: Lint with flake8
          run: |
            pip install flake8
            flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
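
To make the failure concrete, here is a minimal sketch of the kind of check behind F821, written with only the standard library. It is not flake8's actual implementation: `undefined_names`, `DATABRICKS_GLOBALS` and `NOTEBOOK_SOURCE` are hypothetical names made up for this illustration, and the scope handling is deliberately simplified. It shows that the names are reported only because nothing in the file binds them, and that declaring them as extra builtins makes the reports go away. flake8 has a `--builtins` option that takes a comma-separated list of such names; whether that is the right fix for this workflow is an assumption, not something confirmed here.

```python
import ast
import builtins

# Assumption: the names Databricks injects into a notebook's globals.
DATABRICKS_GLOBALS = {"spark", "sc", "dbutils", "getArgument"}

# A shortened copy of the notebook, keeping the same line layout.
NOTEBOOK_SOURCE = '''\
dbutils.widgets.text("my_jdbcurl", "default my_jdbcurl")

jdbcurl = getArgument("my_jdbcurl")

dbutils.fs.ls(".")

df_node = spark.read.format("jdbc").load()
'''


def undefined_names(source, extra_builtins=frozenset()):
    """Return sorted (line, column, name) for names read but never bound.

    Simplified on purpose: the whole file is treated as a single scope and
    statement order is ignored, which is enough for flat notebook scripts.
    """
    tree = ast.parse(source)
    known = set(dir(builtins)) | set(extra_builtins)
    loads = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                known.add(node.id)          # plain assignment target
            elif isinstance(node.ctx, ast.Load):
                loads.append(node)          # a name being read
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            known.add(node.name)
        elif isinstance(node, ast.arg):
            known.add(node.arg)             # function parameter
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                known.add((alias.asname or alias.name).split(".")[0])
    # Columns are reported 1-based to match the flake8 output format.
    return sorted(
        (n.lineno, n.col_offset + 1, n.id) for n in loads if n.id not in known
    )


if __name__ == "__main__":
    # Outside Databricks nothing defines these names, so all four are reported.
    for line, col, name in undefined_names(NOTEBOOK_SOURCE):
        print(f"test.py:{line}:{col}: undefined name {name!r}")

    # Declaring them as extra builtins leaves nothing to report.
    print(undefined_names(NOTEBOOK_SOURCE, DATABRICKS_GLOBALS))
```

If the `--builtins` route fits, the lint step would presumably become something like `flake8 . --builtins=spark,sc,dbutils,getArgument ...` with the other flags unchanged; I have not verified this against the workflow above.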
`flake8`, including what dependencies it provides. That will tell you how you should invoke `flake8` in GitHub Actions. – Unswear

`flake8` not databricks. – Floriated