First off, can I just say that I am learning DataBricks at the time of writing this post, so I'd like simpler, cruder solutions as well as more sophisticated ones.
I am reading a CSV file like this:
df1 = spark.read.format("csv").option("header", True).load(path_to_csv_file)
Then I'm saving it as a Delta Live Table like this:
df1.write.format("delta").save("table_path")
The CSV headers have characters in them like space and &
and /
, and I get the error:
AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Please enable column mapping by setting table property 'delta.columnMapping.mode' to 'name'. For more details, refer to https://docs.databricks.com/delta/delta-column-mapping.html Or you can use alias to rename it.
The documentation I've seen on the issue explains how to set the column mapping mode to 'name' AFTER a table has been created using ALTER TABLE
, but does not explain how to set it at creation time, especially when using the DataFrame API as above. Is there a way to do this?
Is there a better way to get CSV into a new table?
UPDATE:
Reading the docs here and here, and inspired by Robert's answer, I tried this first:
spark.conf.set("spark.databricks.delta.defaults.columnMapping.mode", "name")
Still no luck, I get the same error. It's interesting how hard it is for a beginner to write a CSV file with spaces in its headers to a Delta Live Table