I'm working on a Hadoop cluster (HDP) with Hadoop 3. Spark and Hive are also installed.
Since the Spark and Hive catalogs are separate, it's sometimes a bit confusing to know how and where to save data in a Spark application.
I know that the property spark.sql.catalogImplementation can be set to either in-memory (to use a Spark session-scoped catalog) or hive (to use the Hive metastore for persistent metadata storage, although that metadata is still kept separate from the Hive databases and tables themselves).
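For reference, this is roughly how I understand the setting to be applied when building a session. A minimal sketch (the app name is made up; enableHiveSupport() is, as far as I know, equivalent to setting spark.sql.catalogImplementation=hive):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: enable the Hive catalog implementation for this session.
// Omitting enableHiveSupport() would leave the default in-memory catalog,
// so tables created via saveAsTable would only live for the session.
val spark = SparkSession.builder()
  .appName("catalog-demo")   // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()

// Should print "hive" when Hive support is enabled.
println(spark.conf.get("spark.sql.catalogImplementation"))
```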
I'm wondering what the property metastore.catalog.default does. When I set it to hive, I can see my Hive tables, but since those tables are stored under the /warehouse/tablespace/managed/hive directory in HDFS, my user has no access to that directory (hive is, of course, the owner). So why should I set metastore.catalog.default = hive if I can't access the tables from Spark? Does it have something to do with Hortonworks' Hive Warehouse Connector?
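For context, this is how I'm passing the property (a sketch under the assumption that it has to reach the metastore client through the spark.hadoop. prefix, which is what I've seen suggested for HDP 3; the app name is made up):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: point the session at Hive's catalog in the shared metastore instead of
// Spark's own "spark" catalog. The property is forwarded to the Hive metastore
// client via the spark.hadoop. prefix.
val spark = SparkSession.builder()
  .appName("metastore-catalog-demo")   // hypothetical app name
  .enableHiveSupport()
  .config("spark.hadoop.metastore.catalog.default", "hive")
  .getOrCreate()

// Hive-managed tables now show up, but actually reading them still requires
// filesystem access to /warehouse/tablespace/managed/hive (or going through HWC).
spark.sql("show tables").show()
```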
Thank you for your help.