I have some data that is processed and model based on case class
es, and the classes can also have other case class
es in them, so the final table has complex data, struct
, array
. Using the case class I save the data in hive using dataframe.saveAsTextFile(path)
.
This data sometimes changes or needs to have a different model, so for each iteration I use a suffix in the table name (some_data_v01, some_data_v03, etc.).
I also have queries that are run on a schedule on these tables, using Impala, so in order to not modify the query each time I save a a new table, I wanted to use a view that is always updated whenever I change the model.
The problem with that is I can't use Impala to create the view
, because of the complex nature of the data in the tables (nested complex types). Apart from being a lot of work to expand the complex types, I want these types to be preserved (lots of level of nesting, duplication of data when joining arrays).
One solution was to create the view using Hive, like this
create view some_data as select * from some_data_v01;
But if I do this, when I want to use the table from Impala,
select * from some_data;
or even something simple, like
select some_value_not_nested, struct_type.some_int, struct_type.some_other_int from some_data;
the error is the following:
AnalysisException: Expr 'some_data_v01.struct_type' in select list returns a complex type 'STRUCT< some_int:INT, some_other_int:INT, nested_struct:STRUCT< nested_int:INT, nested_other_int:INT>, last_int:INT>'. Only scalar types are allowed in the select list.
Is there any way to access this view, or create it in some other way for it to work?