I have an external table mapped in Hive (v2.3.2 on EMR-5.11.0) that I need to update with new data around once a week. The merge consists of a conditional upsert statement.
The table's location is in s3, and the data is always there (created once, and we just need to update it with new data).
I've read this blog about merging data in Hive using the ACID feature on transactional tables (https://dzone.com/articles/update-hive-tables-the-easy-way-part-2-hortonworks), but as far as I can see, the only solution is to copy my external table to a temporary Hive internal table, that is clustered and transactional, then only on that table I can do the merge and override my original data with the new merged one.
This table is quite big (around 10GB of data), so I'd like to avoid copying it before every merge operation.
Is there a way to create the internal table and have it mapped to existing data? or is there another way, besides the merge statement, to perform an upsert on Hive external tables?
Thanks a lot in advance!