Does DROP PARTITION delete data from external table in HIVE?
Asked Answered
L

2

6

An external table in HIVE is partitioned on year, month and day.

So does the following query delete data from external table for the specific partitioned referenced in this query?:-

ALTER TABLE MyTable DROP IF EXISTS PARTITION(year=2016,month=7,day=11);
Loux answered 12/7, 2016 at 1:11 Comment(0)
R
7

Partitioning scheme is not data. Partitioning scheme is part of table DDL stored in metadata (simply saying: partition key value + location where the data-files are being stored).

Data itself are stored in files in the partition location(folder). If you drop partition of external table, the location remain untouched, but unmounted as partition (metadata about this partition is deleted). You can have few versions of partition location unmounted (for example previous versions).

You can drop partition and mount another location as partition (alter table add partition) or change existing partition location. Also drop external table do not delete table/partitions folders with files in it. And later you can create table on top of this location.

Have a look at this answer for better understanding external table/partition concept: It is possible to create many tables (both managed and external at the same time) on top of the same location in HDFS.

Ruddy answered 22/7, 2016 at 18:8 Comment(0)
K
3

No external table have only references that will be deleted actual file will still persists at location .

External Table data files are not owned by table neither moved to hive warehouse directory

Only PARTITION meta will be deleted from hive metastore tables..

Difference between Internal & external tables :

For External Tables -

External table stores files on the HDFS server but tables are not linked to the source file completely.

If you delete an external table the file still remains on the HDFS server.

As an example if you create an external table called “table_test” in HIVE using HIVE-QL and link the table to file “file”, then deleting “table_test” from HIVE will not delete “file” from HDFS.

External table files are accessible to anyone who has access to HDFS file structure and therefore security needs to be managed at the HDFS file/folder level.

Meta data is maintained on master node and deleting an external table from HIVE, only deletes the metadata not the data/file.

For Internal Tables-

Stored in a directory based on settings in hive.metastore.warehouse.dir, by default internal tables are stored in the following directory “/user/hive/warehouse” you can change it by updating the location in the config file . Deleting the table deletes the metadata & data from master-node and HDFS respectively. Internal table file security is controlled solely via HIVE. Security needs to be managed within HIVE, probably at the schema level (depends on organisation to organisation).

Hive may have internal or external tables this is a choice that affects how data is loaded, controlled, and managed.

Use EXTERNAL tables when:

The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn’t lock the files. Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set or if you are iterating through various possible schemas. Hive should not own data and control settings, dirs, etc., you may have another program or process that will do those things. You are not creating table based on existing table (AS SELECT).

Use INTERNAL tables when:

The data is temporary. You want Hive to completely manage the life-cycle of the table and data.

Note: Meta table if you will look in to the database ( configured details)

|BUCKETING_COLS      |
| COLUMNS            |
| DBS                |
| NUCLEUS_TABLES     |
| PARTITIONS         |
| PARTITION_KEYS     |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS   |
| SDS                |
| SD_PARAMS          |
| SEQUENCE_TABLE     |
| SERDES             |
| SERDE_PARAMS       |
| SORT_COLS          |
| TABLE_PARAMS       |
| TBLS               | 
Keratoplasty answered 12/7, 2016 at 6:30 Comment(6)
So what is the exact result of this command then? It is clearly not deleting reference to the table, it is dropping partition, but that too only specific partition, it's little confusing as to what is really happening.Loux
For that one could have simply said ALTER TABLE MyTable DROP IF EXISTS PARTITION(year,month,day); that would have deleted partition metadata -- why supply values for a specific partition?Loux
I am sorry but that doesn't explain. As per what you said earlier it would only delete metadata. Now in second reply you say that it needs meta information for finding data. If data is not going to be deleted, why would it need to find data? As long as partitioning deletion is concerned , I don't think it is possible only to delete partitioning scheme associated with just one specific partition. Partitioning scheme is for the entire table, so if we are talking about disabling partition altogether, it would have to be done at entire table.Loux
I have added some more detailsKeratoplasty
Thanks for your efforts, really appreciate it. But this is not what I was asking. I know exactly what is internal table and what is external table. My point was -- how is it that partitioning scheme is only turned off for one particular partition, how is it possible?Loux
when you are going to execute any sql on hive it is going to read details from meta table mentioned in answer and accordingly it will go and find from where (location and column name and from which PARTITIONS) query have to retrieve data . drop partition will delete data from PARTITIONS , PARTITION_KEYS, PARTITION_KEY_VALS, PARTITION_PARAMS table . Now if you fire query engine will not have details of the and you not get data. though data is present at external location..Keratoplasty

© 2022 - 2024 — McMap. All rights reserved.