Updating data in Clickhouse
Asked Answered
W

6

19

I went over the documentation for Clickhouse and I did not see the option to UPDATE nor DELETE. It seems to me its an append only system. Is there a possibility to update existing records or is there some workaround like truncating a partition that has records in it that have changed and then re-insering the entire data for that partition?

Watchmaker answered 18/6, 2016 at 21:7 Comment(0)
B
23

Through Alter query in clickhouse we can able to delete/update the rows in a table.

For delete: Query should be constructed as

ALTER TABLE testing.Employee DELETE  WHERE  Emp_Name='user4';

For Update: Query should be constructed as

ALTER TABLE testing.employee UPDATE AssignedUser='sunil' where AssignedUser='sunny';
Bushing answered 22/3, 2019 at 11:36 Comment(0)
L
19

UPDATE: This answer is no longer true, look at https://mcmap.net/q/629590/-updating-data-in-clickhouse

ClickHouse doesn't support real UPDATE/DELETE. But there are few possible workarounds:

  1. Trying to organize data in a way, that is need not to be updated. You could write log of update events to a table, and then calculate reports from that log. So, instead of updating existing records, you append new records to a table.

  2. Using table engine that do data transformation in background during merges. For example, (rather specific) CollapsingMergeTree table engine: https://clickhouse.yandex/reference_en.html#CollapsingMergeTree Also there are ReplacingMergeTree table engine (not documented yet, you could find example in tests: https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00325_replacing_merge_tree.sql) Drawback is that you don't know, when background merge will be done, and will it ever be done.

Also look at samdoj's answer.

Lael answered 20/6, 2016 at 4:35 Comment(1)
from 1.1.54388 release 2018-06-28 you can mutate data. clickhouse.yandex/docs/en/changelog/…Vetavetch
S
5

You can drop and create new tables, but depending on their size this might be very time consuming. You could do something like this:

For deletion, something like this could work.

  INSERT INTO tableTemp SELECT * from table1 WHERE  rowID != @targetRowID;
    DROP table1;
    INSERT INTO table1 SELECT * from tableTemp;

Similarly, to update a row, you could first delete it in this manner, and then add it.

Selfrespect answered 18/6, 2016 at 22:1 Comment(3)
But how would I update a very large table. For example, what if my process was to gather a lot of records, then "correct" 15% of them?Watchmaker
Basically just update and delete multiple rows? You could first select those rows into a temp table and just change the where clause to be WHERE rowID is not in (SELECT rowID FROM targetTable) You are only limited here by what you can select.Selfrespect
It would be much more efficient to rename table instead of doing INSERT INTO table1 SELECT * from tableTemp;Returnable
K
5

Functionality to UPDATE or DELETE data has been added in recent ClickHouse releases, but its expensive batch operation which can't be performed too frequently.

See https://clickhouse.yandex/docs/en/query_language/alter/#mutations for more details.

Kliman answered 22/11, 2018 at 18:37 Comment(0)
V
3

It's an old question, but updates are now supported in Clickhouse. Note it's not recommended to do many small changes for performance reasons. But it is possible.

Syntax:

ALTER TABLE [db.]table UPDATE column1 = expr1 [, ...] WHERE filter_expr

Clickhouse UPDATE documentation

Valdovinos answered 27/10, 2020 at 17:23 Comment(0)
F
0

To update values you can use the ReplacingMergeTree engine.

Having this engine on a table, if you try to append a row with the sort keys that already exists in the table, it will replace previous record that have the same sort keys (like upsert).

Finn answered 27/8, 2023 at 11:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.