I have a df as a result of a sql query, rows are as follows:
UserID: <class 'uuid.UUID'> 6f6e526c-acd8-496f-88e0-6bfdd594e2c2
ObjectID: <class 'int'> 6095016
ObjectClass: <class 'str'> cottages
Views: <class 'list'> [datetime.datetime(1970, 1, 1, 0, 0)]
RecDate: <class 'pandas._libs.tslibs.timestamps.Timestamp'> 2021-07-13 15:50:32
Events: <class 'list'> ['']
And a clickhouse table:
CREATE TABLE default.rec_eval_data
(
`UserID` UUID,
`ObjectID` Int32,
`ObjectClass` String,
`Views` Array(DateTime),
`RecDate` DateTime,
`Events` Array(String)
)
ENGINE = ReplacingMergeTree
ORDER BY (UserID, ObjectID)
SETTINGS index_granularity = 8192
I'm trying different ways to insert the DF into the table, but i keep getting various errors.
I'm using clickhouse_driver library to perform stuff.
I have read This thread and used all the methods, but to no avail
What I have tried already:
Native:
client.execute("INSERT INTO rec_eval_data VALUES", data.to_dict(orient='records'), types_check=True)
Error: clickhouse_driver.errors.TypeMismatchError: Code: 53. Type mismatch in VALUES section. Repeat query with types_check=True for detailed info. Column Views: argument out of range
Pandahouse:
connection = dict(database='default', host='http://localhost:8123', user='default', schema='default', password='')
ph.to_clickhouse(data, 'rec_eval_data', index=False, chunksize=100000, connection=connection)
Error: It uses http and GET method, which automatically acts in readonly mode, so I could not proceed. Maybe there's a way to change method to POST?
clickhouse_driver insert_dataframe:
client.insert_dataframe('INSERT INTO rec_eval_data VALUES ', data)
Error: TypeError: Unsupported column type: <class 'numpy.ndarray'>. list or tuple is expected.
iteration:
for date, row in data.T.iteritems():
client.execute("INSERT INTO rec_eval_data " "(UserID, "
"ObjectID, "
"ObjectClass, "
"Views, "
"RecDate, "
"Events)"
" VALUES " "({UserID}, " "{ObjectID}, " "{ObjectClass}, " "{Views}, " "{RecDate}, " "{Events}) " .format( UserID=UserID, ObjectID=row['ObjectID'], ObjectClass=row['ObjectClass'], Views=row['UserID'], RecDate=row['RecDate'], Events=row['Events']))
Error: It tries to split UserID into pieces. Can't find how to avoid it: DB::Exception: Missing columns: '6bfdd594e2c2' '496f' 'acd8' '6f6e526c' while processing query: '(((6f6e526c
- acd8) - 496f
) - 88.) - 6bfdd594e2c2
', required columns: '6f6e526c' 'acd8' '496f' '6bfdd594e2c2' '6f6e526c' 'acd8' '496f' '6bfdd594e2c2'.
Please, help, I can't fix it( I'm new both to CH and pandas(
Stack trace: