Google BigQuery Schema conflict (pyarrow error) with Numeric data type using load_table_from_dataframe
I get the following error when uploading numeric data (int64 or float64) from a Pandas dataframe to a NUMERIC Google BigQuery column:

pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)

I tried changing the datatype of the 'tt' field in the Pandas dataframe, with no success:

df_data_f['tt'] = df_data_f['tt'].astype('float64')

and

df_data_f['tt'] = df_data_f['tt'].astype('int64')

Using the schema:

 job_config.schema = [
                    ...             
                    bigquery.SchemaField('tt', 'NUMERIC')
                    ...]

Reading this google-cloud-python issue report, I found:

NUMERIC = pyarrow.decimal128(38, 9)

So the NUMERIC BigQuery type maps to Arrow's 16-byte decimal128(38, 9), while float64 and int64 values are only 8 bytes each, which is why pyarrow rejects them with "Got bytestring of length 8 (expected 16)".


I have:

Python 3.6.4

pandas 1.0.3

pyarrow 0.17.0

google-cloud-bigquery 1.24.0

Gujarati answered 25/4, 2020 at 6:13 Comment(0)
I'm not sure if this is the best solution, but I solved this issue by changing the datatype:

import decimal
...
df_data_f['tt'] = df_data_f['tt'].astype(str).map(decimal.Decimal)
Gujarati answered 25/4, 2020 at 6:13 Comment(5)
I guess this depends on your use-case. Did you have a specific reason to use NUMERIC type? If not, and your data was already 8 bytes, a more straightforward solution would have been directly setting the data type in BigQuery to be FLOAT64 (or INT64 if it was integers).Kinross
Hi Albert Albesa! You are right, that would be easier, but the table schema was defined months ago and the table already holds a lot of data. So the goal is to append new data without changing the destination table schema.Gujarati
I tried using this fix but then got Got bytestring of length 4 (expected 16), so this actually reduced the bytestring lengthFoliar
Never mind... I tried it again and it worked this time...Foliar
This answer helped me a lot. Just to add one thing, if your data column has any null/None value, converting to str and then Decimal will cause another error. First convert it to float, then str, and then map to Decimal.Elamitic
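Building on the last comment, a hedged sketch (column name and values assumed) that keeps nulls as None, so they load as NULL instead of being stringified along with the numbers:

```python
import decimal

import pandas as pd

# None becomes NaN in a float64 column
df_data_f = pd.DataFrame({"tt": [1.5, None, 3.0]})

# Skip missing values explicitly; str(v) gives Decimal an exact literal
df_data_f["tt"] = df_data_f["tt"].map(
    lambda v: None if pd.isna(v) else decimal.Decimal(str(v))
)
print(df_data_f["tt"].tolist())  # [Decimal('1.5'), None, Decimal('3.0')]
```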
