I am looking for a way to store embeddings generated by a language model (such as T5) in Google BigQuery.
The embeddings are NumPy arrays or tensors.
I have found three approaches:
- TFRecord: write the embeddings to a TFRecord file and store it in Cloud Storage.
- Convert the NumPy array to a string and store it in a STRING column of a table.
- Store it in a column with mode REPEATED. (I am not sure whether this approach preserves the order of the entries in the embedding vector.)
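For reference, here is a minimal sketch of what the second and third approaches could look like in Python, assuming only NumPy and a hypothetical table with an `id` column and an `embedding` column. The helper names are hypothetical, and base64-encoding the raw float32 bytes is just one possible string serialization, chosen here because it decodes exactly. On the ordering question, BigQuery documents an ARRAY as an ordered list, and a JSON array also keeps its order, so a REPEATED column should keep `embedding[i]` at position `i`.

```python
import base64
import json

import numpy as np


def to_repeated_row(row_id, embedding):
    """Approach 3 (hypothetical helper): row for an `embedding` column of type FLOAT64, mode REPEATED.

    The vector becomes a Python list; the list keeps element order,
    so embedding[i] ends up at array position i.
    """
    vec = np.asarray(embedding, dtype=np.float64).ravel()
    return {"id": row_id, "embedding": vec.tolist()}


def to_string_row(row_id, embedding):
    """Approach 2 (hypothetical helper): row for an `embedding` column of type STRING.

    The raw little-endian float32 bytes are base64-encoded, which decodes
    exactly and is more compact than writing the numbers out as text.
    """
    vec = np.asarray(embedding, dtype="<f4").ravel()
    return {"id": row_id, "embedding": base64.b64encode(vec.tobytes()).decode("ascii")}


def from_repeated_column(values):
    """Inverse of to_repeated_row: list of floats read back -> float32 vector."""
    return np.asarray(values, dtype=np.float32)


def from_string_column(encoded):
    """Inverse of to_string_row: base64 STRING read back -> float32 vector."""
    return np.frombuffer(base64.b64decode(encoded), dtype="<f4")


if __name__ == "__main__":
    # Stand-in for one encoder output vector (768 is an assumed size).
    emb = np.random.default_rng(0).standard_normal(768).astype(np.float32)

    repeated = to_repeated_row("doc-1", emb)
    stringified = to_string_row("doc-1", emb)

    # Both rows are plain JSON-serializable dicts.
    json.dumps(repeated)
    json.dumps(stringified)

    # Both encodings round-trip to the original float32 values.
    assert np.array_equal(from_repeated_column(repeated["embedding"]), emb)
    assert np.array_equal(from_string_column(stringified["embedding"]), emb)
    print("round trip OK")
```

With the `google-cloud-bigquery` client, rows like these could then be streamed with `Client.insert_rows_json`, against a schema that declares the column as `SchemaField("embedding", "FLOAT64", mode="REPEATED")` (or as `"STRING"` for the second approach). One advantage of the REPEATED layout is that the values stay queryable in SQL, for example with `UNNEST(embedding) WITH OFFSET`.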
I would appreciate any suggestions, or pointers to other approaches.
Many thanks