Use-case
I am using Apache Parquet files as a fast IO format for large-ish spatial data that I am working on in Python with GeoPandas. I am storing feature geometries as WKB and would like to record the coordinate reference system (CRS) as metadata associated with the WKB data.
Code problem
I am trying to assign arbitrary metadata to a pyarrow.Field object.
What I've tried
Suppose table is a pyarrow.Table instantiated from df, a pandas.DataFrame:
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({
    'foo': [1, 3, 2],
    'bar': [6, 4, 5],
})
table = pa.Table.from_pandas(df)
According to the pyarrow docs, column metadata is contained in a field which belongs to a schema (source), and optional metadata may be added to a field (source).
If I try to assign a value to the metadata attribute, it raises an error:
>>> table.schema.field_by_name('foo').metadata = {'crs' : '4283'}
AttributeError: attribute 'metadata' of 'pyarrow.lib.Field' objects is not writable
>>> table.column(0).field.metadata = {'crs' : '4283'}
AttributeError: attribute 'metadata' of 'pyarrow.lib.Field' objects is not writable
If I try to assign a field (with metadata attached by way of the add_metadata method) back to a field, it also raises an error:
>>> table.schema.field_by_name('foo') = (
table.schema.field_by_name('foo').add_metadata({'crs' : '4283'})
)
SyntaxError: can't assign to function call
>>> table.column(0).field = table.column(0).field.add_metadata({'crs' : '4283'})
AttributeError: attribute 'field' of 'pyarrow.lib.Column' objects is not writable
I have even tried assigning metadata to a pandas.Series object, e.g.
df['foo']._metadata.append({'crs' : '4283'})
but this is not returned in the metadata when accessing the pandas_metadata (docs) property on the schema attribute of the table object.
Research
On Stack Overflow, this question remains unanswered, and this related question concerns Scala, not Python and pyarrow. Elsewhere I have seen metadata associated with a pyarrow.Field object, but only by instantiating pyarrow.Field and pyarrow.Table objects from the ground up.
PS
This is my first time posting to Stack Overflow, so thanks in advance and apologies for any errors.
The Table.replace_schema_metadata method seems to handle the table metadata part. – Monogenic