I have a Parquet file with 10 row groups:
In [30]: print(pyarrow.parquet.ParquetFile("/tmp/test2.parquet").num_row_groups)
10
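For reproducibility, the file can be generated along these lines (a minimal sketch; the column name and row count are placeholders, only the row-group count matters):

import pyarrow as pa
import pyarrow.parquet as pq

# 1,000 rows written with row_group_size=100 yields 10 row groups
table = pa.table({"x": list(range(1000))})
pq.write_table(table, "/tmp/test2.parquet", row_group_size=100)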
But when I load it with Dask DataFrame, it is read into a single partition:
In [31]: print(dask.dataframe.read_parquet("/tmp/test2.parquet").npartitions)
1
This appears to contradict this answer, which states that Dask DataFrame reads each Parquet row group into a separate partition.
How can I read each Parquet row group into a separate partition with Dask DataFrame? Or must the data be split across multiple files for this to work?
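For what it's worth, I see that read_parquet accepts a split_row_groups keyword. If I'm reading the docs correctly, I would expect something like the following (untested sketch) to produce 10 partitions; is that the intended mechanism, or has the behavior described in that answer changed?

import dask.dataframe as dd

# split_row_groups=True should map each row group to its own partition
ddf = dd.read_parquet("/tmp/test2.parquet", split_row_groups=True)
print(ddf.npartitions)  # expecting 10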