I am using Parquet.Net to read parquet files, but the only option to read from the parquet file is.
//get the first group
Parquet.ParquetRowGroupReader rowGroup = myParquet.OpenRowGroupReader(0);
//gets the first column
Parquet.Data.DataColumn col1 = rowGroup.ReadColumn(myParquet.Schema.GetDataFields()[0]);
This allows me to get the the first column from the first rowGroup, but the problem is, the first rowGroup can be something like 4million rows and readColumn will read all 4million values.
How do I tell readColumn that I only want it to read, say the first 100 rows. Reading all 4million rows wastes memory and file read time.
I actually got a memory error, until I changed my code to resize that 4million value array down to my 100. After calling each column.
I don't necessarily need row based access, I can work with columns, I just don't need a whole rowGroup worth of values in each column. Is this possible? If row based access is better, how does one use it? The Parquet.Net project site doesn't give any examples, and just talks about tables.