I have a .csv file with around 300,000 rows. I group it by a particular column, which gives 2138 groups of around 140 rows each.
I am trying to generate a numpy array of the group names. For now I use a for loop to collect the names, but it takes a while to run.
import numpy as np
import pandas as pd
df = pd.read_csv('file.csv')
grouped = df.groupby('col1')
group_names = []
for name, group in grouped:
    group_names.append(name)
group_names = np.array(group_names, dtype=object)
I am wondering if there is a more efficient way to do this, whether through a pandas method or by converting the names directly into a numpy array.
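For reference, here is a minimal sketch of two candidates that avoid iterating over the groups themselves. The small DataFrame is a made-up stand-in for file.csv, and the variable names are only illustrative. I have not timed either one on the real data, so treat this as something to benchmark rather than a confirmed fix.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('file.csv'); only 'col1' matters here.
df = pd.DataFrame({"col1": ["b", "a", "b", "c", "a"], "val": [1, 2, 3, 4, 5]})
grouped = df.groupby("col1")

# Baseline: the loop from above, which builds every group just to read its name.
loop_names = np.array([name for name, _ in grouped], dtype=object)

# Candidate 1: grouped.groups maps each group name to its row labels,
# so its keys are the group names.
from_groups = np.array(list(grouped.groups), dtype=object)

# Candidate 2: skip the groupby and take the distinct values of the column.
# groupby drops NaN keys by default, so dropna() keeps the two consistent.
unique_vals = np.asarray(df["col1"].dropna().unique(), dtype=object)
from_unique = np.sort(unique_vals)
```

Candidate 2 never builds the groups at all, so it may be the cheaper option when only the names are needed.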