Fasttext how to load a .csv column into model.predict

I am new to python and NLP.

I have followed this tutorial (https://fasttext.cc/docs/en/supervised-tutorial.html) to train my fastText supervised model in Python.

I have a CSV with a Text column, and I would like to predict a label for every row in the file. My question is: how can I load (transform) the CSV column into the predict input and save the label?

model.predict("Which baking dish is best to bake a banana bread ?", k=-1, threshold=0.5)

Instead of the hard-coded text ("Which baking..."), I would like to load the column row by row and save the label, preferably in a new column in the same CSV.

I would appreciate any help, or maybe a tutorial I could follow.

So far I have tried converting the column to a list with pandas and to a NumPy array, but both came back with "AttributeError: 'function' object has no attribute 'find'".

Carolinian answered 24/9, 2019 at 13:8 Comment(1)
Please post some example rows of your CSV. - Keeter

Take this CSV as an example:

index;id;text;author
0;id26305;This process, however, afforded me no means of...;EAP
1;id17569;It never once occurred to me that the fumbling...;HPL
2;id11008;In his left hand was a gold snuff box, from wh...;EAP
3;id27763;How lovely is spring As we looked from Windsor...;MWS
4;id12958;Finding nothing else, not even gold, the Super...;HPL
5;id22965;A youth passed in solitude, my best years spen...;MWS
6;id09674;The astronomer, perhaps, at this point, took r...;EAP
7;id13515;The surcingle hung in ribands from my body. ;EAP
8;id19322;I knew that you could not say to yourself 'ste...;EAP
9;id00912;I confess that neither the structure of langua...;MWS

You can use the following code:

import pandas as pd
import fasttext as ft  # current releases name the module "fasttext"; older builds used "fastText"

# here you load the csv into pandas dataframe
df=pd.read_csv('csv_file.csv',sep=';')

# here you load your trained fastText model (MODELPATH is the path to your model file)
model=ft.load_model(MODELPATH)

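# line by line, you make the predictions and store them in a list
predictions=[]
for line in df['text']:
    labels, probs=model.predict(line, k=-1, threshold=0.5)
    # labels is empty when no label reaches the threshold, so guard the index
    pred_label=labels[0] if len(labels) > 0 else None
    predictions.append(pred_label)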

# you add the list to the dataframe, then save the dataframe to a new csv
df['prediction']=predictions
df.to_csv('csv_file_w_pred.csv',sep=';',index=False)
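
Two things can trip up the loop above on real data: pandas reads empty cells as NaN (a float, not a string), and fastText's predict refuses a string that contains a newline. Here is a minimal sketch of a wrapper that handles both. The helper names to_single_line and top_label are made up for this illustration and are not part of fastText or pandas; predict is assumed to be any callable with the same signature as model.predict.

```python
import math


def to_single_line(value):
    """Return value as a one-line string, or None for a missing cell."""
    # pandas represents empty CSV cells as float NaN
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    # collapse newlines, tabs and repeated spaces into single spaces
    return " ".join(str(value).split())


def top_label(predict, value, threshold=0.5):
    """Return the first predicted label, or None if there is nothing to return.

    predict is assumed to behave like fastText's model.predict, i.e. it
    returns a (labels, probabilities) pair for a single string.
    """
    text = to_single_line(value)
    if not text:
        return None
    labels, _probs = predict(text, k=-1, threshold=threshold)
    # no label reached the threshold -> empty result
    return labels[0] if len(labels) > 0 else None
```

With the dataframe and model from the code above, it would be used as: df['prediction'] = [top_label(model.predict, t) for t in df['text']]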
Keeter answered 25/9, 2019 at 7:16 Comment(3)
You, sir, are a genius! Thank you so much! - Carolinian
Hi Anakin, just one more question if you don't mind. How would you change this loop to save more than one label per row? Let's say right now it saves the first label from the model; if the output were like this: (('__label__1', '__label__2', '__label__3', '__label__4', '__label__5'), array([0.2950488 , 0.25392196, 0.10041688, 0.05670581, 0.05045504])), how would you save it? - Carolinian
You only have to use: pred_label=model.predict(line, k=-1, threshold=0.5) without the indexes. - Keeter
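
Dropping the indexes stores the whole (labels, probabilities) tuple in each cell, which is awkward to read back from a CSV. One possible alternative, sketched here under the assumption that labels carry the default "__label__" prefix, is to flatten each prediction into two plain strings and write them to two columns. The function name format_prediction is hypothetical, not a fastText API.

```python
def format_prediction(labels, probs, prefix="__label__", sep=","):
    """Turn one (labels, probabilities) result into two CSV-friendly strings."""
    # strip the label prefix where present, keep the label unchanged otherwise
    names = [
        label[len(prefix):] if label.startswith(prefix) else label
        for label in labels
    ]
    # float() also accepts NumPy scalars, which is what fastText returns
    scores = [f"{float(p):.4f}" for p in probs]
    return sep.join(names), sep.join(scores)
```

Inside the loop you would then call format_prediction(*model.predict(line, k=-1, threshold=0.5)) and append the two returned strings to two separate lists, one per new column. Since the CSV in this answer uses ';' as its separator, the ',' inside the cells does not clash with it.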
