scikit learn output metrics.classification_report into CSV/tab-delimited format

Asked 23/9, 2016 at 13:45 Answered 3/6, 2024 at 6:50

python csv text scikit-learn classification

I'm doing a multiclass text classification in Scikit-Learn. The dataset is being trained using the Multinomial Naive Bayes classifier having hundreds of labels. Here's an extract from the Scikit Learn script for fitting the MNB model

from __future__ import print_function

# Read **`file.csv`** into a pandas DataFrame

import pandas as pd
path = 'data/file.csv'
merged = pd.read_csv(path, error_bad_lines=False, low_memory=False)

# define X and y using the original DataFrame
X = merged.text
y = merged.grid

# split X and y into training and testing sets;
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# import and instantiate CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

# create document-term matrices using CountVectorizer
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)

# import and instantiate MultinomialNB
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()

# fit a Multinomial Naive Bayes model
nb.fit(X_train_dtm, y_train)

# make class predictions
y_pred_class = nb.predict(X_test_dtm)

# generate classification report
from sklearn import metrics
print(metrics.classification_report(y_test, y_pred_class))

And a simplified output of the metrics.classification_report on command line screen looks like this:

             precision  recall   f1-score   support
     12       0.84      0.48      0.61      2843
     13       0.00      0.00      0.00        69
     15       1.00      0.19      0.32       232
     16       0.75      0.02      0.05       965
     33       1.00      0.04      0.07       155
      4       0.59      0.34      0.43      5600
     41       0.63      0.49      0.55      6218
     42       0.00      0.00      0.00       102
     49       0.00      0.00      0.00        11
      5       0.90      0.06      0.12      2010
     50       0.00      0.00      0.00         5
     51       0.96      0.07      0.13      1267
     58       1.00      0.01      0.02       180
     59       0.37      0.80      0.51      8127
      7       0.91      0.05      0.10       579
      8       0.50      0.56      0.53      7555      
    avg/total 0.59      0.48      0.45     35919

I was wondering if there was any way to get the report output into a standard csv file with regular column headers

When I send the command line output into a csv file or try to copy/paste the screen output into a spreadsheet - Openoffice Calc or Excel, It lumps the results in one column. Looking like this:

Osteomalacia answered 23/9, 2016 at 13:45 Comment(2)

I'll be trying to recreate the results as I type this, But have u tried turning the table into a DataFrame using Pandas and then sending the dataframe to csv using dataframe_name_here.to_csv() ? Could you also show the code in which you write the results to the csv? – Disendow 23/9, 2016 at 13:58

@Disendow I have edited the question and provided the full python code...I was passing the output of the script to a CSV file from Linux command line thus: $ python3 script.py > result.csv – Osteomalacia 23/9, 2016 at 15:14

129

As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas Dataframe is by simply having the report returned as a dict:

report = classification_report(y_test, y_pred, output_dict=True)

and then construct a Dataframe and transpose it:

df = pandas.DataFrame(report).transpose()

From here on, you are free to use the standard pandas methods to generate your desired output formats (CSV, HTML, LaTeX, ...).

See the documentation.

Flavio answered 14/12, 2018 at 13:20 Comment(2)

df.to_csv('file_name.csv') for the lazy :) – Guggenheim 11/11, 2020 at 16:26

Perfect answer. Minor note: since the output dict accuracy has only one value, it will be repeated in the accuracy row of your dataframe. If you want your export to mirror the sklearn output exactly, you can use the snippet below.

report.update({"accuracy": {"precision": None, "recall": None, "f1-score": report["accuracy"], "support": report['macro avg']['support']}})

– Barbabra 25/2, 2022 at 23:52

If you want the individual scores this should do the job just fine.

import pandas as pd

def classification_report_csv(report):
    report_data = []
    lines = report.split('\n')
    for line in lines[2:-3]:
        row = {}
        row_data = line.split('      ')
        row['class'] = row_data[0]
        row['precision'] = float(row_data[1])
        row['recall'] = float(row_data[2])
        row['f1_score'] = float(row_data[3])
        row['support'] = float(row_data[4])
        report_data.append(row)
    dataframe = pd.DataFrame.from_dict(report_data)
    dataframe.to_csv('classification_report.csv', index = False)

report = classification_report(y_true, y_pred)
classification_report_csv(report)

Pigment answered 8/12, 2016 at 16:33 Comment(4)

row['precision'] = float(row_data[1]) ValueError: could not convert string to float: – Hewes 29/12, 2017 at 22:17

change line row_data = line.split(' ') by row_data = line.split(' ') row_data = list(filter(None, row_data)) – Sorites 20/7, 2018 at 15:27

Really cool ,and thanks~ And I make a comment for the split statement: row_data = line.split(' ') , this one should be better like this : row_data = line.split(), because some time the space number in the report string is not equal – Aha 15/3, 2019 at 8:13

Better to replace row_data = line.split(' ') with row_data = ' '.join(line.split()) row_data = row_data.split(' ') to account for irregular spaces. – Nostology 18/12, 2019 at 13:41

Just import pandas as pd and make sure that you set the output_dict parameter which by default is False to True when computing the classification_report. This will result in an classification_report dictionary which you can then pass to a pandas DataFrame method. You may want to transpose the resulting DataFrame to fit the fit the output format that you want. The resulting DataFrame may then be written to a csv file as you wish.

clsf_report = pd.DataFrame(classification_report(y_true = your_y_true, y_pred = your_y_preds5, output_dict=True)).transpose()
clsf_report.to_csv('Your Classification Report Name.csv', index= True)

Histidine answered 25/3, 2019 at 22:39 Comment(0)

We can get the actual values from the precision_recall_fscore_support function and then put them into data frames. the below code will give the same result, but now in a pandas dataframe:

clf_rep = metrics.precision_recall_fscore_support(true, pred)
out_dict = {
             "precision" :clf_rep[0].round(2)
            ,"recall" : clf_rep[1].round(2)
            ,"f1-score" : clf_rep[2].round(2)
            ,"support" : clf_rep[3]
            }
out_df = pd.DataFrame(out_dict, index = nb.classes_)
avg_tot = (out_df.apply(lambda x: round(x.mean(), 2) if x.name!="support" else  round(x.sum(), 2)).to_frame().T)
avg_tot.index = ["avg/total"]
out_df = out_df.append(avg_tot)
print out_df

Lemus answered 26/2, 2017 at 10:2 Comment(0)

While the previous answers are probably all working I found them a bit verbose. The following stores the individual class results as well as the summary line in a single dataframe. Not very sensitive to changes in the report but did the trick for me.

#init snippet and fake data
from io import StringIO
import re
import pandas as pd
from sklearn import metrics
true_label = [1,1,2,2,3,3]
pred_label = [1,2,2,3,3,1]

def report_to_df(report):
    report = re.sub(r" +", " ", report).replace("avg / total", "avg/total").replace("\n ", "\n")
    report_df = pd.read_csv(StringIO("Classes" + report), sep=' ', index_col=0)        
    return(report_df)

#txt report to df
report = metrics.classification_report(true_label, pred_label)
report_df = report_to_df(report)

#store, print, copy...
print (report_df)

Which gives the desired output:

Classes precision   recall  f1-score    support
1   0.5 0.5 0.5 2
2   0.5 0.5 0.5 2
3   0.5 0.5 0.5 2
avg/total   0.5 0.5 0.5 6

Chidester answered 27/9, 2017 at 12:27 Comment(0)

It's obviously a better idea to just output the classification report as dict:

sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)

But here's a function I made to convert all classes (only classes) results to a pandas dataframe.

def report_to_df(report):
    report = [x.split(' ') for x in report.split('\n')]
    header = ['Class Name']+[x for x in report[0] if x!='']
    values = []
    for row in report[1:-5]:
        row = [value for value in row if value!='']
        if row!=[]:
            values.append(row)
    df = pd.DataFrame(data = values, columns = header)
    return df

Heraclea answered 29/3, 2019 at 5:2 Comment(0)

As mentioned in one of the posts in here, precision_recall_fscore_support is analogous to classification_report.

Then it suffices to use pandas to easily format the data in a columnar format, similar to what classification_report does. Here is an example:

import numpy as np
import pandas as pd

from sklearn.metrics import classification_report
from  sklearn.metrics import precision_recall_fscore_support

np.random.seed(0)

y_true = np.array([0]*400 + [1]*600)
y_pred = np.random.randint(2, size=1000)

def pandas_classification_report(y_true, y_pred):
    metrics_summary = precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred)
    
    avg = list(precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred,
            average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index)
    
    support = class_report_df.loc['support']
    total = support.sum() 
    avg[-1] = total
    
    class_report_df['avg / total'] = avg

    return class_report_df.T

With classification_report You'll get something like:

print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))

Output:

             precision    recall  f1-score   support

          0   0.379032  0.470000  0.419643       400
          1   0.579365  0.486667  0.528986       600

avg / total   0.499232  0.480000  0.485248      1000

Then with our custom funtion pandas_classification_report:

df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
print(df_class_report)

Output:

             precision    recall  f1-score  support
0             0.379032  0.470000  0.419643    400.0
1             0.579365  0.486667  0.528986    600.0
avg / total   0.499232  0.480000  0.485248   1000.0

Then just save it to csv format (refer to here for other separator formating like sep=';'):

df_class_report.to_csv('my_csv_file.csv',  sep=',')

I open my_csv_file.csv with LibreOffice Calc (although you could use any tabular/spreadsheet editor like excel):

Cogen answered 29/4, 2018 at 21:20 Comment(3)

The averages calculated by classification_report are weighted with the support values. – Charlean 10/5, 2018 at 20:32

So it should be avg = (class_report_df.loc[metrics_sum_index[:-1]] * class_report_df.loc[metrics_sum_index[-1]]).sum(axis=1) / total – Charlean 10/5, 2018 at 21:19

Nice catch @Flynamic! I figured it out that precision_recall_fscore_support has an average param. which does just what you suggest! – Cogen 11/5, 2018 at 19:9

I also found some of the answers a bit verbose. Here is my three line solution, using precision_recall_fscore_support as others have suggested.

import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

report = pd.DataFrame(list(precision_recall_fscore_support(y_true, y_pred)),
            index=['Precision', 'Recall', 'F1-score', 'Support']).T

# Now add the 'Avg/Total' row
report.loc['Avg/Total', :] = precision_recall_fscore_support(y_true, y_test,
    average='weighted')
report.loc['Avg/Total', 'Support'] = report['Support'].sum()

Mimosaceous answered 2/5, 2018 at 20:22 Comment(1)

This works, but trying to use the labels parameter of precision_recall_fscore_support raises, for some reason, ValueError: y contains previously unseen labels – Barbette 18/2, 2019 at 3:18

The simplest and best way I found is:

classes = ['class 1','class 2','class 3']

report = classification_report(Y[test], Y_pred, target_names=classes)

report_path = "report.txt"

text_file = open(report_path, "w")
n = text_file.write(report)
text_file.close()

Ker answered 4/2, 2020 at 16:19 Comment(0)

Another option is to calculate the underlying data and compose the report on your own. All the statistics you will get by

precision_recall_fscore_support

Fulgurous answered 17/3, 2018 at 9:32 Comment(0)

Along with example input-output, here's the other function metrics_report_to_df(). Implementing precision_recall_fscore_support from Sklearn metrics should do:

# Generates classification metrics using precision_recall_fscore_support:
from sklearn import metrics
import pandas as pd
import numpy as np; from numpy import random

# Simulating true and predicted labels as test dataset: 
np.random.seed(10)
y_true = np.array([0]*300 + [1]*700)
y_pred = np.random.randint(2, size=1000)

# Here's the custom function returning classification report dataframe:
def metrics_report_to_df(ytrue, ypred):
    precision, recall, fscore, support = metrics.precision_recall_fscore_support(ytrue, ypred)
    classification_report = pd.concat(map(pd.DataFrame, [precision, recall, fscore, support]), axis=1)
    classification_report.columns = ["precision", "recall", "f1-score", "support"] # Add row w "avg/total"
    classification_report.loc['avg/Total', :] = metrics.precision_recall_fscore_support(ytrue, ypred, average='weighted')
    classification_report.loc['avg/Total', 'support'] = classification_report['support'].sum() 
    return(classification_report)

# Provide input as true_label and predicted label (from classifier)
classification_report = metrics_report_to_df(y_true, y_pred)

# Here's the output (metrics report transformed to dataframe )
In [1047]: classification_report
Out[1047]: 
           precision    recall  f1-score  support
0           0.300578  0.520000  0.380952    300.0
1           0.700624  0.481429  0.570703    700.0
avg/Total   0.580610  0.493000  0.513778   1000.0

Journalistic answered 29/5, 2018 at 22:47 Comment(0)

I have modified @kindjacket's answer. Try this:

import collections
def classification_report_df(report):
    report_data = []
    lines = report.split('\n')
    del lines[-5]
    del lines[-1]
    del lines[1]
    for line in lines[1:]:
        row = collections.OrderedDict()
        row_data = line.split()
        row_data = list(filter(None, row_data))
        row['class'] = row_data[0] + " " + row_data[1]
        row['precision'] = float(row_data[2])
        row['recall'] = float(row_data[3])
        row['f1_score'] = float(row_data[4])
        row['support'] = int(row_data[5])
        report_data.append(row)
    df = pd.DataFrame.from_dict(report_data)
    df.set_index('class', inplace=True)
    return df

You can just export that df to csv using pandas

Moxie answered 22/1, 2019 at 5:10 Comment(1)

The line row['support'] = int(row_data[5]) raises IndexError: list index out of range – Barbette 18/2, 2019 at 3:0

Below function can be used to get the classification report as a pandas dataframe which then can be dumped as a csv file. The resulting csv file will look exactly like when we print the classification report.

import pandas as pd
from sklearn import metrics


def classification_report_df(y_true, y_pred):
    report = metrics.classification_report(y_true, y_pred, output_dict=True)
    df_report = pd.DataFrame(report).transpose()
    df_report.round(3)        
    df_report = df_report.astype({'support': int})    
    df_report.loc['accuracy',['precision','recall','support']] = [None,None,df_report.loc['macro avg']['support']]
    return df_report


report = classification_report_df(y_true, y_pred)
report.to_csv("<Full Path to Save CSV>")

Devalue answered 30/11, 2022 at 9:59 Comment(0)

def to_table(report):
    report = report.splitlines()
    res = []
    res.append(['']+report[0].split())
    for row in report[2:-2]:
       res.append(row.split())
    lr = report[-1].split()
    res.append([' '.join(lr[:3])]+lr[3:])
    return np.array(res)

returns a numpy array which can be turned to pandas dataframe or just be saved as csv file.

Fahrenheit answered 24/7, 2017 at 14:51 Comment(0)

This is my code for 2 classes(pos,neg) classification

report = metrics.precision_recall_fscore_support(true_labels,predicted_labels,labels=classes)
        rowDicionary["precision_pos"] = report[0][0]
        rowDicionary["recall_pos"] = report[1][0]
        rowDicionary["f1-score_pos"] = report[2][0]
        rowDicionary["support_pos"] = report[3][0]
        rowDicionary["precision_neg"] = report[0][1]
        rowDicionary["recall_neg"] = report[1][1]
        rowDicionary["f1-score_neg"] = report[2][1]
        rowDicionary["support_neg"] = report[3][1]
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writerow(rowDicionary)

Sucy answered 2/4, 2018 at 8:35 Comment(0)

I have written below code to extract the classification report and save it to an excel file:

def classifcation_report_processing(model_to_report):
    tmp = list()
    for row in model_to_report.split("\n"):
        parsed_row = [x for x in row.split("  ") if len(x) > 0]
        if len(parsed_row) > 0:
            tmp.append(parsed_row)

    # Store in dictionary
    measures = tmp[0]

    D_class_data = defaultdict(dict)
    for row in tmp[1:]:
        class_label = row[0]
        for j, m in enumerate(measures):
            D_class_data[class_label][m.strip()] = float(row[j + 1].strip())
    save_report = pd.DataFrame.from_dict(D_class_data).T
    path_to_save = os.getcwd() +'/Classification_report.xlsx'
    save_report.to_excel(path_to_save, index=True)
    return save_report.head(5)

To call the function below line can be used anywhere in the program:

saving_CL_report_naive_bayes = classifcation_report_processing(classification_report(y_val, prediction))

The output looks like below:

Chant answered 4/2, 2019 at 10:23 Comment(0)

I had the same problem what i did was, paste the string output of metrics.classification_report into google sheets or excel and split the text into columns by custom 5 whitespaces.

Defecate answered 15/3, 2019 at 4:8 Comment(0)

Definitely worth using:

sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)

But a slightly revised version of the function by Yash Nag is as follows. The function includes the accuracy, macro accuracy and weighted accuracy rows along with the classes:

def classification_report_to_dataframe(str_representation_of_report):
    split_string = [x.split(' ') for x in str_representation_of_report.split('\n')]
    column_names = ['']+[x for x in split_string[0] if x!='']
    values = []
    for table_row in split_string[1:-1]:
        table_row = [value for value in table_row if value!='']
        if table_row!=[]:
            values.append(table_row)
    for i in values:
        for j in range(len(i)):
            if i[1] == 'avg':
                i[0:2] = [' '.join(i[0:2])]
            if len(i) == 3:
                i.insert(1,np.nan)
                i.insert(2, np.nan)
            else:
                pass
    report_to_df = pd.DataFrame(data=values, columns=column_names)
    return report_to_df

The output for a test classification report may be found here

Godwit answered 17/5, 2021 at 11:5 Comment(0)

2024 solution

This works perfectly in various cases:

import re
import pandas as pd


def classification_report_to_df(report):
    """
    Convert a text-based classification report to a pandas DataFrame.

    Parameters:
    report (str): The classification report as a string.

    Returns:
    pd.DataFrame: The classification report as a pandas DataFrame.
    """
    report_data = []
    lines = report.split("\n")

    for line in lines:
        if line.strip() == "":
            continue

        row_data = re.split(r"\s{2,}", line.strip())

        # Handle the accuracy line separately
        if "accuracy" in line:
            row_data = ["accuracy", "", "", *row_data[-2:]]

        report_data.append(row_data)

    columns = ["class", *report_data[0]]
    dataframe = pd.DataFrame(report_data[1:], columns=columns)

    return dataframe

Example Usage:

from sklearn.metrics import classification_report

# Sample data

actual_labels = [0, 2, 2, 0, 2, 0, 1, 1, 0]
predicted_labels = [1, 2, 0, 2, 1, 0, 1, 1, 2]
class_names = ["Setosa", "Versicolor", "Virginica"]

# Generate the classification report
report = classification_report(
    actual_labels, predicted_labels, target_names=class_names
)

# Convert the classification report to a pandas DataFrame
df = classification_report_to_df(report)
print(df)

Friederike answered 3/6, 2024 at 6:50 Comment(0)

-2

The way I have always solved output problems is like what I've mentioned in my previous comment, I've converted my output to a DataFrame. Not only is it incredibly easy to send to files (see here), but Pandas is really easy to manipulate the data structure. The other way I have solved this is writing the output line-by-line using CSV and specifically using writerow.

If you manage to get the output into a dataframe it would be

dataframe_name_here.to_csv()

or if using CSV it would be something like the example they provide in the CSV link.

Disendow answered 23/9, 2016 at 15:53 Comment(1)

thanks I have tried to use a data frame;

Result = metrics.classification_report(y_test, y_pred_class); df = pd.DataFrame(Result); df.to_csv(results.csv, sep='\t')

but got an error pandas.core.common.PandasError: DataFrame constructor not properly called! – Osteomalacia 23/9, 2016 at 18:49

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags