Exporting a pandas DataFrame to an ARFF file in Python

3

5

I am trying to export a pandas DataFrame to an .arff file for use in Weka. I have seen that the liac-arff module can be used for this. Going by its documentation, it seems I have to call arff.dump(obj, fp), but I am struggling with obj (a dictionary), which I am guessing I have to build myself. How would you suggest doing that properly for a big dataset (3,000,000 rows and 95 columns)? Is there an example of exporting a pandas DataFrame to an .arff file using Python (v2.7)?

Jehol answered 26/2, 2018 at 17:27 Comment(0)
8

First install the package: $ pip install arff

Then use in Python:

import arff

arff.dump('filename.arff',
          df.values,
          relation='relation name',
          names=df.columns)

Here df is a pandas.DataFrame. Voilà.
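For reference, the file such a dump produces is plain text: a @relation line, one @attribute line per column, then @data followed by comma-separated rows. Below is a minimal stdlib-only sketch of that layout, not the package's actual implementation. The helper name write_arff is hypothetical, and it assumes that every column is numeric and that names need no quoting or escaping. It writes row by row, which may help at the 3,000,000-row scale mentioned in the question.

```python
import io


def write_arff(fp, relation, names, rows):
    """Write numeric-only rows to fp in ARFF layout (hypothetical helper).

    Assumes every column is numeric and that the relation and column
    names contain no spaces or special characters, so nothing is quoted.
    """
    fp.write("@relation {}\n\n".format(relation))
    for name in names:
        fp.write("@attribute {} numeric\n".format(name))
    fp.write("\n@data\n")
    count = 0
    for row in rows:  # rows may be any iterable, including a generator
        fp.write(",".join(str(v) for v in row) + "\n")
        count += 1
    return count  # number of data rows written


# Small in-memory demonstration
buf = io.StringIO()
n_rows = write_arff(buf, "demo", ["a", "b"], [(1, 2.5), (3, 4.0)])
arff_text = buf.getvalue()
print(arff_text)
```

With a DataFrame, something like df.itertuples(index=False) could be passed as rows so the whole table is never copied into a list first.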

Zeist answered 9/5, 2018 at 11:35 Comment(1)
Deprecated in 2023. (Allaallah)
3

This is how I did it recently using the liac-arff package. Even if the arff package is easier to use, it doesn't allow defining column types or the values of categorical attributes.

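import arff  # installed by the liac-arff package
import pandas as pd

df = pd.DataFrame(...)
t = df.columns[-1]  # target column (the last one); must be defined before it is used below
attributes = [(c, 'NUMERIC') for c in df.columns.values[:-1]]
attributes += [('target', df[t].unique().astype(str).tolist())]
data = [df.loc[i].values[:-1].tolist() + [df[t].loc[i]] for i in range(df.shape[0])]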

arff_dic = {
    'attributes': attributes,
    'data': data,
    'relation': 'myRel',
    'description': ''
}

with open("myfile.arff", "w", encoding="utf8") as f:
     arff.dump(arff_dic, f)

Values of categorical attributes such as target must be of type str, even if they are numbers.
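To illustrate the str requirement, here is a small plain-Python sketch. The helper name nominal_column is made up for this example; it only mirrors the (name, list_of_categories) shape used for the target attribute in the snippet above, and it converts the column values to strings so they match the declared categories.

```python
def nominal_column(name, values):
    """Return an (attribute, labels) pair for a categorical column.

    Hypothetical helper: the attribute tuple follows the
    (name, list_of_categories) shape from the snippet above.
    """
    labels = [str(v) for v in values]   # e.g. 1 -> '1'
    categories = sorted(set(labels))    # unique labels in a stable order
    return (name, categories), labels


attribute, labels = nominal_column("target", [1, 0, 1, 1])
print(attribute)  # ('target', ['0', '1'])
print(labels)     # ['1', '0', '1', '1']
```

The returned labels would replace the original numeric values in the data rows, so that each row's target matches one of the declared categories.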

Angelita answered 19/7, 2019 at 9:2 Comment(0)
1

Inspired by the answer of @M. Franklin, which did not work very well for me, although the idea was there.

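import arff  # installed by the liac-arff package

# df is your DataFrame.
# int64/float64 columns become NUMERIC; any other column becomes a
# nominal attribute listing its unique values as strings.
attributes = [(j, 'NUMERIC') if df[j].dtypes in ['int64', 'float64'] else (j, df[j].unique().astype(str).tolist()) for j in df]


arff_dic = {
  'attributes': attributes,
  'data': df.values,
  'relation': 'myRel',
  'description': ''
}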


with open("myfile.arff", "w", encoding="utf8") as f:
  arff.dump(arff_dic, f)

The snippet above outputs an ARFF file in the expected format. Good luck, everyone!

Vaporetto answered 17/8, 2021 at 12:24 Comment(0)
