Using Weka Java Code - How to Convert CSV (without header row) to ARFF Format?

I'm using the Weka Java library to read in a CSV file and convert it to an ARFF file.

The problem is that the CSV file doesn't have a header row, only data. How do I assign attribute names after I bring in the CSV file? (All the columns would be of string type.)

Here is the code I have so far:

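    import java.io.File;

    import weka.core.Instances;
    import weka.core.converters.ArffSaver;
    import weka.core.converters.CSVLoader;

    // Load the CSV (by default CSVLoader treats the first line as the header row)
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(CSVFilePath));
    Instances data = loader.getDataSet();

    // Write the loaded instances out in ARFF format
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(outputFilePath));
    saver.writeBatch();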

I tried looking through the Weka source code to figure this out, but I couldn't make heads or tails of it :-(

Rampant answered 18/8, 2010 at 22:17

The short answer is, you can't assign attribute names after you read in the file.

CSVLoader assumes the first line of the CSV is the header. If that line is actually data, CSVLoader will use its values as the attribute names instead of loading it as an instance, which is definitely not what you want.

Before the code above, you need to read the file in, write a header row, and save the file again.
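
A minimal sketch of that preprocessing step in plain Java (`java.nio.file`), assuming the CSV is UTF-8 and small enough to read into memory. The class name `PrependCsvHeader`, the temporary demo files and the header `name,city` are hypothetical; substitute your own paths and attribute names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrependCsvHeader {

    /**
     * Writes a copy of {@code input} to {@code output} with {@code header}
     * as its first line. The whole file is read into memory, so this is
     * only meant for files that fit comfortably in RAM.
     */
    static void prependHeader(Path input, Path output, String header) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(header);                                              // new first line
        lines.addAll(Files.readAllLines(input, StandardCharsets.UTF_8)); // original data rows
        Files.write(output, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Demo with temporary files; in practice pass your real CSV paths.
        Path input = Files.createTempFile("no-header", ".csv");
        Files.write(input, Arrays.asList("alice,paris", "bob,berlin"), StandardCharsets.UTF_8);
        Path output = Files.createTempFile("with-header", ".csv");

        // Hypothetical attribute names, one per CSV column.
        prependHeader(input, output, "name,city");
        System.out.println(Files.readAllLines(output, StandardCharsets.UTF_8));
    }
}
```

The resulting file can then be passed to `loader.setSource(...)` in the question's code unchanged. For very large files, streaming line by line with `Files.newBufferedReader`/`Files.newBufferedWriter` would avoid holding everything in memory.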

See my answer to your question on the weka mailing list.

Decima answered 19/8, 2010 at 4:22
Thanks. I'll try that. I assumed my question to the mailing list got lost in the shuffle :-( - Rampant
No worries, asking through different avenues is a good idea :) - Decima

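You can use the -H option if no header row is present in the data:

    import java.io.File;
    import weka.core.Instances;
    import weka.core.converters.CSVLoader;

    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(CSVFilePath));

    // -H: the CSV has no header row, so the first line is read as data
    String[] options = new String[1];
    options[0] = "-H";
    loader.setOptions(options);

    Instances data = loader.getDataSet();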

See: http://weka.sourceforge.net/doc.dev/weka/core/converters/CSVLoader.html
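
A fuller sketch that goes straight from a header-less CSV to ARFF with all-string attributes. It assumes a recent Weka (3.8), where `CSVLoader` exposes `setNoHeaderRowPresent` (the setter behind -H) and `setStringAttributes` (the setter behind -S, which forces a column range to STRING). Without a header row, CSVLoader generates placeholder attribute names, so the sketch renames them with `Instances.renameAttribute`. The class name, default file paths and attribute names are hypothetical.

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvNoHeaderToArff {

    public static void main(String[] args) throws Exception {
        // Hypothetical default paths; pass your own as arguments.
        File csvFile = new File(args.length > 0 ? args[0] : "data.csv");
        File arffFile = new File(args.length > 1 ? args[1] : "data.arff");

        CSVLoader loader = new CSVLoader();
        // Same effect as the -H option: the first line is data, not a header.
        loader.setNoHeaderRowPresent(true);
        // Assumption (Weka 3.8): force every column to the STRING type (-S first-last).
        loader.setStringAttributes("first-last");
        loader.setSource(csvFile);
        Instances data = loader.getDataSet();

        // Replace the generated placeholder names; these names are hypothetical.
        String[] names = {"name", "city", "country"};
        for (int i = 0; i < names.length && i < data.numAttributes(); i++) {
            data.renameAttribute(i, names[i]);
        }

        // Save the renamed dataset as ARFF.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(arffFile);
        saver.writeBatch();
    }
}
```

Renaming after loading keeps the original CSV untouched, which avoids the extra preprocessing pass of writing a header row into the file first.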

Parliamentary answered 10/4, 2014 at 11:26
There seems to be a shorthand now: `loader.setNoHeaderRowPresent(true)`. See: weka.sourceforge.net/doc.dev/weka/core/converters/… - Mulderig

My solution:

    -- UNION ALL keeps duplicate data rows; plain UNION would silently drop them
    SELECT 'nameColumn1', 'nameColumn2'
    UNION ALL
    SELECT idColumn1, idColumn2
    FROM path
    INTO OUTFILE '/tmp/w.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

`nameColumn1` and `nameColumn2` are the column headers that will appear as the first line of the CSV file.

Jigging answered 20/11, 2013 at 17:20
This solution seems a little out of place, but OK. - Dyke

© 2022 - 2024 — McMap. All rights reserved.