How to load CSV data in scikit-learn and use it for Naive Bayes classification

Trying to load custom data to perform NB classification in scikit-learn. I need help loading the sample data into scikit-learn and then performing NB. How do I load categorical values for the target?

Either use the same data for training and testing, or use the complete set just for testing.

Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
2,70000002,Bennie Ariana,Pennsylvania,6/24/1940,F,Single,,Caucasian,2002,66313,
3,70000003,Brad Farley,Pennsylvania,12001,F,Married,4,African American,2002,98444,
4,70000004,Daggoo Cece,Indiana,14032,F,Married,2,Hispanic,2001,41896,113481472.
Schrecklichkeit answered 23/8, 2013 at 6:12
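A minimal sketch of loading the sample above with pandas.read_csv. The rows are embedded in a string through io.StringIO only so the snippet runs on its own; with a real file you would pass its path instead. Empty cells come back as NaN, and because DOB mixes plain numbers with m/d/Y dates it is read as text rather than as a number, so it would need cleaning before it could be used as a feature.

```python
import io

import pandas as pd

# The question's sample rows, embedded so the snippet is self-contained.
# With a real file: df = pd.read_csv('/your/path/yourFile.csv')
SAMPLE_CSV = """Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
2,70000002,Bennie Ariana,Pennsylvania,6/24/1940,F,Single,,Caucasian,2002,66313,
3,70000003,Brad Farley,Pennsylvania,12001,F,Married,4,African American,2002,98444,
4,70000004,Daggoo Cece,Indiana,14032,F,Married,2,Hispanic,2001,41896,113481472
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Column types inferred by pandas: DOB stays text, Children becomes float (it has NaN)
print(df.dtypes)
# Empty Children cells are read as NaN
print(df['Children'].isna().sum())
```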

The following should get you started; you will need pandas and numpy. You can load your .csv into a data frame and use that as input to the model. You also need to define targets (0 for negatives and 1 for positives, assuming binary classification) depending on what you are trying to separate.
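As a minimal sketch of defining those targets from a categorical column, assume Marital Status is the column you want to predict. For binary classification, compare the column with the value you treat as positive (here 'Married', an assumed choice). For more than two classes, sklearn.preprocessing.LabelEncoder maps each distinct string to an integer in sorted order. The Series below holds the Marital Status values from the question's sample; in practice it would be df['Marital Status'].

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for df['Marital Status'], using the values from the question's sample
marital = pd.Series(['Single', 'Single', 'Married', 'Married'])

# Binary targets: 1 for the assumed positive class ('Married'), 0 otherwise
targets = (marital == 'Married').astype(int).to_numpy()

# Multi-class targets: one integer per distinct label, assigned in sorted label order
encoder = LabelEncoder()
class_codes = encoder.fit_transform(marital)

print(targets)           # [0 0 1 1]
print(encoder.classes_)  # ['Married' 'Single']
print(class_codes)       # [1 1 0 0]
# encoder.inverse_transform(predicted_codes) maps model output back to the strings
```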

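from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np

# create a data frame containing your data; each column can be accessed
# by df['column name']
df = pd.read_csv('/your/path/yourFile.csv')

# targets needs one value per row: 0 for negatives, 1 for positives.
# Example assumption: predict "Marital Status", with Married as the positive class
targets = (df['Marital Status'] == 'Married').astype(int).to_numpy()
# position 0 names code 0 (negatives), position 1 names code 1 (positives)
target_names = np.array(['Negatives', 'Positives'])

# add columns to your data frame
# (pd.Factor no longer exists in current pandas; Categorical.from_codes
# builds the same labels from the integer codes)
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
df['Type'] = pd.Categorical.from_codes(targets, target_names)
df['Targets'] = targets

# columns you want to model: GaussianNB needs numeric input, so use numeric
# columns (an assumed choice) and fill empty cells, read as NaN, with 0
features = ['Children', 'Insurance Plan ID', 'Annual Income ($)']
df[features] = df[features].fillna(0)

# define training and test sets
train = df[df['is_train']]
test = df[~df['is_train']]

trainTargets = np.array(train['Targets']).astype(int)
testTargets = np.array(test['Targets']).astype(int)

# call the Gaussian Naive Bayes class with default parameters
gnb = GaussianNB()

# train the model on the training set, then predict the held-out test set
y_gnb = gnb.fit(train[features], trainTargets).predict(test[features])
print('test accuracy:', (y_gnb == testTargets).mean())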
Dareen answered 23/8, 2013 at 17:50
Thanks for the solution. How do I feed in the target, for example "Marital Status"? When I run the program I get an error that targets is undefined on the line df['Type'] = pd.Factor(targets, target_names). - Schrecklichkeit
You have to define the array targets before you call df['Type'] = pd.Factor(targets, target_names); it should be a single column containing 0s and 1s if you're doing binary classification. Can you give a little more information on your classification problem? - Dareen
The code above is explanatory but is missing the variable targets. Could you add it? - Halves
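GaussianNB in the answer above treats every feature as a continuous number, so string columns such as Gender, Location or Ethnicity cannot be passed to it directly. If you want to use those categorical columns, scikit-learn also provides CategoricalNB, a different Naive Bayes variant (the min_categories parameter used here needs scikit-learn 0.24 or later). The sketch below assumes Marital Status as the target and those three columns as features, encodes them with OrdinalEncoder, and scores on a held-out half of the sample rows. With only four rows the score means nothing; it only shows the workflow.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# The question's sample rows (only the columns used here); in practice use pd.read_csv(path)
SAMPLE_CSV = """Location,Gender,Marital Status,Ethnicity
New York,M,Single,Asian
Pennsylvania,F,Single,Caucasian
Pennsylvania,F,Married,African American
Indiana,F,Married,Hispanic
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Assumed setup: predict Married (1) vs. not married (0) from three categorical columns
feature_cols = ['Gender', 'Location', 'Ethnicity']
y = (df['Marital Status'] == 'Married').astype(int)

# OrdinalEncoder maps each column's string categories to 0, 1, 2, ...
encoder = OrdinalEncoder()
X = encoder.fit_transform(df[feature_cols])

# Hold out half the rows; stratify keeps one row of each class on each side
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# min_categories gives the number of categories per column, so a category that
# appears only in the test rows does not break prediction
model = CategoricalNB(min_categories=[len(c) for c in encoder.categories_])
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions, model.score(X_test, y_test))
```

Fitting and scoring on X and y directly would correspond to using the same data for training and testing, which tends to overstate accuracy.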

© 2022 - 2024 — McMap. All rights reserved.