AttributeError: module 'pandas' has no attribute 'to_csv'

C

3

5

I took some rows from csv file like this

pd.DataFrame(CV_data.take(5), columns=CV_data.columns)

and performed some functions on it. now i want to save it in csv again but it is giving error module 'pandas' has no attribute 'to_csv' I am trying to save it like this

pd.to_csv(CV_data, sep='\t', encoding='utf-8')

here is my full code. how can i save my resulting data in csv or excel?

   # Disable warnings, set Matplotlib inline plotting and load Pandas package
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline
import pandas as pd
pd.options.display.mpl_style = 'default' 

CV_data = sqlContext.read.load('Downloads/data/churn-bigml-80.csv', 
                          format='com.databricks.spark.csv', 
                          header='true', 
                          inferSchema='true')

final_test_data = sqlContext.read.load('Downloads/data/churn-bigml-20.csv', 
                          format='com.databricks.spark.csv', 
                          header='true', 
                          inferSchema='true')
CV_data.cache()
CV_data.printSchema() 

pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 

from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction

binary_map = {'Yes':1.0, 'No':0.0, True:1.0, False:0.0} 
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

CV_data = CV_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(CV_data['Churn'])) \
    .withColumn('International plan', toNum(CV_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan'])).cache()

final_test_data = final_test_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(final_test_data['Churn'])) \
    .withColumn('International plan', toNum(final_test_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(final_test_data['Voice mail plan'])).cache()

pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 

from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

def labelData(data):
    # label: row[end], features: row[0:end-1]
    return data.map(lambda row: LabeledPoint(row[-1], row[:-1]))

training_data, testing_data = labelData(CV_data).randomSplit([0.8, 0.2])

model = DecisionTree.trainClassifier(training_data, numClasses=2, maxDepth=2,
                                     categoricalFeaturesInfo={1:2, 2:2},
                                     impurity='gini', maxBins=32)

print (model.toDebugString())  
print ('Feature 12:', CV_data.columns[12])
print ('Feature 4: ', CV_data.columns[4] ) 

from pyspark.mllib.evaluation import MulticlassMetrics

def getPredictionsLabels(model, test_data):
    predictions = model.predict(test_data.map(lambda r: r.features))
    return predictions.zip(test_data.map(lambda r: r.label))

def printMetrics(predictions_and_labels):
    metrics = MulticlassMetrics(predictions_and_labels)
    print ('Precision of True ', metrics.precision(1))
    print ('Precision of False', metrics.precision(0))
    print ('Recall of True    ', metrics.recall(1))
    print ('Recall of False   ', metrics.recall(0))
    print ('F-1 Score         ', metrics.fMeasure())
    print ('Confusion Matrix\n', metrics.confusionMatrix().toArray()) 

predictions_and_labels = getPredictionsLabels(model, testing_data)

printMetrics(predictions_and_labels)  

CV_data.groupby('Churn').count().toPandas() 

stratified_CV_data = CV_data.sampleBy('Churn', fractions={0: 388./2278, 1: 1.0}).cache()

stratified_CV_data.groupby('Churn').count().toPandas() 

pd.to_csv(CV_data, sep='\t', encoding='utf-8')

Christen answered 25/7, 2016 at 11:22 Comment(0)

B

23

to_csv is a method of a DataFrame object, not of the pandas module.

df = pd.DataFrame(CV_data.take(5), columns=CV_data.columns)

# whatever manipulations on df

df.to_csv(...)

You also have a line pd.DataFrame(CV_data.take(5), columns=CV_data.columns) in your code.

This line creates a dataframe and then discards it. Even if you were successfully calling to_csv, none of your changes to CV_data would have been reflected in that dataframe (and therefore in the outputed csv file).

Beabeach answered 25/7, 2016 at 11:23 Comment(0)

I

1

Solution- You should write df.to_csv instead of pd.to_csv

Justification- to_csv is a method to an object which is a df (DataFrame); while pd is Panda module.

Hence, your code was not working and throwing this Error " AttributeError: module 'pandas' has no attribute 'to_csv'"

Industrialism answered 25/7, 2020 at 12:37 Comment(1)

Thanks for contributing to Stack Overflow. However, this answer does not seem to add anything new. Everything you mentioned is already explained in the accepted answer that was posted more than 4 years ago and has 11 upvotes at this time. When answering old questions, make sure to add something new. Also, you might want to improve the formatting of your posts. See the Markdown Editing Help if you don't know how to do that. – Uncommunicative 25/7, 2020 at 13:44

P

0

This will do the job!

#Create a DataFrame:    
new_df = pd.DataFrame({'id': [1,2,3,4,5], 'LETTERS': ['A','B','C','D','E'], 'letters': ['a','b','c','d','e']})

#Save it as csv in your folder:    
new_df.to_csv('C:\\Users\\You\\Desktop\\new_df.csv')

Pedestrianism answered 22/9, 2019 at 11:13 Comment(0)

Recommended topics

Hot tags