I am trying to perform sentiment analysis on a large set of data from a social network. The code below works fine on small inputs: files smaller than about 20 MB are processed without a problem, but anything larger than 20 MB gives me a memory error.
Environment: Windows 10, Anaconda (Python 3.x) with up-to-date packages.
Code:
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# path is set earlier in my script to the folder that holds the csv files

def captionsenti(F_name):
    print("reading from csv file")
    F1_name = "caption_senti.csv"
    df = pd.read_csv(path + F_name + ".csv")
    filename = path + F_name + "_" + F1_name
    df1 = df['tweetText']        # reading the captions from the data file
    df1 = df1.fillna("h")        # filling NaN values with a placeholder
    df2 = pd.DataFrame()
    sid = SentimentIntensityAnalyzer()
    print("calculating sentiment")
    for sentence in df1:
        ss = sid.polarity_scores(sentence)   # sentiment scores for one tweet
        df2 = df2.append(pd.DataFrame({'tweetText': sentence,
                                       'positive': ss['pos'],
                                       'negative': ss['neg'],
                                       'neutral': ss['neu'],
                                       'compound': ss['compound']}, index=[0]))
    df2 = df2.join(df.set_index('tweetText'), on='tweetText')   # joining the two data frames
    df2 = df2.drop_duplicates(subset=None, keep='first', inplace=False)
    df2 = df2.dropna(how='any')
    df2 = df2[['userID', 'tweetSource', 'tweetText', 'positive', 'neutral',
               'negative', 'compound', 'latitude', 'longitude']]
    print("Storing in csv file")
    df2.to_csv(filename, encoding='utf-8', header=True, index=True, chunksize=100)
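I suspect the df2.append inside the loop might be part of the problem, since it copies the growing DataFrame on every iteration. Would collecting the scores in a plain list and building the DataFrame once at the end, roughly as sketched below, be the right direction? This is only a sketch reusing df1 and sid from the function above; I have not verified it on the full data.

# sketch: accumulate plain dicts, build the DataFrame in one go
rows = []
for sentence in df1:
    ss = sid.polarity_scores(sentence)
    rows.append({'tweetText': sentence,
                 'positive': ss['pos'],
                 'negative': ss['neg'],
                 'neutral': ss['neu'],
                 'compound': ss['compound']})
df2 = pd.DataFrame(rows)   # one allocation instead of repeated appends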
What extra do I need to include to avoid the memory error?
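I have also been wondering whether processing the input csv in chunks would help. Again just a sketch, reusing path, F_name, filename, and sid from my code above; the chunk size of 50000 is an arbitrary guess, and I left out the join and de-duplication steps to keep it short:

reader = pd.read_csv(path + F_name + ".csv", chunksize=50000)  # read 50000 rows at a time
first_chunk = True
for chunk in reader:
    chunk['tweetText'] = chunk['tweetText'].fillna("h")
    # expand each polarity_scores dict into separate columns
    scores = chunk['tweetText'].apply(sid.polarity_scores).apply(pd.Series)
    scores = scores.rename(columns={'pos': 'positive', 'neg': 'negative', 'neu': 'neutral'})
    out = pd.concat([chunk, scores], axis=1)
    # append each processed chunk to the output file instead of holding everything in memory
    out.to_csv(filename, mode='a', header=first_chunk, index=False, encoding='utf-8')
    first_chunk = False

Thanks for the help in advance.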