How to get data from pickle files into a pandas dataframe

Asked 21/10, 2016 at 15:45 Answered 14/5, 2024 at 20:40

I'm working on a social media sentiment analysis for a class. I have collected all of the tweets about the Kentucky Derby over a two-month period and saved them into .pkl files, one file per day.

My question is: how do I get all of these pickle dump files loaded into a dataframe?

Here is my code:

import pickle as pkl
from datetime import date, timedelta

import sklearn as sk
import pandas as pd
import got3

def daterange(start_date, end_date):
    # Yield each date from start_date up to, but not including, end_date.
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2016, 3, 31)
end_date = date(2016, 6, 1)

dates = []

for single_date in daterange(start_date, end_date):
    dates.append(single_date.strftime("%Y-%m-%d"))

for i in range(len(dates) - 1):
    this_date = dates[i]
    tomorrow_date = dates[i + 1]
    print("Getting tweets for " + tomorrow_date)
    tweetCriteria = got3.manager.TweetCriteria()
    tweetCriteria.setQuerySearch("Kentucky Derby")
    # Note: this second call probably replaces the query above rather than adding to it.
    tweetCriteria.setQuerySearch("KYDerby")
    tweetCriteria.setSince(this_date)
    tweetCriteria.setUntil(tomorrow_date)
    Kentucky_Derby_tweets = got3.manager.TweetManager.getTweets(tweetCriteria)
    # One pickle file per day, named after the date, e.g. 2016-04-01.pkl
    with open(tomorrow_date + ".pkl", "wb") as f:
        pkl.dump(Kentucky_Derby_tweets, f)

Fustanella answered 21/10, 2016 at 15:45 Comment(1)

You can load each into a list, append each list into a master list, then use the list to put it into a DataFrame – Neoarsphenamine 21/10, 2016 at 15:50
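A minimal sketch of that idea, assuming each file holds a list of tweet objects as written by the question's code. The function name load_tweets and the attribute names in FIELDS (date, username, text) are assumptions for illustration; adjust them to whatever your tweet objects actually expose. Unpickling real got3 tweets also requires got3 to be importable.

```python
import glob
import pickle

import pandas as pd

# Assumed attribute names on each stored tweet object; adjust to your data.
FIELDS = ("date", "username", "text")


def load_tweets(pattern, fields=FIELDS):
    """Load every pickle file matching `pattern` into one DataFrame."""
    rows = []  # the "master list": one dict per tweet
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as fh:
            tweets = pickle.load(fh)  # each file holds a list of tweet objects
        for tweet in tweets:
            # Missing attributes become None instead of raising.
            row = {name: getattr(tweet, name, None) for name in fields}
            row["source_file"] = path
            rows.append(row)
    return pd.DataFrame(rows, columns=[*fields, "source_file"])
```

Calling load_tweets("*.pkl") in the folder that holds the files would then give one row per tweet, with source_file recording which day's file it came from.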

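You can use

pd.read_pickle(filename)

on each file, add each result to a list, and then call pd.concat(thelist). This works when each file holds a DataFrame or Series, since pd.concat only accepts pandas objects.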
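Spelled out, that could look like the sketch below. It assumes every file matching the pattern holds a pickled DataFrame; the helper name concat_pickles is made up for illustration.

```python
import glob

import pandas as pd


def concat_pickles(pattern):
    """Read every pickled DataFrame matching `pattern` and stack them."""
    frames = [pd.read_pickle(path) for path in sorted(glob.glob(pattern))]
    if not frames:
        # pd.concat fails on an empty list, so report the real problem instead.
        raise FileNotFoundError("no files match " + pattern)
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)
```

If the files hold plain Python lists instead, as in the question, pd.read_pickle still loads them, but the lists have to be turned into DataFrames before they can be concatenated.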

Paleoecology answered 21/10, 2016 at 19:1 Comment(0)

This may help you:

pd.read_pickle('tomorrow_date.pkl')

pd.read_pickle('tomorrow_date.pickle')

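Also look at the last line of your code. In open(tomorrow_date + ".pkl", "wb"), tomorrow_date is a variable, so each file is named after its date (for example 2016-04-01.pkl), and that is the name you have to pass to pd.read_pickle. The calls above only match a file written with the literal name, as in open('tomorrow_date' + ".pkl", "wb"), and that form would overwrite the same file on every iteration. Hope this helps.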

Provost answered 31/8, 2022 at 11:13 Comment(0)

If all your .pkl files are in one folder, say path, you can do the whole thing in one line (with glob, os, and pandas as pd imported):

df = pd.concat(map(pd.read_pickle, glob.glob(os.path.join(path, '*.pkl'))))
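The one-liner drops the information about which file each row came from. If the date in the file name matters, one possible variant is to pass pd.concat a dict keyed by file name. This is only a sketch: the helper name concat_with_dates is hypothetical, and it assumes each file holds a DataFrame and is named like 2016-04-01.pkl.

```python
import glob
import os

import pandas as pd


def concat_with_dates(folder):
    """Concatenate every .pkl DataFrame in `folder`, keeping the file date."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.pkl"))):
        # "2016-04-01.pkl" -> "2016-04-01"
        key = os.path.splitext(os.path.basename(path))[0]
        frames[key] = pd.read_pickle(path)
    # The dict keys become an outer index level named "date" ...
    combined = pd.concat(frames, names=["date", "row"])
    # ... which is then moved into an ordinary column.
    return combined.reset_index(level="date").reset_index(drop=True)
```

The result has a date column next to the original columns, which makes per-day grouping straightforward.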

Vague answered 14/5, 2024 at 20:40 Comment(0)
