Write each row of pandas dataframe into a new text file - pythonic way
M

4

5

I have been searching for a way to iterate over a pandas DataFrame row by row and write the contents of each row to a new text file. My DataFrame consists of a single column called Reviews.

Review classification

I want to do sentiment analysis on movie reviews, and I need each review to be in a separate text file. Can somebody help me with this?

Mariann answered 9/11, 2015 at 23:15 Comment(5)
That's going to be very inefficient. What's the purpose? - Garnishment
Just to perform classification. That is how my requirement is specified. - Mariann
Make a filename variable that changes every time you write a new row, then open that filename with the 'w' mode. - Dorcy
Can you please suggest how to write data from a DataFrame to a text file? @RNar, I've been wondering about that for quite a while. Does to_csv work for this? - Mariann
I wouldn't suggest it, no. Because you want to write a new file for each row, iterate through the rows and do something like f = open(filename, 'w') followed by f.write(row). Just make sure to change filename each time. - Dorcy
M
9

I've written something like this and it works. Anyway, thanks for your input, everyone.

i = 0  # counter used to name the output files
for index, row in p.iterrows():
    f = open(str(i)+'.txt', 'w')
    f.write(str(row.iloc[0]))  # first column holds the review text
    f.close()
    i += 1

where p is a dataframe.

Mariann answered 10/11, 2015 at 0:23 Comment(1)
For anyone else receiving a Unicode error: change f = open(str(i)+'.txt', 'w') to f = open(str(i)+'.txt', 'w', encoding='utf-8'). - Bronchiole
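The loop and the encoding fix from the comment can be combined. The following is only a sketch under stated assumptions: the sample frame, the helper name write_reviews, and the temporary output folder are illustrative and not part of the original answer. It iterates over the Reviews column directly and uses a with block so each file is closed even if a write fails.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for the asker's single-column frame.
p = pd.DataFrame(
    {"Reviews": ["Great film, loved it.", "Too long and dull.", "Café scene was superb."]}
)


def write_reviews(df, out_dir, column="Reviews"):
    """Write each value of `column` to <out_dir>/<n>.txt and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(df[column]):
        path = out_dir / f"{i}.txt"
        # The context manager closes the file even if write() raises.
        with open(path, "w", encoding="utf-8") as f:
            f.write(str(text))
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()  # temporary folder so the sketch is safe to run
written = write_reviews(p, out_dir)
```

Passing encoding='utf-8' explicitly avoids depending on the platform's default encoding, which is the usual cause of the Unicode error mentioned above.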
G
2

It's still inefficient, but since it's required here's one possible solution.

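import os
import pandas as pd
from io import StringIO

data="""
column1 column2
c1 c2
c3 c4
c5 c6
"""

# raw string so the regex escape \s is not treated as a string escape
df = pd.read_csv(StringIO(data), delimiter=r'\s+')

os.makedirs('testdir', exist_ok=True)  # tofile does not create the directory
for i, row in enumerate(df.values):
    filename = 'testdir/review{}.csv'.format(i)
    row.tofile(filename, sep=",", format="%s")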

This takes the values as an array and writes each row to its own CSV file, named review0.csv, review1.csv, and so on. Another option is to call DataFrame.to_csv inside the loop on one row of the frame at a time.

Garnishment answered 10/11, 2015 at 0:16 Comment(0)
L
1

Here's another way to do it. This creates a destination folder if it doesn't exist.

import pandas as pd
from pathlib import Path

root_location = Path("/my/root/path")
root_location.mkdir(parents=True, exist_ok=True)  # create the destination folder if needed
df = pd.read_csv(my_csv)  # for example; my_csv is the path to your input file

for index, row in df.iterrows():
    # one file per row, named after the "file_name" column
    with open(root_location / (str(row["file_name"]) + ".txt"), "w") as f:
        f.write(str(row["file_contents"]))
Limp answered 3/3, 2021 at 0:35 Comment(0)
M
0

This is simpler, but it might be a costly solution.

for i in range(len(data_to_txt)): 
    data_to_txt.iloc[[i]].to_csv(str(i)+".txt")
Motteo answered 29/6, 2022 at 17:46 Comment(0)
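With its default arguments, to_csv also writes the column header and the row index into every file. If only the row's values are wanted, a variant along these lines may help. It is a sketch, not the answerer's code: the sample frame and the temporary output folder are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical frame standing in for data_to_txt from the answer above.
data_to_txt = pd.DataFrame({"Reviews": ["first review", "second review"]})

out_dir = Path(tempfile.mkdtemp())  # temporary folder so the sketch is safe to run
for i in range(len(data_to_txt)):
    # iloc[[i]] keeps a one-row DataFrame; header and index are suppressed
    data_to_txt.iloc[[i]].to_csv(out_dir / f"{i}.txt", header=False, index=False)

first = (out_dir / "0.txt").read_text()
```

Because this still goes through the CSV writer, values that contain commas or double quotes will be quoted in the output, so plain file writes may be a better fit for free-form review text.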
