Python Pandas, write DataFrame to fixed-width file (to_fwf?)
Asked Answered
B

7

25

I see that Pandas has read_fwf, but does it have something like DataFrame.to_fwf? I'm looking for support for field width, numerical precision, and string justification. It seems that DataFrame.to_csv doesn't do this. numpy.savetxt does, but I wouldn't want to do:

numpy.savetxt('myfile.txt', mydataframe.to_records(), fmt='some format')

That just seems wrong. Your ideas are much appreciated.

Bufford answered 10/5, 2013 at 20:23 Comment(5)
take a look at the to_string method so see if you can do what you want.Centillion
This looks close. It seems that I'd have to give a formatter function for each column if any two float or string columns had different formats. It would do the trick, it just looks a little unwieldy. I'd hoped I was missing something. Thanks!Bufford
pandas df.to_csv has a sep =" " parameter that changes the comma to anything else in this case a space or empty string. That in conjunction with the formatter of the method should do it.Nosh
@Nosh Actually with the df.to_csv() method using empty string as delimiter gives the error TypeError: delimiter must be set.Cariole
true.. passing empty string to method would create a mess do ignore my reference to empty string. maybe trying pandas "to_string" method would help. if has formatters parameter that is pretty goodNosh
L
28

Until someone implements this in pandas, you can use the tabulate package:

import pandas as pd
from tabulate import tabulate

def to_fwf(df, fname):
    content = tabulate(df.values.tolist(), list(df.columns), tablefmt="plain")
    open(fname, "w").write(content)

pd.DataFrame.to_fwf = to_fwf
Lengel answered 13/3, 2016 at 19:29 Comment(0)
S
9

For custom format for each column you can set format for whole line. fmt param provides formatting for each line

with open('output.dat') as ofile:
     fmt = '%.0f %02.0f %4.1f %3.0f %4.0f %4.1f %4.0f %4.1f %4.0f'
     np.savetxt(ofile, df.values, fmt=fmt)
Sevigny answered 14/9, 2017 at 16:45 Comment(0)
F
8

pandas.DataFrame.to_string() is all you need. The only trick is how to manage the index.

# Write
# df.reset_index(inplace=True)  # uncomment if the index matters
df.to_string(filepath, index=False)

# Read
df = pd.read_fwf(filepath)
# df.set_index(index_names, inplace=True)  # uncomment if the index matters

If the index is a pandas.Index that has no name, reset_index() should assign it to column "index". If it is a pandas.MultiIndex that has no names, it should be assigned to columns ["level_0", "level_1", ...].

Fard answered 15/9, 2020 at 15:23 Comment(7)
Note that Dataframe.to_string() has no option to remove space between output columns.Whoosh
@JaimeM. I’m sorry but I don’t understand how it relates to the question.Fard
Dataframe.to_string() is not able to write an string output without spaces between columns so is not valid to get a fixed-width file row. For example, when the column length is 1 character and left aligned, to_string() will output: x yyy... but you want xyyy....Whoosh
Oh, okay. I haven’t thought about that use case because it’s very rare. When someone exports a table in CSV, TSV, Excel, fixed-width, or any other format, columns separators are usually expected. It makes the file simpler to read and process. Yet, if all columns have 1 character, we can remove the spaces with df.to_string().replace(" ", ""), and then write that string into a text file.Fard
Fixed-width format doesn't use columns separator. Because each column has fixed width, so you know where is the begining and the end of each column. Padding with white spaces is used between the value and the limit column, so you can use str.replace().Whoosh
Check this answer for an overview of the fixed-widhth format advantages.Whoosh
"Fixed-width format doesn't use columns separator": if you want to visualise a table in FWF, it's easier with spaces, even if all columns have only one character.Fard
T
7

Python, Pandas : write content of DataFrame into text File

The question aboves answer helped me. It is not the best, but until to_fwf exists this will do the trick for me...

np.savetxt(r'c:\data\np.txt', df.values, fmt='%d')

or

np.savetxt(r'c:\data\np.txt', df.values, fmt='%10.5f')
Tapley answered 26/6, 2016 at 7:51 Comment(1)
IMO this is better than tabulate since numpy is included with pandas so doesn't require an additional libraryPremier
A
4

I'm sure you found a workaround for this issue but for anyone else who is curious... If you write the DF into a list, you can write it out to a file by giving the 'format as a string'.format(list indices) eg:

df=df.fillna('')
outF = 'output.txt'      
dbOut = open(temp, 'w')
v = df.values.T.tolist()        
for i in range(0,dfRows):       
    dbOut.write(( \
    '{:7.2f}{:>6.2f}{:>2.0f}{:>4.0f}{:>5.0f}{:6.2f}{:6.2f}{:6.2f}{:6.1f {:>15}{:>60}'\
    .format(v[0][i],v[1][i],v[2][i],v[3][i],v[4][i],v[5][i],v[6][i],v[7][i],v[8][i],\
    v[9][i],v[10][i]) ))
    dbOut.write("\n")
dbOut.close

Just make sure to match up each index with the correct format :)

Hope that helps!

Annaleeannaliese answered 30/7, 2015 at 18:14 Comment(0)
B
1

found a very simple solution! (Python). In the code snapped I am trying to write a DataFrame to a positional File. "finalDataFrame.values.tolist()" will return u a list in which each row of the DataFrame is turn into an another list just a [['Camry',2019,'Toyota'],['Mustang','2016','Ford']]. after that with the help of for loop and if statement i am trying to set its fix length. rest is obvious!

 with open (FilePath,'w') as f:
    for i in finalDataFrame.values.tolist():
        widths=(0,0,0,0,0,0,0)
        if i[2] == 'nan':
            i[2]=''
            for h in range(7):
                i[2]= i[2] + ' '
        else:
            x=7-len(str(i[2]))
            a=''
            for k in range(x):
               a=a+' '
            i[2]=str(i[2])+a

        if i[3] == '':
            i[3]=''
            for h in range(25):
                i[3]=i[3]+' '
        else:
            x = 25 - len(i[3])
            print(x)
            a = ''
            for k in range(x):
                a = a + ' '
            print(a)
            i[3] = i[3] + a


        i[4] = str(i[4])[:10]

        q="".join("%*s" % i for i in zip(widths, i))
        f.write(q+'\n')
Ballenger answered 22/2, 2019 at 20:3 Comment(0)
F
0

Based on others' answer, here is the snippet I wrote, not the best in coding and performance:

import pandas as pd
import pickle
import numpy as np
from tabulate import tabulate


left_align_gen = lambda length, value: eval(r"'{:<<<length>>}'.format('''<<value>>'''[0:<<length>>])".replace('<<length>>', str(length)).replace('<<value>>', str(value)))
right_align_gen = lambda length, value: eval(r"'{:><<length>>}'.format('''<<value>>'''[0:<<length>>])".replace('<<length>>', str(length)).replace('<<value>>', str(value)))

# df = pd.read_pickle("dummy.pkl")
with open("df.pkl", 'rb') as f:
    df = pickle.load(f)

# field width defines here, width of each field
widths=(22, 255, 14, 255, 14, 255, 255, 255, 255, 255, 255, 22, 255, 22, 255, 255, 255, 22, 14, 14, 255, 255, 255, 2, )

# format datetime
df['CREATED_DATE'] = df['CREATED_DATE'].apply(lambda x: x.to_pydatetime().strftime('%Y%m%d%H%M%S'))
df['LAST_MODIFIED_DATE'] = df['LAST_MODIFIED_DATE'].apply(lambda x: x.to_pydatetime().strftime('%Y%m%d%H%M%S'))
df['TERMS_ACCEPTED_DATE'] = df['TERMS_ACCEPTED_DATE'].apply(lambda x: x.to_pydatetime().strftime('%Y%m%d%H%M%S'))
df['PRIVACY_ACCEPTED_DATE'] = df['PRIVACY_ACCEPTED_DATE'].apply(lambda x: x.to_pydatetime().strftime('%Y%m%d%H%M%S'))


# print(type(df.iloc[0]['CREATED_DATE']))
# print(df.iloc[0])
record_line_list = []
# for row in df.iloc[:10].itertuples():
for row in [tuple(x) for x in df.to_records(index=False)]:
    record_line_list.append("".join(left_align_gen(length, value) for length, value in zip(widths, row)))

with open('output.txt', 'w') as f:
    f.write('\n'.join(record_line_list))

Github gist

Footy answered 25/9, 2020 at 2:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.