Write or log print output of pandas Dataframe

4

9

I have a DataFrame and want to write a few of its rows to a file and to a logger in Python 2.7. print(dataframe.iloc[0:4]) outputs a nice grid of the column headers and the top 4 rows of the dataframe. However, logging.info(dataframe.iloc[0:4]) throws:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 87: ordinal not in range(128)

Here is the console output, which works both when the expression is evaluated directly and via print() (note the ²):

In[89]: d.iloc[0:4]    OR   print(d.iloc[0:4])
Out[89]: 
   ISO  ID_0     NAME_0  ID_1                           NAME_1    ID_2    NAME_2  Area(km.²)  Pop2001_Cen  Pop2010_Cen  HHold2010  Hhold_Size
0  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires     NaN       NaN       203.0    2776138.0      2890151  1150134.0    2.512882
1  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires  2001.0  Comuna 1         NaN     171975.0       205886    84468.0    2.437444
2  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires  2002.0  Comuna 2         NaN     165494.0       157932    73156.0    2.158839
3  ARG    12  Argentina     2  Ciudad Autónoma de Buenos Aires  2003.0  Comuna 3         NaN     184015.0       187537    80489.0    2.329971

file.write(dataframe.iloc[0:4]) and similar calls raise the same UnicodeDecodeError, because one of the column headers includes a non-ASCII character. I have tried all sorts of variations of decode(), encode(), etc., but cannot avoid this error.

print(d.iloc[0:4]) works, so another approach was to use print(d.iloc[0:4], file=f), but even with from __future__ import print_function I get the same ASCII encoding error.

Other ways to replicate this problem are logging.info('Area(km.²)') or 'Area(km.²)'.decode()

How can I render this dataframe?

[Edit:]

I also want to understand fundamentally how to deal with string encoding and decoding in Python 2.7. I have spent more time on this than it deserves, because this is not the first time I have hit this UnicodeDecodeError. I don't know when it will occur, and I am still just throwing fixes at the console to see what sticks, without any underlying understanding of what is going on.

Snobbish answered 28/2, 2017 at 17:47 Comment(1)
Could you post an extract of your original dataframe? Varick
2

IIUC, you can try to pass encoding='utf-8' when writing out the first n rows of the dataframe with:

df.head(n).to_csv('yourfileout.csv', encoding='utf-8')
Varick answered 28/2, 2017 at 18:5 Comment(6)
That works, but is there a way of doing this in memory, so that I can pass it to a logger and write to a file that contains other text too? I was also hoping to keep the right-aligned columns of the df.__str__ output. Snobbish
The to_csv() command outputs to console if the filename is omitted. I'd still like the formatting and to solve the encoding issue (question updated - sorry Fabio), just noting this here in case it is useful to someone. Snobbish
Actually I cannot reproduce your logging.info issue... maybe this question might be useful to investigate decoding problems with python. Varick
I'm very sorry. I've added a couple of ways to reproduce the problem much more simply, and the solution in your linked answer, i.e. unidecode(unicode('Area(km.²)', encoding = "utf-8")), seems to fix the issue: it returns 'Area(km.2)'. Snobbish
I had hoped for some understanding of this issue in Python 2.7, but perhaps things simply are complicated and there's no underlying concept I'm missing? Snobbish
Having hacked at this for many hours, it seems there is no underlying principle: you just have to figure out what kind of string everything returns and write custom code for each. This error is not really foreseeable even with a better theoretical understanding, as even major libraries often return string types inconsistently and without documentation. Worse still, if a nasty character doesn't turn up, the bug goes unrevealed, so the only solution is to test each string operation exhaustively with lists of nasty strings. Or upgrade to Python 3+. Snobbish
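On the underlying concept asked about in these comments: in Python 2.7 a plain string literal is a byte string, and whenever bytes are mixed with unicode text Python decodes them implicitly with the ASCII codec, which fails on any byte above 127. The snippet below is only a minimal sketch of that mechanism, written for Python 3 (where the conversion has to be explicit) and assuming the header was stored as UTF-8; the variable names are illustrative.

```python
# Python 3 sketch: bytes and text are separate types, so decoding is explicit.
# Assumption: the column header was saved as UTF-8 bytes.
header_bytes = "Area(km.²)".encode("utf-8")

# "²" (U+00B2) is stored as the two bytes 0xC2 0xB2 in UTF-8.
assert header_bytes == b"Area(km.\xc2\xb2)"

# Decoding with ASCII fails on the first byte above 127, like the traceback
# in the question (which reports the same byte, 0xc2).
try:
    header_bytes.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the codec that produced the bytes succeeds.
header_text = header_bytes.decode("utf-8")

print(error_message)
print(header_text)
```

A common rule of thumb that follows from this is to decode bytes into text as early as possible on input, work with text internally, and encode only when writing out.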
16

Improving on gageorge's answer, the following rendered better when there are more than 5 rows:

logging.info('dataframe head - {}'.format(df.to_string()))

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_string.html
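As a rough sketch of how this could cover both the logger and the file from the question, the example below (Python 3) attaches a FileHandler opened with encoding="utf-8" and logs the to_string() output of the first rows. The sample data, logger name and file name are assumptions made up for illustration.

```python
import logging
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the column names mirror the question's headers.
df = pd.DataFrame({
    "NAME_1": ["Ciudad Autónoma de Buenos Aires", "Comuna 1"],
    "Area(km.²)": [203.0, None],
})

# Write log records to a file with an explicit encoding so that
# non-ASCII characters are stored as UTF-8.
log_path = os.path.join(tempfile.mkdtemp(), "frame.log")  # illustrative path
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("df_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# to_string() renders every row it is given, so slice first to limit output.
logger.info("dataframe head\n%s", df.iloc[0:4].to_string())
handler.close()

with open(log_path, encoding="utf-8") as fh:
    logged = fh.read()
print(logged)
```

Passing the table as a logging argument (`%s`) rather than pre-formatting it defers the string conversion until a handler actually emits the record.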

Funiculate answered 23/4, 2020 at 21:51 Comment(0)
9

With Python 3 and the latest pandas, this worked for me:

logging.info('dataframe head - {}'.format(df.head()))
Counterirritant answered 21/6, 2018 at 15:17 Comment(0)
0

For my case the following worked nicely.

[log.debug(_) for _ in df_summary.to_string(index=False).split('\n')]

If I did want the index in my output, I found that resetting the index to a column and still excluding the index in .to_string() gave prettier output.

df_summary = df_summary.reset_index()
[log.debug(_) for _ in df_summary.to_string(index=False).split('\n')]
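
The same line-by-line idea can be wrapped in a small helper with a plain loop, which avoids building a throwaway list. This is only a sketch: log_lines and the logger name are hypothetical, and the literal table string stands in for df_summary.to_string(index=False).

```python
import io
import logging


def log_lines(logger, text, level=logging.DEBUG):
    """Emit each line of a multi-line string as its own log record."""
    for line in text.splitlines():
        logger.log(level, line)


# Capture output in memory so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))

log = logging.getLogger("summary_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False

# Stand-in for df_summary.to_string(index=False)
table = "ISO  Area(km.²)\nARG       203.0"
log_lines(log, table)

records = buffer.getvalue().splitlines()
print(records)
```

Logging one record per line means every row carries the formatter's prefix, which keeps the columns aligned in the log file.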
Faun answered 12/9, 2024 at 8:50 Comment(0)
