What I want to do: extract text information from a pdf file and redirect that to a txt file.
What I did:
pip install pdfminor
pdf2txt.py file.pdf > output.txt
What I got:
UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 0: illegal multibyte sequence
My observation:
\u2022
is bullet point, •
.
pdf2txt.py
works well without redirection: the bullet point character is written to stdout without any error.
My question:
Why does redirection cause a python error? As far as I know, redirection is a O.S. job, and it is simply copying things after the program is finished.
How can I fix this error? I cannot do any modification to pdf2txt.py
as it's not my code.
print()
– Carawaypython redirect utf-8
I found UnicodeDecodeError when redirecting to file on stackoverflow. Use Google to find more. – Caraway