CSV file with Arabic characters is displayed as symbols in Excel
Asked Answered
B

9

22

I am using python to extract Arabic tweets from twitter and save it as a CSV file, but when I open the saved file in excel the Arabic language displays as symbols. However, inside python, notepad, or word, it looks good.

May I know where is the problem?

Bantam answered 15/2, 2020 at 13:13 Comment(3)
Tell Excel to open it with correct encodingVip
Exporting CSV from Python(or any other language, I guess) gives a UTF-8 formatted file. But Excel tries to read in a different format ISO-8859-1. So the opening format must be manually selected. To avoid this, use some Excel-specific libs such as XLSX.Armageddon
It could help if you provided some details. What version of python/excel are you using? What operating system? Could you provide the code you are using?Typescript
D
50

This is a problem I face frequently with Microsoft Excel when opening CSV files that contain Arabic characters. Try the following workaround that I tested on latest versions of Microsoft Excel on both Windows and MacOS:

  1. Open Excel on a blank workbook

  2. Within the Data tab, click on From Text button (if not activated, make sure an empty cell is selected)

  3. Browse and select the CSV file

  4. In the Text Import Wizard, change the File_origin to "Unicode (UTF-8)"

  5. Go next and from the Delimiters, select the delimiter used in your file e.g. comma

  6. Finish and select where to import the data

The Arabic characters should show correctly.

Dutra answered 15/2, 2020 at 21:19 Comment(4)
tnx, it works but, now there is another problem all the text is in one column. :(Bantam
that means the appropriate delimiter was not selected. Check step #5 and select the delimiter used in your file e.g comma, semicolon, tab, ... etcDutra
For old MS Excel, there is no option to customize file options when saving as "CSV". Simply, the solution is to save the "XLSX" file as "TXT (Unicode UTF-8)". The generated file will be "TAB" separator. If you want it comma separated, open the file in Notepads (open large files in VS Code) and replace "TAB" occurrences with "," and then save as "CSV".Nasion
How relevant your comment is? The question was about opening CSV files not saving them. Good comment though. Thanks!Dutra
W
17

Just use encoding='utf-8-sig' instead of encoding='utf-8' as follows:

import csv

data = u"اردو"

with(open('example.csv', 'w', encoding='utf-8-sig')) as fh:
    writer = csv.writer(fh)
    writer.writerow([data])

It worked on my machine.

Wurster answered 25/10, 2020 at 21:5 Comment(0)
J
5

Fastest way is after saving the file into .csv from python:

  1. open the .csv file using Notepad++
  2. from Encoding drop-down menu choose UTF-8-BOM
  3. click save as and save at with same name with .csv extension (e.g. data.csv) and keep the file type as it is .txt
  4. re-open the file again with Microsoft Excel.
Jugum answered 16/9, 2022 at 12:14 Comment(0)
N
2

Excel is known to have an awful csv import sytem. Long story short if on same system you import a csv file that you have just exported, it will work smoothly. Else, the csv file is expected to use the Windows system encoding and delimiter.

A rather awkward but robust system is to use LibreOffice or Oracle OpenOffice. Both are far beyond Excel on any feature but the csv module: they will allow you to specify the delimiters and optional quoting characters along with the encoding of the csv file and you will be able to save the resulting file in xslx.

Narwhal answered 15/2, 2020 at 13:47 Comment(1)
Thank you! using LibreOffice worked like charm. didt expect MS Excel and WPS spreadsheet to fail me and all above answers did help except this. Salute to you for Saving my day!Refection
K
1

The only solution that i've found to save arabic into an excel file from python is to use pandas and to save into the xlsx extension instead of csv, xlsx seems a million times better here's the code i've put together which worked for me

import pandas as pd
def turn_into_csv(data, csver):
    ids = []
    texts = []
    for each in data:
        texts.append(each["full_text"])
        ids.append(str(each["id"]))

    df = pd.DataFrame({'ID': ids, 'FULL_TEXT': texts})
    writer = pd.ExcelWriter(csver + '.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1', encoding="utf-8-sig")

    # Close the Pandas Excel writer and output the Excel file.
    writer.save()
Kala answered 22/7, 2020 at 12:15 Comment(0)
P
1

Although my CSV file encoding was UTF-8; but explicitly redoing it again using the Notepad resolved it.

Steps:

  • Open your CSV file in Notepad.
  • Click File --> Save as...
  • In the "Encoding" drop-down, select UTF-8 with BOM.
  • Rename your file using the .csv extension.
  • Click Save.
  • Reopen the saved CSV file with Excel.

enter image description here

Pneumatic answered 30/6, 2022 at 20:14 Comment(0)
M
1

I'm not sure it's the solution, but it might help.

enter image description here

Megasporangium answered 17/5, 2023 at 8:11 Comment(0)
B
1

If you are using pandas and saving the file data as CSV, you might need to tell the executable (eg:- MS Excel) to open it in UTF-8 but to do so there is a trick.

Instead of doing:-

      encoding='utf-8'

Do this:-

    encoding='utf-8-sig'

Using this method will save the file the file with proper Arabics and your executable app like MS Excel will be able to open it correctly.

Note:- For safe sides, if you are reading Arabics data from xlsx or cab and then saving it somewhere, then make sure to use encoding='utf-8-sig' with pandas.

HOPE IT SOLVES YOUR ISSUE !!

Happy Coding Guys. PEACE✌️

Bennybenoit answered 1/10, 2023 at 15:5 Comment(1)
Where do you specify this?Steve
E
0
import csv

# Read the existing content of the CSV file
existing_data = []
try:
    with open('file.csv', 'r', encoding='utf-8-sig') as fh:
        reader = csv.reader(fh)
        existing_data = [row for row in reader]
except FileNotFoundError:
    pass

# Write the existing data back to the CSV file
with open('file.csv', 'w', newline='', encoding='utf-8-sig') as fh:
    writer = csv.writer(fh)
    writer.writerows(existing_data)
Equerry answered 6/8, 2024 at 14:14 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.