Django dumpdata UTF-8 (Unicode)
S

14

32

Is there an easy way to dump UTF-8 data from a database?

I know this command:

manage.py dumpdata > mydata.json

But in the resulting mydata.json file, the Unicode data looks like:

"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"

I would like to see a real Unicode string like 全球卫星定位系统 (Chinese).

Semilunar answered 26/1, 2010 at 4:18 Comment(0)
T
12

django-admin.py dumpdata yourapp can dump the data for that purpose.

Or if you use MySQL, you could use the mysqldump command to dump the whole database.

And this thread has many ways to dump data, including manual methods.

UPDATE: because OP edited the question.

To convert the JSON-escaped strings into human-readable text, you could use this (Python 2):

open("mydata-new.json","wb").write(open("mydata.json").read().decode("unicode_escape").encode("utf8"))
Theodolite answered 26/1, 2010 at 4:27 Comment(3)
Thanks, I know this command, but in the resulting mydata.json file the Unicode data looks like "name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8". I would like to see a real Unicode string like '全球卫星定位系统' (Chinese). - Semilunar
Added some code to convert that. I am not sure whether the built-in dumpdata command can do it. - Theodolite
AttributeError: 'str' object has no attribute 'decode' - Pentaprism
D
19

The solution from @Julian Polard's post worked for me.

Basically, just add -Xutf8 right after py or python (before manage.py) when running the command:

python -Xutf8 manage.py dumpdata > data.json

Please upvote his answer as well if this worked for you ^_^
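
To see whether the flag changes anything on a particular machine, it can help to print the encodings Python is actually using. This is only a diagnostic sketch; the function name describe_text_encodings is made up for illustration, and it assumes Python 3.7 or later, where sys.flags.utf8_mode exists.

```python
import locale
import sys


def describe_text_encodings():
    """Return the encodings Python uses for text I/O in this process."""
    return {
        # True when Python was started with -X utf8 (or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding applied when output is printed or redirected with ">".
        "stdout": sys.stdout.encoding,
        # Default encoding used by open() when none is given.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_text_encodings().items():
        print(f"{key}: {value}")
```

Running the script once with plain python and once with python -Xutf8 shows whether the default encoding differs between the two.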

Digestible answered 5/2, 2022 at 13:22 Comment(1)
Clean and working. Beautiful! - Fermat
I
18

After struggling with similar issues, I've just found that the XML serializer handles UTF-8 properly.

manage.py dumpdata --format=xml > output.xml

I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I've finally succeeded using xml. No side effects for now.

Hope this helps someone, as I landed on this thread while looking for a solution.

Ibbie answered 29/10, 2011 at 14:41 Comment(2)
Same error with xml: django.db.utils.OperationalError: Problem installing fixture '/app/tours/fixtures/tours.xml': Could not load tours.Tour(pk=06541d20-a873-11e9-b91d-5b320e2b2922): (1366, "Incorrect string value: '\\xCC\\x88kull...' for column 'description' at row 1") - Pentaprism
Yeah, this totally didn't work on mine. I'm missing the é character. - Diffractometer
B
6

You need to either find the call to json.dump*() in the Django code and pass the additional option ensure_ascii=False and then encode the result after, or you need to use json.load*() to load the JSON and then dump it with that option.
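
The second option (load the JSON, then dump it again with ensure_ascii=False) could look roughly like the sketch below. The function name rewrite_json_unescaped and the indent default are illustrative choices, not part of Django, and the sketch assumes the dump file is ASCII or UTF-8 encoded.

```python
import json


def rewrite_json_unescaped(src_path, dst_path, indent=2):
    """Re-serialize a JSON fixture so non-ASCII text is written literally."""
    # Assumption: the source file is ASCII or UTF-8. A dump made of
    # \uXXXX escapes is pure ASCII, so this holds for the usual case.
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)

    # ensure_ascii=False writes the characters themselves instead of
    # escapes; the file is explicitly opened as UTF-8 to hold them.
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, ensure_ascii=False, indent=indent)
        dst.write("\n")
```

For example, rewrite_json_unescaped("mydata.json", "mydata-new.json") would leave the original dump untouched and write a readable copy next to it.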

Bungalow answered 26/1, 2010 at 4:48 Comment(1)
I don't understand why this option is not made available. - Stav
P
5

Here I wrote a snippet for that. Works for me!

Postal answered 12/11, 2010 at 13:43 Comment(0)
U
3

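You can create your own serializer that passes the ensure_ascii=False argument to the JSON encoder:

# serializers/json_no_uescape.py
# Deserializer is re-exported because Django looks it up on the registered module.
from django.core.serializers.json import Deserializer  # noqa: F401
from django.core.serializers.json import Serializer as JSONSerializer


class Serializer(JSONSerializer):
    """JSON serializer that writes non-ASCII characters without escaping them."""

    def _init_options(self):
        super()._init_options()
        # Forwarded to the json encoder so characters are written literally.
        self.json_kwargs['ensure_ascii'] = False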

Then register the new serializer (for example, in your app's __init__.py file):

from django.core.serializers import register_serializer

register_serializer('json-no-uescape', 'serializers.json_no_uescape')

Then you can run:

manage.py dumpdata --format=json-no-uescape > output.json
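
What the ensure_ascii option changes can be seen with the standard json module alone, without Django. This is a small illustration using the company name from the question as sample data.

```python
import json

# Sample record reusing the company name from the question.
record = {"name": "东泰香港五金有限公司"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are emitted as they are.
readable = json.dumps(record, ensure_ascii=False)

# Both strings describe the same data; only the notation differs.
assert json.loads(escaped) == json.loads(readable)
```

Because both forms load back to identical data, loaddata should accept either one, as long as the unescaped file is saved and read as UTF-8.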

Unlookedfor answered 19/4, 2019 at 10:52 Comment(0)
S
2

YOU's accepted answer is good, but note that Python 3 distinguishes between text and binary data, so both files must be opened in binary mode:

open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))

Otherwise, the error AttributeError: 'str' object has no attribute 'decode' will be raised.
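
One caveat with this approach: unicode_escape also rewrites other backslash sequences such as \" and \n, so a dump containing escaped quotes may no longer be valid JSON afterwards. The comparison below is a sketch with made-up sample data and a hypothetical helper is_valid_json; it contrasts the one-liner with a round trip through the json module.

```python
import json

# Made-up sample: one escaped Chinese string and one escaped quote pair.
raw = b'{"name": "\\u4e1c\\u6cf0", "note": "say \\"hi\\""}'

# Approach from the answer: decode every escape sequence in the file.
naive = raw.decode("unicode_escape")

# Alternative: let the json module interpret and re-emit the escapes.
safe = json.dumps(json.loads(raw), ensure_ascii=False)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True
```

Here is_valid_json(naive) is False because the inner quotes lost their backslashes, while is_valid_json(safe) is True. For dumps without such sequences the one-liner gives the same readable text.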

Strachey answered 16/1, 2020 at 12:35 Comment(0)
C
1

I usually add the following lines to my Makefile:

.PHONY: dump

# make APP=core MODEL=Schema dump
dump:
    @python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
    python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
    > ${APP}/fixtures/${MODEL}.json

This works for a standard Django project structure; adjust the paths if your project is laid out differently.

Camey answered 29/8, 2019 at 15:20 Comment(0)
M
1

This problem has been fixed for both JSON and YAML in Django 3.1.

Maintenon answered 7/7, 2020 at 6:22 Comment(0)
R
1

Here's a new solution.

I just shared a repo on GitHub: django-dump-load-utf8.

It is a workable solution, but I think this is a bug in Django, and fixing it there would be better. I hope someone can merge my project into Django.

manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json
Rectangular answered 30/9, 2021 at 12:10 Comment(0)
O
0

# Python 2 only: str.decode() and the 'string-escape' codec do not exist in Python 3.
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)
Oshiro answered 26/4, 2015 at 10:25 Comment(0)
O
0

I encountered the same issue. After reading all the answers, I came up with a mix of Ali and darthwade's answers:

manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell

import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)

In Python 3, I had to open the source file in binary mode and decode it as unicode-escape. I also passed utf-8 as the encoding when opening the output file for writing.

I hope it helps :)

Otilia answered 19/4, 2020 at 8:46 Comment(0)
U
0

Here is the solution from djangoproject.com:
In Windows Settings, there is a "Use Unicode UTF-8 for worldwide language support" box in "Language" - "Administrative Language Settings" - "Change system locale" - "Region Settings". If we apply that and reboot, we get a sensible, modern default encoding from Python.

Unquestionable answered 26/2, 2021 at 14:59 Comment(0)
M
0

In 2023, I still had a rough time with this. I had to follow @wertartem's suggestion and then change the encoding of the output file to get it to work. It seems the -Xutf8 option wasn't necessary for me, but someone reading this might need all three steps.

I also had a smaller issue, which I solved by excluding admin.logentry from the export (I added the options "-e auth -e contenttypes -e auth.Permission -e admin.logentry").

My full process:

  1. For proper encoding, at least for Windows, make sure utf-8 for worldwide language support is enabled. To do this, (at least for Windows 11) go to "Time & Language" > "Language & Region". Under "Related Settings", click "Administrative Language Settings". Click "Change System Locale". Check the box for "Beta: Use Unicode UTF-8 for worldwide language support". Restart the computer. Once enabled, skip this step for future exports.
  2. Run this command in terminal (here, I'm exporting to a subdirectory and excluding several apps and models from the export): python -Xutf8 manage.py dumpdata --format=json --natural-foreign --natural-primary -e auth -e contenttypes -e auth.Permission -e admin.logentry > databases/seeds/dump.json
  3. Open this "dump.json" file and run the VS Code command "Change File Encoding" to save it with UTF-8 encoding. If VS Code crashes, this can be done in Sublime Text instead by opening the file and choosing "Save with Encoding" from the File menu.
  4. Change connection to the new database.
  5. python manage.py reset_db
  6. python manage.py migrate
  7. python manage.py loaddata "databases/seeds/dump.json"
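
Step 3 can also be done without an editor. The sketch below assumes the redirected dump was saved as UTF-16, which is what output redirection in Windows PowerShell commonly produces; if the file is in another encoding, pass it as source_encoding. The function name reencode_to_utf8 and the output file name are illustrative.

```python
def reencode_to_utf8(src_path, dst_path, source_encoding="utf-16"):
    """Copy a text file, converting it from source_encoding to UTF-8."""
    # newline="" keeps line endings exactly as they are in the source.
    # The "utf-16" codec reads the byte order mark and drops it.
    with open(src_path, encoding=source_encoding, newline="") as src:
        text = src.read()

    # Write plain UTF-8 without a byte order mark.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    # Paths follow the example above; dump-utf8.json is a made-up name.
    reencode_to_utf8("databases/seeds/dump.json",
                     "databases/seeds/dump-utf8.json")
```

Writing to a second file keeps the original dump around in case the assumed source encoding turns out to be wrong.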

The command in step 2 may need slight modification for your project, although it is not required. Check out this: https://docs.djangoproject.com/en/4.2/ref/django-admin/#dumpdata

Merger answered 21/11, 2023 at 8:20 Comment(0)
