Why do we need to encode and decode in Python?

What is the use case of encode/decode?

My understanding was that encode converts a string into a byte string so that non-ASCII data can be passed around the program, and that decode converts this byte string back into a string.

But the following example shows non-ASCII characters being printed successfully even without encoding or decoding. Example:

val1="À È Ì Ò Ù Ỳ Ǹ Ẁ"
val2 = val1
print('val1 is: ',val2)

encoded_val1=val1.encode()
print('encoded_val1 is: ',encoded_val1)

decoded_encoded_val1=encoded_val1.decode()
print('decoded_encoded_val1 is: ',decoded_encoded_val1)

Output:

[Image: screenshot of the program output]

So what is the use case of encode and decode in python?

Applejack answered 1/10, 2019 at 9:47 Comment(2)
Please give some context about what you are trying to do so we can help you better. Encoding your Unicode string creates a byte string from your variable. You do not have to encode your string; it depends on what you are trying to do. - Thomajan
What is the use case of encode and decode? - Applejack
P
9

Your environment may support those characters, and your terminal (or whatever you use to view output) may be able to display them. Some terminals, command lines, or text editors may not. Apart from display issues, here are some practical reasons and examples:

1- When you transfer data over the internet or a network (e.g. with a socket), information is transferred as raw bytes. A single byte has only 256 possible values, far too few to cover every Unicode character, so non-ASCII characters need an agreed byte representation (such as UTF-8 or UTF-16, which use more than one byte for them). This is the most common reason I have encountered.

2- Some text editors only support UTF-8, so you need to represent your characters in UTF-8 in order to work with them. The reason is that text handling was originally built around ASCII characters, which fit in one byte each. When systems later needed to handle non-ASCII characters, people converted them to UTF-8. Someone with more in-depth knowledge of text editors may be able to explain this point better.

3- You may have text containing Chinese or Russian letters and, for some reason, need to store it on your remote Linux server, but the server does not support letters from those languages. You need to convert your text to a well-defined format (UTF-8 or UTF-16) and store it on the server so you can recover it later.
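
As a rough illustration of point 1, here is a minimal sketch in Python 3. It uses socket.socketpair() as a stand-in for a real client and server (an assumption made so the example runs in a single process); the variable names are only illustrative.

```python
import socket

text = "\u00c0 \u00c8 \u00cc"  # the string "À È Ì"

# Two connected sockets in one process, standing in for a client and a server.
sender, receiver = socket.socketpair()
try:
    # Sockets only accept bytes: passing a str raises TypeError in Python 3.
    try:
        sender.sendall(text)
        rejected = False
    except TypeError:
        rejected = True

    payload = text.encode("utf-8")   # str -> bytes before sending
    sender.sendall(payload)
    received = receiver.recv(1024)   # raw bytes arrive on the other side
finally:
    sender.close()
    receiver.close()

decoded = received.decode("utf-8")   # bytes -> str after receiving

print("str rejected by socket:", rejected)
print("bytes on the wire:", payload)
print("round trip ok:", decoded == text)
```

Both sides have to agree on the encoding (UTF-8 here); the socket itself carries no information about how the bytes should be interpreted.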

Here is a little explanation of UTF-8 format. There are also other articles about the topic if you are interested.
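
To make the variable-length idea concrete, this small sketch encodes a few characters with UTF-8 and counts the resulting bytes. The sample characters are my own choice and are written as escapes so the code points are unambiguous.

```python
# Sample characters, from plain ASCII up to an emoji outside the Basic
# Multilingual Plane.
samples = ["A", "\u00c0", "\u1ef2", "\U0001F600"]  # A, À, Ỳ, 😀

utf8_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    utf8_lengths.append(len(encoded))
    # One character (len(ch) == 1) can need several bytes once encoded.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded!r}")

print(utf8_lengths)  # -> [1, 2, 3, 4]
```

ASCII characters keep their single-byte form, which is why plain English text looks identical before and after encoding to UTF-8.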

Pleistocene answered 1/10, 2019 at 10:17 Comment(0)
A
0

Use UTF-8 encoding because it's universal. Set your code editor to UTF-8 and put this at the top of all your Python files (Python 3 already assumes UTF-8 source by default, so the line mainly matters for Python 2):
# coding: utf8
When you receive input (a file, a string...), it may use a different encoding, so you have to find out which encoding it uses and decode it accordingly. For example, in an HTML file the encoding is declared in a meta tag. If you change something in the HTML file and want to save it or send it over the network, you have to encode it again in the encoding it had before.

Always use Unicode for your strings in Python. (This is automatic in Python 3, but in Python 2.7 use the u prefix, as in u'Hi'.)
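
A minimal sketch of that round trip in Python 3, assuming the input bytes are Latin-1 and that the encoding name has already been read from somewhere such as a meta tag (the declared_encoding variable is hypothetical):

```python
# Bytes as they might arrive from a file or the network: "café" in Latin-1.
raw = b"caf\xe9"

# Hypothetical: in practice this would come from the document's metadata.
declared_encoding = "latin-1"

# Decoding with the wrong encoding fails (or, in other cases, silently
# produces garbage), which is why the source encoding must be known.
try:
    raw.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

text = raw.decode(declared_encoding)   # bytes -> str using the real encoding
edited = text + "s"                    # work on the text as str
out = edited.encode(declared_encoding) # str -> bytes in the original encoding

print(wrong_guess_failed)
print(text, edited, out)
```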

$ python2.7
Python 2.7.3 (default, Aug  1 2012, 05:14:39) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type('this is a string') # bytes => encoded
<type 'str'>
>>> type(u'this is a string') # unicode => decoded
<type 'unicode'>

$ python3
Python 3.2.3 (default, Oct 19 2012, 20:10:41) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type("this is a string") # unicode => decoded
<class 'str'>
>>> type(b"this is a string") # bytes => encoded
<class 'bytes'>



1 Use UTF-8. Now. Everywhere.

2 In your code, specify the file encoding and declare your strings as "unicode".

3 At the input, know the encoding of your data and decode it with decode().

4 At the output, encode with encode() in the encoding expected by the system that will receive the data or, if you cannot know it, in UTF-8.
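
In Python 3, the four rules above can be sketched with file I/O, where passing encoding= to open() performs the encode and decode steps for you. The file name and sample text are placeholders chosen for this example.

```python
import os
import tempfile

text = "\u00c0 \u00c8 \u00cc \u00d2 \u00d9"  # "À È Ì Ò Ù", kept as str inside the program

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # placeholder file name

    # Output boundary: open() encodes the str to UTF-8 bytes on write.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # What is actually stored on disk is bytes.
    with open(path, "rb") as f:
        raw = f.read()

    # Input boundary: open() decodes the bytes back to str on read.
    with open(path, "r", encoding="utf-8") as f:
        round_tripped = f.read()

print(raw == text.encode("utf-8"))
print(round_tripped == text)
```

Stating the encoding explicitly avoids depending on the platform default, which can differ between machines.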

Academy answered 1/10, 2019 at 10:21 Comment(0)
