How does one encode and decode a string with Python for use in a URL?
S

8

5

I have a string like this:

String A: [ 12234_1_Hello'World_34433_22acb_4554344_accCC44 ]

I would like to encrypt String A so that it can be used in a clean URL, something like this:

String B: [ cYdfkeYss4543423sdfHsaaZ ]

Is there an encode API in Python that, given String A, returns String B? Is there a decode API in Python that, given String B, returns String A?

Smalto answered 17/5, 2009 at 23:45 Comment(2)
Please note, encryption ("I would like to encrypt...") is not encoding ("Is there a encode API..."). You accepted an answer that uses base64 encoding. This is not encryption, i.e. this data is not secure. - Baryon
@VN44CA, an encrypted query string and a clean URL are not the same concern. If anything, an encrypted query string is by its nature not clean. As it stands, the question is very poorly phrased. - Rosarosabel
C
9

One way to encode and decode is to use the base64 module, for example:

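import base64
import sys

# b64encode works on bytes, so encode the text first (required on Python 3)
# strip() drops the trailing newline that stdin input usually carries
encoded = base64.b64encode(sys.stdin.read().strip().encode("utf-8"))
print(encoded.decode("ascii"))

decoded = base64.b64decode(encoded)
print(decoded.decode("utf-8"))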

Is this what you were looking for? For your particular case you get:

input: 12234_1_Hello'World_34433_22acb_4554344_accCC44

encoded: MTIyMzRfMV9IZWxsbydXb3JsZF8zNDQzM18yMmFjYl80NTU0MzQ0X2FjY0NDNDQ=

decoded: 12234_1_Hello'World_34433_22acb_4554344_accCC44
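Since the result is meant for a URL, it may be worth noting that standard base64 output can contain "+", "/" and "=", which have special meaning in URLs. Below is a minimal sketch, assuming Python 3, that uses the URL-safe alphabet from the same module and strips the padding. The helper names encode_for_url and decode_from_url are made up for this example.

```python
import base64

def encode_for_url(text: str) -> str:
    """Encode text as URL-safe base64 with the '=' padding stripped."""
    raw = base64.urlsafe_b64encode(text.encode("utf-8"))
    return raw.decode("ascii").rstrip("=")

def decode_from_url(token: str) -> str:
    """Restore the padding, then decode back to the original text."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")

string_a = "12234_1_Hello'World_34433_22acb_4554344_accCC44"
string_b = encode_for_url(string_a)
print(string_b)
print(decode_from_url(string_b) == string_a)
```

This is still only an encoding: anyone can decode the token, and it comes out longer than the input, not shorter.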

Cultigen answered 17/5, 2009 at 23:51 Comment(0)
M
13

Note that there's a huge difference between encoding and encryption.

If you want to send sensitive data, then don't use the encoding mentioned above ;)

Mcdermott answered 26/7, 2009 at 10:27 Comment(0)
O
6

Are you looking to encrypt the string, or to encode it so that characters that are illegal in URLs are escaped? If the latter, you can use urllib.quote (urllib.parse.quote in Python 3):

>>> from urllib.parse import quote, unquote  # Python 2: from urllib import quote, unquote
>>> quoted = quote("12234_1_Hello'World_34433_22acb_4554344_accCC44")
>>> quoted
'12234_1_Hello%27World_34433_22acb_4554344_accCC44'

>>> unquote(quoted)
"12234_1_Hello'World_34433_22acb_4554344_accCC44"
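If several arguments need to go into the URL rather than a single string, the same module can build and parse a whole query string. A small sketch, assuming Python 3; the parameter names and the example.com address are placeholders, not anything from the question.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical parameters, for illustration only
params = {"cat_id": "100060", "q": "Hello'World"}

# urlencode escapes each value and joins the pairs with "&"
query = urlencode(params)
url = "http://example.com/search?" + query
print(url)

# parse_qs returns a list per key, so take the first value of each
decoded = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(decoded == params)
```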
Obliteration answered 17/5, 2009 at 23:54 Comment(1)
This answer is great because it actually answers the question as asked in the title (the question itself is fairly incoherent), and what lots of searchers looking for this page will want. - Gerlach
T
5

Are you after encryption, compression, or just urlencoding? The string can be passed after urlencoding, but that will not make it smaller as in your example. Compression might shrink it, but you would still need to urlencode the result.

Do you actually need to hide the string data from the viewer (e.g. sensitive data, should not be viewable by someone reading the URL over your shoulder)?
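To illustrate the point about compression, here is a rough sketch, assuming Python 3, that compresses with zlib and then makes the result URL-safe with base64. The names pack and unpack are invented for the example. For a short string with little repetition, like the one in the question, the output will typically not be shorter than the input, because the compression header and the base64 step both add overhead.

```python
import base64
import zlib

def pack(text: str) -> str:
    """Compress the text, then encode it with the URL-safe base64 alphabet."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")

def unpack(token: str) -> str:
    """Reverse of pack(): restore padding, decode, decompress."""
    padded = token + "=" * (-len(token) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")

string_a = "12234_1_Hello'World_34433_22acb_4554344_accCC44"
token = pack(string_a)

# Compare the lengths to see whether compression helped for this input
print(len(string_a), len(token))
print(unpack(token) == string_a)
```

Compression only pays off once the input is long or repetitive enough.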

Tarr answered 17/5, 2009 at 23:56 Comment(2)
No security issues. I was just thinking of randomizing the string and shrinking it a bit. I have seen this on Kijiji.com, where they append a random string to the end of their slug. I am sure that is how they pass arguments around. I thought it would be cool to use something like that as well. bayarea.kijiji.com/f-Jobs-Construction-trades-W0QQCatIdZ100060 bayarea.kijiji.com/… I'd like to have all the arguments in the URL so that if people copy and email URLs, their friends would see the end result without cookies. Thanks - Smalto
You might be confusing storing data with referencing it. That is, the short string you see in the URL is actually a session ID, and the data it references is kept on the server. I could be wrong, I don't know any specifics about that site, but it is a common enough approach. - Tarr
G
5

To make it really short -> just insert a row into the database. Store something like a list of (id auto_increment, url) tuples. Then you can base64 encode the id to get a "proxy url". Decode it by decoding the id and looking up the proper url in the database. Or if you don't mind the identifiers looking sequential, just use the numbers.
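A minimal sketch of that idea, assuming Python 3 and an in-memory SQLite database standing in for the real one. The table name urls and the helpers shorten and expand are made up for illustration.

```python
import base64
import sqlite3

# In-memory database as a stand-in; a real site would use its own database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)"
)

def shorten(url: str) -> str:
    """Store the URL and return its row id as a URL-safe base64 token."""
    cur = conn.execute("INSERT INTO urls (url) VALUES (?)", (url,))
    conn.commit()
    row_id = cur.lastrowid
    # Use the smallest number of bytes that holds the id
    id_bytes = row_id.to_bytes((row_id.bit_length() + 7) // 8 or 1, "big")
    return base64.urlsafe_b64encode(id_bytes).decode("ascii").rstrip("=")

def expand(token: str):
    """Decode the token back to an id and look up the stored URL."""
    padded = token + "=" * (-len(token) % 4)
    row_id = int.from_bytes(base64.urlsafe_b64decode(padded), "big")
    row = conn.execute("SELECT url FROM urls WHERE id = ?", (row_id,)).fetchone()
    return row[0] if row else None

token = shorten("12234_1_Hello'World_34433_22acb_4554344_accCC44")
print(token, expand(token))
```

The token stays short because it only encodes the row id; the full string never leaves the server.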

Giron answered 18/5, 2009 at 0:41 Comment(0)
A
2

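The base64 module, available since Python 2.4, encodes and decodes data to and from several bases (base16, base32 and base64).

In your example, you would do the following:

import base64
# On Python 3, b64encode takes bytes; passing a str raises a TypeError
string_b = base64.b64encode(string_a.encode("utf-8"))
string_a = base64.b64decode(string_b).decode("utf-8")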

For full API: http://docs.python.org/library/base64.html

Abhorrence answered 17/5, 2009 at 23:54 Comment(2)
I'm getting TypeError: 'str' does not support the buffer interface - Kattie
My string: A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|Adj|Adm|Adv|Asst|Bart|Bldg|Brig|Bros|Capt|Cmdr|Col|Comdr|Con|Corp|Cpl|DR|Dr|Drs|Ens|Gen|Gov|Hon|Hr|Hosp|Insp|Lt|MM|MR|MRS|MS|Maj|Messrs|Mlle|Mme|Mr|Mrs|Ms|Msgr|Op|Ord|Pfc|Ph|Prof|Pvt|Rep|Reps|Res|Rev|Rt|Sen|Sens|Sfc|Sgt|Sr|St|Supt|Surg|v|vs|i.e|rev|e.g|No #NUMERIC_ONLY#|Nos|Art #NUMERIC_ONLY#|Nr|pp #NUMERIC_ONLY#|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec - Kattie
U
2

It's hard to reduce the size of a string and preserve arbitrary content.

You have to restrict the data to something you can usefully compress.

Your alternative is to do the following.

  1. Save "all the arguments in the URL" in a database row.

  2. Assign a GUID key to this collection of arguments.

  3. Then provide that shortened GUID key.

Ulick answered 18/5, 2009 at 0:43 Comment(2)
The alternative would require an extra hit to the database, right? I see a lot of sites doing this; I just have to find out how. - Smalto
Extra relative to what? If you're authenticating users and checking authorization, you have queries there. If your data model is something other than a trivial table, it may involve more than one query. Saving the "query" in the database will be relatively easy to manage. - Ulick
A
1

Another method that would also shorten the string would be to calculate the md5/sha1 hash of the string (concatenated with a seed if you wished):

>>> import hashlib
>>> hashlib.sha1(b"12234_1_Hello'World_34433_22acb_4554344_accCC44").hexdigest()
'e1153227558aadc00a2e90b5013fdd6b0804fdfb'

In theory you should get a set of strings with very few collisions and with a fixed length. The hashlib library has an array of different hash functions you can use in this manner, with different output sizes.

Edit: You also said that you needed a reversible string, so this wouldn't work for that. Afaik, however, many web platforms that use clean URLs like you seem to want to implement use hash functions to calculate a shortened URL and then store that URL along with the page's other data to provide the reverse lookup capability.
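The reverse lookup described in the edit could look roughly like this. It is only a sketch, assuming Python 3, with a plain dict standing in for the database table; short_key and resolve are hypothetical names. The hash itself cannot be reversed, so the original string is recovered from the stored mapping.

```python
import hashlib

lookup = {}  # stands in for a database table mapping key -> original string

def short_key(text: str, length: int = 10) -> str:
    """Derive a fixed-length key from the SHA-1 digest and remember the text."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    key = digest[:length]
    # setdefault keeps an existing entry, which lets us detect collisions
    existing = lookup.setdefault(key, text)
    if existing != text:
        raise ValueError("hash collision for key %r" % key)
    return key

def resolve(key: str):
    """Return the original string for a key, or None if it is unknown."""
    return lookup.get(key)

string_a = "12234_1_Hello'World_34433_22acb_4554344_accCC44"
key = short_key(string_a)
print(key, resolve(key) == string_a)
```

Truncating the digest makes the key shorter but raises the chance of collisions, which is why the sketch checks for them.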

Autocorrelation answered 18/5, 2009 at 3:36 Comment(1)
Is hashlib.sha1 reversible? - Kattie
