Python 3 solutions.
In my work, the annoying part is that three-letter codes can also refer to modified residues, which often appear in PDB/mmCIF files, e.g. 'Tih' --> 'A'.
So the mapping can have more than 22 pairs, and third-party Python tools such as
Bio.SeqUtils.IUPACData.protein_letters_3to1
cannot handle them. My simplest solution is to look up the mapping on http://www.ebi.ac.uk/pdbe-srv/pdbechem and add each unusual mapping to the dict in my own function whenever I encounter one.
def three_to_one(three_letter_code):
    # Standard residues plus modified residues looked up in PDBeChem.
    # Keys are capitalised ("Mse"), so the input is normalised before lookup.
    mapping = {'Aba':'A','Ace':'X','Acr':'X','Ala':'A','Aly':'K','Arg':'R','Asn':'N','Asp':'D','Cas':'C',
               'Ccs':'C','Cme':'C','Csd':'C','Cso':'C','Csx':'C','Cys':'C','Dal':'A','Dbb':'T','Dbu':'T',
               'Dha':'S','Gln':'Q','Glu':'E','Gly':'G','Glz':'G','His':'H','Hse':'S','Ile':'I','Leu':'L',
               'Llp':'K','Lys':'K','Men':'N','Met':'M','Mly':'K','Mse':'M','Nh2':'X','Nle':'L','Ocs':'C',
               'Pca':'E','Phe':'F','Pro':'P','Ptr':'Y','Sep':'S','Ser':'S','Thr':'T','Tih':'A','Tpo':'T',
               'Trp':'W','Tyr':'Y','Unk':'X','Val':'V','Ycm':'C','Sec':'U','Pyl':'O'}  # you can add more
    # "MSE" or "mse" becomes "Mse"; an unknown code raises KeyError.
    return mapping[three_letter_code[0].upper() + three_letter_code[1:].lower()]
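If you would rather not get a KeyError for codes you have not added yet, one option is to keep the standard residues and the modified ones in separate dicts and fall back to 'X'. This is a minimal sketch, not the answer's original code: it assumes upper-case keys (as written in PDB files), only a few of the modified residues from the table above are copied in, and the names STANDARD, MODIFIED, three_to_one_safe and residues_to_sequence are hypothetical.

```python
# Hypothetical layout: standard residues and modified residues kept apart,
# so new entries looked up in PDBeChem only go into MODIFIED.
STANDARD = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "SEC": "U", "PYL": "O",
}

# A few modified residues from the table above; extend as you encounter more.
MODIFIED = {"MSE": "M", "SEP": "S", "TPO": "T", "PTR": "Y", "TIH": "A"}

# Later dicts win on duplicate keys, so MODIFIED can override STANDARD.
MAPPING = {**STANDARD, **MODIFIED}


def three_to_one_safe(code: str, unknown: str = "X") -> str:
    """Map a three-letter residue name to one letter, or `unknown` if absent."""
    return MAPPING.get(code.strip().upper(), unknown)


def residues_to_sequence(residue_names) -> str:
    """Join the one-letter codes for an iterable of residue names."""
    return "".join(three_to_one_safe(name) for name in residue_names)
```

For example, residues_to_sequence(["MSE", "Gly", "SEP", "XYZ"]) gives "MGSX": the unknown "XYZ" becomes 'X' instead of stopping the whole conversion.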
The other solution is to retrieve the mapping online (but the URL and the HTML layout may change over time):
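import re
import urllib.request

def three_to_one_online(three_letter_code):
    # Fetch the PDBeChem page for this compound (URL and layout may change).
    url = "http://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/" + three_letter_code
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8')
    # Raw string, so "\s" reaches the regex engine instead of being a string escape.
    match = re.search(r'\s*<td\s*>\s*<h3>One-letter code.*</h3>\s*</td>\s*<td>\s*([A-Z])\s*</td>', html)
    if match is None:
        raise ValueError("One-letter code not found for " + three_letter_code)
    return match.group(1)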
Here I use re directly instead of an HTML parser, for simplicity.
Hope this helps.
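Because the regular expression is tied to the page layout, it can be worth checking it offline against a small HTML fragment before relying on the network call. This is a sketch under assumptions: SAMPLE_HTML is a made-up fragment in the layout the pattern expects (not copied from the live page), and parse_one_letter_code is a hypothetical helper that returns None rather than raising when the layout does not match.

```python
import re

# Same pattern as in three_to_one_online, compiled once as a raw string.
ONE_LETTER_PATTERN = re.compile(
    r'\s*<td\s*>\s*<h3>One-letter code.*</h3>\s*</td>\s*<td>\s*([A-Z])\s*</td>'
)


def parse_one_letter_code(html: str):
    """Return the one-letter code found in `html`, or None if it is not there."""
    match = ONE_LETTER_PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical fragment with the assumed table layout (not the real page).
SAMPLE_HTML = "<tr><td><h3>One-letter code</h3></td><td>M</td></tr>"
```

If parse_one_letter_code(SAMPLE_HTML) still returns "M" but the online lookup fails, the page layout (or the URL) has probably changed rather than the regex being wrong.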
ARGHISLEULEULYS converted to RHLLK? What is the logic? – Berenice
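The conversion in that comment works when the input is a plain concatenation of three-letter codes: split the string into chunks of three characters (ARG HIS LEU LEU LYS) and map each chunk (R H L L K). A minimal sketch, assuming the length is a multiple of three and using only the standard residues; split_and_convert is a hypothetical name.

```python
# Standard residues only; add modified residues as needed.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def split_and_convert(concatenated: str) -> str:
    """Convert e.g. 'ARGHISLEULEULYS' to 'RHLLK' by reading it in triplets."""
    s = concatenated.strip().upper()
    if len(s) % 3:
        raise ValueError(f"length {len(s)} is not a multiple of 3")
    # Consecutive, non-overlapping chunks of three characters.
    chunks = (s[i:i + 3] for i in range(0, len(s), 3))
    return "".join(THREE_TO_ONE[chunk] for chunk in chunks)
```

split_and_convert("ARGHISLEULEULYS") returns "RHLLK". An unknown triplet raises KeyError, like three_to_one above.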