Converting molecule name to SMILES?

6

12

Is there a way to convert IUPAC or common molecule names to SMILES? I want to avoid manually converting every single one with online tools. Any input would be much appreciated!

For background, I am currently working with Python and RDKit, so I wasn't sure whether RDKit could do this and I was simply unaware of it. My current data is in CSV format.

Thank you!

Theurich answered 28/2, 2019 at 16:23 Comment(1)

(Text Munging?) – Berkman 28/2, 2019 at 16:36

18

RDKit can't convert names to SMILES. The Chemical Identifier Resolver can convert names and other identifiers (such as CAS numbers), and it has an API, so you can do the conversion with a script.

from urllib.request import urlopen
from urllib.parse import quote

def CIRconvert(ids):
    try:
        # quote() percent-encodes spaces and other special characters in the identifier
        url = 'http://cactus.nci.nih.gov/chemical/structure/' + quote(ids) + '/smiles'
        ans = urlopen(url).read().decode('utf8')
        return ans
    except Exception:
        # urlopen raises on network failures and on HTTP error responses
        return 'Did not work'

identifiers = ['3-Methylheptane', 'Aspirin', 'Diethylsulfate', 'Diethyl sulfate', '50-78-2', 'Adamant']

for ids in identifiers:
    print(ids, CIRconvert(ids))

Output

3-Methylheptane CCCCC(C)CC
Aspirin CC(=O)Oc1ccccc1C(O)=O
Diethylsulfate CCO[S](=O)(=O)OCC
Diethyl sulfate CCO[S](=O)(=O)OCC
50-78-2 CC(=O)Oc1ccccc1C(O)=O
Adamant Did not work
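
Since the data in the question is in a CSV file, the same loop could be applied to one column roughly as follows. This is only a minimal sketch and not part of the original answer: the column names `name` and `smiles` and the helper `add_smiles_column` are hypothetical, and the resolver is passed in as a function so that `CIRconvert` from above (or any other lookup) can be plugged in. The demonstration uses a stand-in resolver so that it runs without network access.

```python
import csv
import io


def add_smiles_column(reader, resolver, name_col="name", smiles_col="smiles"):
    """Yield each CSV row as a dict with an extra SMILES column.

    `resolver` is any function mapping a name to a SMILES string,
    e.g. CIRconvert from the answer above. The column names are assumptions.
    """
    cache = {}  # avoid querying the web service twice for the same name
    for row in reader:
        name = row[name_col].strip()
        if name not in cache:
            cache[name] = resolver(name)
        out = dict(row)
        out[smiles_col] = cache[name]
        yield out


# Offline demonstration with a stand-in resolver (hypothetical lookup table);
# in real use, pass CIRconvert instead of fake_resolver.
known = {"Aspirin": "CC(=O)Oc1ccccc1C(O)=O"}
calls = []


def fake_resolver(name):
    calls.append(name)
    return known.get(name, "Did not work")


sample_csv = io.StringIO("name\nAspirin\nAdamant\nAspirin\n")
rows = list(add_smiles_column(csv.DictReader(sample_csv), fake_resolver))
for row in rows:
    print(row["name"], row["smiles"])
```

With a real file, `csv.DictReader(open("molecules.csv", newline=""))` would take the place of the in-memory sample (the file name is again just a placeholder), and the rows could be written back with `csv.DictWriter`.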

Recess answered 28/2, 2019 at 18:29 Comment(4)

For some reason this website is not operating properly since circa late 2020 – Piccadilly 4/2, 2021 at 6:23

@CodyAldaz The website seems to have some problems, but most of the time, when I click on Submit, it works. However the API works. – Recess 4/2, 2021 at 12:19

this mostly worked for me, but I had to just convert spaces to URL format (%20), such that: current_id = str(ids.lower()).replace(' ', '%20') url = 'cactus.nci.nih.gov/chemical/structure' + current_id + '/smiles' – Furnivall 29/5, 2021 at 3:10

@PaulG Thank you for pointing out the spaces. I have edited the code. – Recess 29/5, 2021 at 5:42

4

PubChemPy has some great features that can be used for this purpose. It supports IUPAC systematic names, trade names and all known synonyms for a given compound, as documented in the PubChem database: https://pubchempy.readthedocs.io/en/latest/

>>> import pubchempy as pcp
>>> results = pcp.get_compounds('Glucose', 'name')
>>> print(results)
[Compound(79025), Compound(5793), Compound(64689), Compound(206)]

The first argument is the identifier, and the second argument is the identifier type, which must be one of name, smiles, sdf, inchi, inchikey or formula. It looks like there are 4 compounds in the PubChem Database that have the name Glucose associated with them. Let’s take a look at them in more detail:

>>> for compound in results:
...     print(compound.isomeric_smiles)

C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O
C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O
C(C1C(C(C(C(O1)O)O)O)O)O

It looks like they all have different stereochemistry information!
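
Because a name can match several compounds, or none at all, a small wrapper can make the choice explicit. The following is a minimal sketch rather than anything PubChemPy itself provides: the helper `first_smiles` is hypothetical, it simply takes the first hit, and it assumes that `get_compounds` returns an empty list when nothing matches. The lookup function is injectable so the example can be run with a stand-in instead of a live PubChem query.

```python
from types import SimpleNamespace


def first_smiles(name, get_compounds=None):
    """Return the isomeric SMILES of the first hit for `name`, or None if there is no hit."""
    if get_compounds is None:
        # Imported lazily so the function can be exercised without PubChemPy installed.
        import pubchempy as pcp
        get_compounds = pcp.get_compounds
    results = get_compounds(name, 'name')
    if not results:
        return None
    return results[0].isomeric_smiles


# Stand-in for pcp.get_compounds (hypothetical data, no network access needed).
def fake_get_compounds(identifier, namespace):
    table = {"Glucose": [SimpleNamespace(isomeric_smiles="C(C1C(C(C(C(O1)O)O)O)O)O")]}
    return table.get(identifier, [])


glucose = first_smiles("Glucose", fake_get_compounds)
missing = first_smiles("Adamant", fake_get_compounds)
print(glucose, missing)
```

Calling `first_smiles("Glucose")` without the second argument would query PubChem through PubChemPy. Whether the first hit is the stereoisomer you actually want has to be checked case by case.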

Bobbinet answered 6/5, 2022 at 18:3 Comment(0)

3

OPSIN (https://opsin.ch.cam.ac.uk/) is another solution for name-to-structure conversion.

It can be used by installing the CLI, or via https://github.com/gorgitko/molminer

(OPSIN is also used by the RDKit KNIME nodes.)

Fanfare answered 16/3, 2019 at 11:7 Comment(0)

0

The accepted answer uses the Chemical Identifier Resolver, but for some reason the website seems to be buggy for me and the API seems to be broken.

So another way to convert SMILES to an IUPAC name is with the PubChem Python API, which can work if your SMILES is in their database.

e.g.

#!/usr/bin/env python

import sys
import pubchempy as pcp

smiles = str(sys.argv[1])
print(smiles)
s = pcp.get_compounds(smiles, 'smiles')
print(s[0].iupac_name)

Piccadilly answered 4/2, 2021 at 23:27 Comment(2)

The question was about converting name to smiles (not other way around). It can be done using this API as well: smiles= pcp.get_compounds(ids,'name')[0].canonical_smiles – Wigging 1/9, 2021 at 12:31

What if we don't have any ID and only have the name of the compound? – Fleeting 18/1, 2022 at 4:48

0

You can use batch query of pubchem:

Fleeting answered 18/1, 2022 at 4:58 Comment(0)

0

You can use the PubChem API (PUG REST) for this.

(https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest-tutorial)

Basically, the URL you call identifies the compound by "name", followed by the name itself; you then specify that you want the "property" "CanonicalSMILES", returned as text (TXT).

import pandas as pd
import requests
from urllib.parse import quote

identifiers = ['3-Methylheptane', 'Aspirin', 'Diethylsulfate', 'Diethyl sulfate', '50-78-2', 'Adamant']
rows = []
for x in identifiers:
    try:
        # quote() percent-encodes spaces and other special characters in the name
        url = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/' + quote(x) + '/property/CanonicalSMILES/TXT'
        # remove the trailing newline character with rstrip
        smiles = requests.get(url).text.rstrip()
        if 'NotFound' in smiles:
            print(x, "not found")
        else:
            rows.append({'Name': x, 'Smiles': smiles})
    except requests.RequestException:
        print("request failed for", x)
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
smiles_df = pd.DataFrame(rows, columns=['Name', 'Smiles'])
print(smiles_df)
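
Since several people in this thread report that one service or another is unreliable at times, the lookups can also be chained so that a second service is tried when the first one fails. This is just a sketch under stated assumptions: `resolve_with_fallback` is a hypothetical helper, and each resolver is assumed to return a SMILES string on success and to return `None` or raise on failure (a function that returns a marker string such as 'Did not work' would need a small adapter). The stand-in resolvers below replace real web requests so the example runs offline.

```python
def resolve_with_fallback(name, resolvers):
    """Try each (label, function) pair in order; return (label, smiles) or (None, None)."""
    for label, resolver in resolvers:
        try:
            smiles = resolver(name)
        except Exception:
            continue  # service down or lookup failed; try the next one
        if smiles:
            return label, smiles
    return None, None


# Stand-in resolvers (hypothetical): the first always fails, the second knows one name.
def broken_service(name):
    raise OSError("service unavailable")


def small_lookup(name):
    return {"Aspirin": "CC(=O)Oc1ccccc1C(O)=O"}.get(name)


chain = [("CIR", broken_service), ("PubChem", small_lookup)]
source, smiles = resolve_with_fallback("Aspirin", chain)
nothing = resolve_with_fallback("Adamant", chain)
print(source, smiles)
print(nothing)
```

Returning the label alongside the SMILES records which service produced each structure, which helps when the results from different sources need to be compared later.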

Furnivall answered 25/5, 2022 at 2:19 Comment(0)
