How to make a Python organic chemistry retro-synthesis generator?

I am trying to learn Python by writing a simple program that generates a type of practice problem organic chemistry students often face on exams: the retro-synthesis question.

For those unfamiliar with this type of question: the student is given the initial and final species of a series of chemical reactions, then is asked to determine which reagents/reactions were applied to the initial reactant to obtain the final product.

Sometimes you are given only the final product and asked to list the reactions needed to synthesize it under some constraints (start only with a compound that has 5 carbons or fewer, only use alcohols, etc.).

So far, I've done some research, and I think RDKit with Python is a good place to start. My plan is to use the SMILES format for reading molecules (since I can manipulate it as I would a string), then define a function for each reaction. Finally, I'll need a database of chemical species from which the program can randomly select the initial and final species in the problem. The program selects a random species from the database, applies a number of reactions to it (3-5, specified by the user), then displays the final product. The user then solves the question themselves, and the program shows the path it took (using images of the intermediates and printing the reagents used to obtain them). Simple. In principle.
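
As a rough picture of that plan, here is a minimal sketch of the generation loop only. Everything in it is an assumption for illustration: the three toy reactions, their reagent labels, the `STARTING_POOL` list, and the `generate_problem` name are hypothetical, and the string matching stands in for real chemistry that a toolkit would have to provide.

```python
import random

# Hypothetical toy reactions. Each takes a SMILES string and returns the
# product SMILES, or None when the reaction does not apply. The suffix
# matching is a placeholder, not a chemically robust test.
def alcohol_to_bromide(smiles):
    return smiles[:-1] + "Br" if smiles.endswith("CO") else None

def bromide_to_alcohol(smiles):
    return smiles[:-2] + "O" if smiles.endswith("CBr") else None

def bromide_to_nitrile(smiles):
    return smiles[:-2] + "C#N" if smiles.endswith("CBr") else None

# Reagent label -> reaction function (labels are illustrative).
REACTIONS = {
    "HBr": alcohol_to_bromide,
    "NaOH": bromide_to_alcohol,
    "NaCN": bromide_to_nitrile,
}

# Stand-in for the database of starting species.
STARTING_POOL = ["CCO", "CCCO", "CCCCBr"]

def generate_problem(n_steps, rng):
    """Pick a random start and apply up to n_steps applicable reactions.

    Returns a list of (reagent, smiles) pairs; the first reagent is None.
    """
    current = rng.choice(STARTING_POOL)
    path = [(None, current)]
    for _ in range(n_steps):
        options = []
        for reagent, reaction in REACTIONS.items():
            product = reaction(current)
            if product is not None:
                options.append((reagent, product))
        if not options:  # dead end: nothing applies to this species
            break
        reagent, current = rng.choice(options)
        path.append((reagent, current))
    return path

if __name__ == "__main__":
    path = generate_problem(3, random.Random())
    print("Start:", path[0][1], " Target:", path[-1][1])
    for reagent, smiles in path[1:]:
        print(f"  {reagent} -> {smiles}")
```

Passing the random generator in as `rng` makes a given problem reproducible from a seed, which helps when checking that the printed solution path matches the question shown.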

But once I started actually coding the functions, I ran into some problems. First, it is very tedious to write a function for every single reaction. Second, while SMILES can handle virtually all molecular complications thrown at it (stereochemistry, geometry, etc.), it has multiple valid forms for certain molecules, and I'm having trouble keeping the reactions specific. Third, I'm using the "replace" method to manipulate the SMILES strings, and this gets me into trouble when I have regiospecific reactions that I want to make universal.

For example: SN2 reactions work well with primary alkyl halides, but not at all with tertiary ones (steric hindrance). How would I create a function for this reaction?
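
One way to frame this is to test the molecular graph rather than the string: find the carbon bearing the halogen and count its carbon neighbours. The sketch below is only an illustration under stated assumptions. It uses a hand-built heavy-atom graph (element list plus bond pairs, single bonds only, hydrogens implicit) instead of a parsed SMILES, and the names `sn2_sites`, `sn2`, and `max_carbon_neighbours` are hypothetical. A cheminformatics toolkit would supply the parsed graph and substructure matching in practice.

```python
HALOGENS = {"Cl", "Br", "I"}

def neighbours(bonds, atom):
    """Indices of all atoms bonded to the given atom."""
    return [b if a == atom else a for a, b in bonds if atom in (a, b)]

def sn2_sites(atoms, bonds, max_carbon_neighbours=1):
    """Return (carbon, halogen) index pairs where SN2 is allowed.

    A carbon with at most one carbon neighbour is methyl or primary.
    Rejecting secondary carbons as well is a simplification; raise
    max_carbon_neighbours to 2 to allow them.
    """
    sites = []
    for a, b in bonds:
        for carbon, halogen in ((a, b), (b, a)):
            if atoms[carbon] == "C" and atoms[halogen] in HALOGENS:
                n_carbons = sum(atoms[n] == "C" for n in neighbours(bonds, carbon))
                if n_carbons <= max_carbon_neighbours:
                    sites.append((carbon, halogen))
    return sites

def sn2(atoms, bonds, nucleophile="O"):
    """Swap the first eligible halogen for the nucleophile, or return None."""
    sites = sn2_sites(atoms, bonds)
    if not sites:
        return None
    _, halogen = sites[0]
    new_atoms = list(atoms)
    new_atoms[halogen] = nucleophile  # "O" with implicit H, i.e. an alcohol
    return new_atoms, list(bonds)

# 1-bromobutane: C-C-C-C-Br (primary halide)
BUTYL_BROMIDE = (["C", "C", "C", "C", "Br"], [(0, 1), (1, 2), (2, 3), (3, 4)])
# tert-butyl bromide: central carbon 0 bonded to three carbons and Br
TERT_BUTYL_BROMIDE = (["C", "C", "C", "C", "Br"], [(0, 1), (0, 2), (0, 3), (0, 4)])

if __name__ == "__main__":
    print(sn2(*BUTYL_BROMIDE))       # halogen replaced by O
    print(sn2(*TERT_BUTYL_BROMIDE))  # None: tertiary carbon is rejected
```

The point of the design is that the regiochemistry rule lives in one predicate on the graph, so it applies to any molecule regardless of how its SMILES happens to be written.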

Another problem: I want the reactions to be tagged by their respective reagents, so I've taken to naming the functions after the reagents used. But this becomes problematic when a reagent can take many different forms (Grignard reagents, for example).

I feel like there is a better, less repetitive and tedious way to tackle this. I'm looking for a nudge in the right direction.
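
A possible way around the naming problem is to keep the reaction and its reagent labels as data instead of encoding the reagent in a function name, so that several labels can point at one reaction. This is a minimal sketch, not a recommended schema: `ReactionClass`, `REACTION_CLASSES`, `reaction_for`, and the reagent labels listed are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReactionClass:
    name: str        # what the reaction is
    reagents: tuple  # every reagent label that triggers this reaction

# Illustrative entries only; a real table would be much larger.
REACTION_CLASSES = [
    ReactionClass("Grignard addition", ("CH3MgBr", "EtMgBr", "PhMgBr")),
    ReactionClass("SN2 with hydroxide", ("NaOH", "KOH")),
]

# Reverse index: reagent label -> the reaction class it belongs to.
REAGENT_INDEX = {
    reagent: reaction
    for reaction in REACTION_CLASSES
    for reagent in reaction.reagents
}

def reaction_for(reagent):
    """Look up the reaction class for a reagent label, or None if unknown."""
    return REAGENT_INDEX.get(reagent)

if __name__ == "__main__":
    for label in ("PhMgBr", "KOH", "unknown"):
        found = reaction_for(label)
        print(label, "->", found.name if found else None)
```

With this layout, adding another Grignard reagent means adding one string to a tuple rather than writing another function.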

Pecuniary answered 17/10, 2014 at 15:41 Comment(3)
This is a very interesting project, but your question is quite broad. I'm not sure that anyone will be able to give you much help unless you have a specific programming problem (other than "this is tedious and hard to implement"). Examples of places where you're having difficulty can be questions on their own. I think this undertaking may be a little more difficult than you anticipate. Chammy
@ShihabDider It is polite to give an upvote to folks who post helpful answers, even if they are not completely correct. Your work is indeed interesting. Have you built anything with Python to get yourself started? Also, I found Klaus's answer useful and I did not ask the question. ;) Linkman
My sincerest apologies to both you and Klaus; this was my first question on SO and I kind of forgot all about it lol. I have upvoted and selected his answer as the best (it really was, in my opinion). This project is on hold while I get more familiar with programming and Python and develop the skills necessary to build it. Pecuniary

That's a pretty ambitious task, and you're not the first to undertake it. Prominent examples include:

  1. LHASA, originally developed in the group of E.J. Corey at Harvard University

  2. WODCA, developed in the group of J. Gasteiger at Erlangen University

  3. CHIRON, developed in the group of S. Hanessian at the University of Montreal

These projects represent man-decades of development effort, but I have no reliable information on their current state.

Athalla answered 20/11, 2014 at 6:30 Comment(1)
It would be nice to add the publications about them too. Precursor

It might help to look for free or, if you can afford it, commercial software (ideally written in Python) that solves the same problem or a similar one. Study what it does and how it approaches the problem, and obtain its source code if possible. I have found this helpful in many ways.

Metro answered 1/4, 2015 at 14:38 Comment(0)
