I am trying to learn Python by making a simple program that generates a type of practice problem organic chemistry students usually face on exams: the retrosynthesis question.
For those unfamiliar with this type of question: the student is given the initial and final species of a series of chemical reactions and is asked to determine which reagents/reactions were applied to the initial reactant to obtain the final product.
Sometimes you are given only the final product and asked to list the reactions needed to synthesize it under some constraints (start only with compounds that have 5 carbons or fewer, use only alcohols, etc.).
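Constraints like these can be checked as simple filters over a list of candidate compounds. A minimal sketch, assuming each entry already stores its carbon count and a compound class; the `Species` record and its fields are hypothetical, and with RDKit the carbon count would normally be computed from the molecule rather than stored by hand:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical database entry for one compound."""
    name: str
    smiles: str
    carbons: int
    compound_class: str  # e.g. "alcohol", "alkyl halide", "alkene"


def allowed_starting_materials(database, max_carbons=5, classes=None):
    """Return the species that satisfy exam-style constraints.

    max_carbons: keep only compounds with this many carbons or fewer.
    classes: if given, keep only compounds whose class is in this set.
    """
    return [
        s for s in database
        if s.carbons <= max_carbons
        and (classes is None or s.compound_class in classes)
    ]


DATABASE = [
    Species("ethanol", "CCO", 2, "alcohol"),
    Species("1-bromobutane", "CCCCBr", 4, "alkyl halide"),
    Species("1-hexanol", "CCCCCCO", 6, "alcohol"),
    Species("2-methylpropan-2-ol", "CC(C)(C)O", 4, "alcohol"),
]

if __name__ == "__main__":
    for s in allowed_starting_materials(DATABASE, max_carbons=5, classes={"alcohol"}):
        print(s.name, s.smiles)
```

Each new constraint is then one more condition in the filter rather than a change to the reaction code.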
So far I've done some research, and I think RDKit with Python is a good place to start. My plan is to use the SMILES format for reading molecules (since I can manipulate it as I would a string), define a function for each reaction, and build a database of chemical species from which the program can randomly select (for the initial and final species in the problem). The program selects a random species from the database, applies a number of reactions to it (3-5, specified by the user), and then displays the final product. The user solves the question, and the program then shows the path it took (displaying images of the intermediates and printing the reagents used to obtain them). Simple. In principle.
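The generate-then-reveal loop in this plan does not depend on how molecules are represented, so it can be written once. A minimal sketch, assuming each reaction is a hypothetical `Reaction` object with a `label` (the reagents shown to the student) and a `run` callable that returns the product or `None` when the reaction does not apply. The toy reactions at the bottom work on functional-group labels via `dict.get` and only stand in for real RDKit transformations:

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Reaction:
    label: str                           # reagents shown to the student
    run: Callable[[str], Optional[str]]  # product, or None if it does not apply


def generate_problem(start_pool, reactions, steps, rng=random):
    """Pick a start, then apply `steps` randomly chosen applicable reactions.

    Returns (start, path), where path is a list of (label, product) pairs,
    or None if no reaction applies before `steps` is reached.
    """
    start = rng.choice(start_pool)
    current, path = start, []
    for _ in range(steps):
        # Keep only reactions that actually produce something from `current`.
        options = [(r.label, r.run(current)) for r in reactions]
        options = [(label, p) for label, p in options if p is not None]
        if not options:
            return None
        label, current = rng.choice(options)
        path.append((label, current))
    return start, path


# Toy reactions on functional-group labels, standing in for real RDKit chemistry.
TOY_REACTIONS = [
    Reaction("PBr3", {"R-OH": "R-Br"}.get),
    Reaction("NaCN, DMSO", {"R-Br": "R-CN"}.get),
    Reaction("H3O+, heat", {"R-CN": "R-COOH"}.get),
    Reaction("LiAlH4; then H3O+", {"R-COOH": "R-CH2OH"}.get),
]

if __name__ == "__main__":
    start, path = generate_problem(["R-OH"], TOY_REACTIONS, steps=3)
    print("Start:", start, "Final:", path[-1][1])
    for label, product in path:
        print(f"  {label} -> {product}")
```

The returned path holds exactly what the answer screen needs: the reagent labels to print and the intermediates to draw.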
But once I started actually coding the functions, I ran into some problems. First, it is very tedious to write a function for every single reaction. Second, while SMILES can handle virtually every molecular complication thrown at it (stereochemistry, geometry, etc.), the same molecule can be written as several different SMILES strings, and I'm having trouble keeping the reactions specific. Third, I'm using the "replace" method to manipulate the SMILES strings, and this gets me into trouble with regiospecific reactions that I want to make universal.
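Two ideas address most of this. First, a reaction can be stored as data (reagents plus what it acts on and produces) and executed by one generic function, instead of a hand-written function per reaction. Second, different SMILES strings for the same molecule can be compared after converting each to RDKit's canonical SMILES with `Chem.MolToSmiles(Chem.MolFromSmiles(s))`. The sketch below shows only the first idea, using a toy functional-group model; the `Molecule` and `Rule` types and their fields are hypothetical stand-ins for real RDKit molecules and reactions:

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Toy stand-in for a real molecule: one reactive group on a carbon chain."""
    group: str    # e.g. "OH", "Br", "CN"
    carbons: int


@dataclass(frozen=True)
class Rule:
    """One reaction written as data rather than as its own function."""
    reagents: str          # text shown to the student, e.g. "PBr3"
    reactant_group: str    # group the rule acts on
    product_group: str     # group it turns into
    added_carbons: int = 0


RULES = [
    Rule("PBr3", "OH", "Br"),
    Rule("NaCN, DMSO", "Br", "CN", added_carbons=1),
    Rule("H3O+, heat", "CN", "COOH"),
]


def apply_rule(rule, mol):
    """Generic driver: the same code runs every rule in the table."""
    if mol.group != rule.reactant_group:
        return None
    return dataclasses.replace(
        mol, group=rule.product_group, carbons=mol.carbons + rule.added_carbons
    )


def one_step_products(mol, rules=RULES):
    """All (reagents, product) pairs reachable from `mol` in one step."""
    results = []
    for rule in rules:
        product = apply_rule(rule, mol)
        if product is not None:
            results.append((rule.reagents, product))
    return results
```

With this layout, adding a reaction means adding one line to `RULES`. In a real RDKit version, the table would roughly hold reaction SMARTS strings instead, and the generic driver would build each reaction with `AllChem.ReactionFromSmarts` and run it with `RunReactants`, which matches on atoms and bonds rather than on characters and so avoids the string-replace problem.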
For example, SN2 reactions proceed well with primary alkyl halides but not at all with tertiary ones (steric hindrance). How would I write a function for this reaction?
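One option is to keep a single SN2 rule and give it a precondition on the carbon bearing the leaving group, rather than writing separate functions per substrate. A minimal sketch, assuming the substrate is described by a hypothetical `Halide` record that already knows the substitution degree of that carbon. In RDKit the same restriction would usually be written into the reaction SMARTS itself, for example by requiring that carbon to carry two or three hydrogens (`[CX4;H2,H3]`):

```python
from dataclasses import dataclass

# Degree of the carbon bearing the leaving group -> SN2 outcome.
# Methyl and primary react readily; secondary is slow and competes with E2;
# tertiary is too hindered for backside attack.
SN2_OUTCOME = {
    "methyl": "fast",
    "primary": "fast",
    "secondary": "slow",
    "tertiary": "none",
}

LEAVING_GROUPS = {"Cl", "Br", "I"}


@dataclass(frozen=True)
class Halide:
    """Hypothetical substrate description: R group, leaving group, carbon degree."""
    r_group: str
    leaving_group: str
    degree: str  # "methyl", "primary", "secondary" or "tertiary"


def sn2(substrate, nucleophile, allow_slow=False):
    """Return the SN2 product as an 'R-Nu' label, or None if SN2 is not viable."""
    if substrate.leaving_group not in LEAVING_GROUPS:
        return None
    outcome = SN2_OUTCOME[substrate.degree]
    if outcome == "none" or (outcome == "slow" and not allow_slow):
        return None
    return f"{substrate.r_group}-{nucleophile}"
```

Steric selectivity then becomes a lookup on the substrate rather than a separate function, and the problem generator can simply skip the rule whenever `sn2` returns `None`.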
Another problem: I want each reaction to be tagged by its reagents, so I've taken to naming the functions after the reagents used. But this becomes problematic when a reagent can take many different forms (Grignard reagents, for example).
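One way to handle reagents with many forms is to tag each reaction with a reagent family and generate or parse the concrete label from parameters, instead of naming functions after specific reagents. A minimal sketch for Grignard reagents, assuming a hypothetical `Grignard` record and plain display strings (the regular expression only recognizes simple "RMgX" labels):

```python
import re
from dataclasses import dataclass

# Matches labels like "CH3MgBr" or "C6H5MgCl": any R group, then Mg, then a halide.
GRIGNARD_PATTERN = re.compile(r"^(?P<r>.+)Mg(?P<x>Cl|Br|I)$")


@dataclass(frozen=True)
class Grignard:
    """One concrete Grignard reagent, R-Mg-X."""
    r_group: str
    halide: str = "Br"

    @property
    def label(self):
        return f"{self.r_group}Mg{self.halide}"


def parse_reagent(text):
    """Map a concrete reagent string to (family, reagent), or None if unknown."""
    m = GRIGNARD_PATTERN.match(text)
    if m:
        return "Grignard", Grignard(m.group("r"), m.group("x"))
    return None


def grignard_steps(reagent, workup="H3O+"):
    """Reagent text for the answer key: addition in ether, then acidic workup."""
    return f"1) {reagent.label}, Et2O; 2) {workup}"
```

The reaction itself is tagged once by its family ("Grignard"), and the generator picks the R group and halide as parameters, so one rule covers every variant.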
I feel like there is a better, less repetitive and tedious way to tackle this. I'm looking for a nudge in the right direction.