I have created a function that reads lists of ID pairs (i.e. [("A","B"),("B","C"),("C","D"),...] and sequences the ID's from start to finish including any branches.
Each list of ordered ID's is held in a class called an Alignment and this function uses recursion to handle branches by creating a new alignment starting at the ID at which the branch splits from the main list.
I have found that with certain inputs it is possible to reach the maximum recursion limit set by Python. I know I could just increase this limit using sys.setrecursionlimit(), but as I do not know how many combinations of branches are possible, I'd like to avoid this tactic.
I have been reading several articles about converting recursive functions to iterative functions, but I have not been able to determine the best way to handle this particular function because the recursion takes place in the middle of the function and can be exponential.
Could any of you offer any suggestions?
Thanks, Brian
Code is posted below:
def buildAlignments(alignment, alignmentList, endIDs):
while alignment.start in endIDs:
#If endID only has one preceding ID: add preceding ID to alignment
if len(endIDs[alignment.start]) == 1:
alignment.add(endIDs[alignment.start][0])
else:
#List to hold all branches that end at spanEnd
branches = []
for each in endIDs[alignment.start]:
#New alignment for each branch
al = Alignment(each)
#Recursively process each new alignment
buildAlignments(al, branches, endIDs)
branches.append(al)
count = len(branches)
i = 0
index = 0
#Loop through branches by length
for branch in branches:
if i < count - 1:
#Create copy of original alignment and add branch to alignment
al = Alignment(alignment)
al += branch #branches[index]
alignmentList.append(al)
i += 1
#Add single branch to existing original alignment
else: alignment += branch #branches[index]
index += 1
def main():
IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]
#Gather all startIDs with corresponding endIDs and vice versa
startIDs = {}
endIDs = {}
for pair in IDs:
if not pair[0] in startIDs: startIDs[pair[0]] = []
startIDs[pair[0]].append(pair[1])
if not pair[1] in endIDs: endIDs[pair[1]] = []
endIDs[pair[1]].append(pair[0])
#Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
alignments = [Alignment(end) for end in endIDs if not end in startIDs]
#Build build sequences in each original Alignment
i = len(alignments)
while i:
buildAlignments(alignments[i-1], alignments, endIDs)
i -= 1
EDIT: I should point out that the provided IDs are just a small sample I used for testing this algorithm. In actuality, the sequences of IDs may be several thousand long with many branches and branches off of branches.
RESOLUTION: Thanks to Andrew Cooke. The new method seems to be much simpler and much easier on the call stack. I did make some minor adjustments to his code to better suit my purposes. I have included the completed solution below:
from collections import defaultdict
def expand(line, have_successors, known):
#print line
known.append(line)
for child in have_successors[line[-1]]:
newline = line + [child]
if line in known: known.remove(line)
yield expand(newline, have_successors, known)
def trampoline(generator):
stack = [generator]
while stack:
try:
generator = stack.pop()
child = next(generator)
stack.append(generator)
stack.append(child)
except StopIteration:
pass
def main(pairs):
have_successors = defaultdict(lambda: set())
links = set()
for (start, end) in pairs:
links.add(end)
have_successors[start].add(end)
known = []
for node in set(have_successors.keys()):
if node not in links:
trampoline(expand([node], have_successors, known))
for line in known:
print line
if __name__ == '__main__':
main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])
SUMMARY OF CHANGES:
swapped links and have_successors to create list from start to end
added if line in known: known.remove(line)
to expand in order to retain only the complete series
changed line variable from string to list in order to handle multiple characters in a single ID.
UPDATE: So I just discovered the reason I was having an issue with all this in the first place is do to circular references in the list of IDs I was provided. Now that the circular reference is fixed, either method works as expected. - Thanks again for all your help.
i = len(alignments)
...buildAlignments(alignments[i]
before it even starts recursing. – Kathiekathleenknown
is currently a list. that means that bothin
andremove
take O(n) time (so as the list gets longer they take longer too). it would be more efficient to makeknown
a set, because testing, adding and removing from a set are constant time, no matter how big. the reason i didn't use a set originally is because i started by making a list rather than a string, and you can't use lists with sets. when i changed to a string i should have gone back and fixed this, sorry. – Chrysa