I'm working with a fairly simple parser in PLY, and one of my rules takes on the following form:
    def p_things(p):
        '''
        things : thing things
               | thing
        '''
        p[0] = [p[1]]
        if len(p) == 3:
            p[0] += p[2]
Input files are generally simple lists of things, so the parsing itself is not complex. Some of my input files are very large, however (exceeding 100,000 lines fairly regularly, and over 1,000,000 in extreme cases). In profiling (via cProfile and pstats), the bulk of the runtime is taken up by repeated calls to p_things - presumably, one call for each item in a things list.
Is there a way to reduce this time, or a more efficient way to structure this rule? Most answers I've seen so far (and the canonical compilers material I've found) present this method as the generally accepted way to construct a list of parsed items, regardless of length.
p[0] += p[2] runs in time linear in the length of the list every time it adds a new thing, so parsing the list takes quadratic time overall, whereas in your new method, adding a new element with p[0].append(p[2]) runs in amortized constant time, so parsing the list takes linear time. – Lambent