[EDIT: This post is basically just a full explanation of a DP algorithm mentioned by Niklas B. in comments on other posts.]
Testing a particular word in linear time
In a word that can be formed from chemical element names, at least one of the following must be true:
- The last letter is a single-letter chemical element name AND the prefix consisting of all letters except for the last is itself a word that can be formed from chemical element names.
- The last two letters are a two-letter chemical element name AND the prefix consisting of all letters except for the last two is itself a word that can be formed from chemical element names.
This suggests a nice DP algorithm, where for a given word w of length k we compute, for all 1 <= i <= k, the largest number of letters (or chemical symbols; the question is ambiguous regarding exactly what we're trying to maximise!) that the length-i prefix of w can be built from. Let's call this f(i), and say that f(i) = -1 if the length-i prefix of w cannot be built from chemical element names at all.
Let X = 1 for maximising element names, or 2 for maximising letter counts.
The recursion for the DP can be written as:
- one(i) = if (w[i] is a 1-letter element name) { f(i-1) + 1 } else { -1 }
- two(i) = if ("w[i-1]w[i]" is a 2-letter element name) { f(i-2) + X } else { -1 }
- f(i) = -1 if i < 0
- f(0) = 0
- f(i) = max(one(i), two(i)) if i > 0
Then f(k) will be what we want to know: the maximum number of (letters/elements) that w can be formed from, or -1 if this is not possible.
The max() means that if only one of the ways of handling the end of the prefix works, that way will be chosen (because the other way will have a score of -1, which will always compare less), and if neither way works, we will correctly get f(i) = -1.
Assuming constant-time testing of whether single characters or character pairs are valid chemical element names (using e.g. array or hashtable lookups), we can compute whether a given word can be represented as a sequence of chemical element names in time and space proportional to its length.
Considering all words
If we are maximising the number of letters, then it probably makes the most sense to process the dictionary in decreasing length order, since the first time we encounter a word for which f(k) is not -1, we know it must be the longest and we can stop.
This can be adapted for maximising the element name count. Although we can't stop immediately, because it might be that there is a shorter word further on that nevertheless can be formed from more element names (specifically, by using more single-character ones), we still get some useful information whenever we find a new word with a greater element count than the previously best: we can update a cutoff threshold with this element count, and stop as soon as we see a word that has fewer letters than this threshold, since a length-m word can never be formed from more than m element names.
There is different approach, which might turn out to be faster: by first sorting the dictionary in alphabetical order, we can avoid recomputing f(i) on the prefix that is shared with the previous word. E.g. if carton
immediately follows cart
in the dictionary, then we only need to compute f(i) on the on
part.