Is it possible to create an algorithm which generates an autogram?

Asked 1/6, 2014 at 13:35 Answered 26/6, 2014 at 19:21

algorithm language-agnostic computability

An autogram is a sentence which describes the characters it contains, usually enumerating each letter of the alphabet, but possibly also the punctuation it contains. Here is the example given in the wiki page.

This sentence employs two a’s, two c’s, two d’s, twenty-eight e’s, five f’s, three g’s, eight h’s, eleven i’s, three l’s, two m’s, thirteen n’s, nine o’s, two p’s, five r’s, twenty-five s’s, twenty-three t’s, six v’s, ten w’s, two x’s, five y’s, and one z.

Coming up with one is hard, because you don't know how many letters it contains until you finish the sentence. Which is what prompts me to ask: is it possible to write an algorithm which could create an autogram? For example, a given parameter would be the start of the sentence as an input e.g. "This sentence employs", and assuming that it uses the same format as the above "x a's, ... y z's".

I'm not asking for you to actually write an algorithm, although by all means I'd love to see if you know one to exist or want to try and write one; rather I'm curious as to whether the problem is computable in the first place.

Strophanthin answered 1/6, 2014 at 13:35 Comment(8)

I don't think it does. I'm not asking people to write the algorithm, as I specified in the OP, I'm asking whether it's possible. This isn't a challenge, but a question of general computability. – Strophanthin 1/6, 2014 at 13:40

Of interest. – Misprision 1/6, 2014 at 14:7

@NiklasB. There's an upper bound on length. – Misprision 1/6, 2014 at 14:31

@n.m. I realized after reading your reference. I guess one can prove that using the fact that numbers grow exponentially in the size of their written representation. So then of course the problem is decidable via exhaustive search (and in reality probably using exhaustive search with lots of pruning). Maybe there is a reasonably simple ILP formulation that can be fed into modern solvers directly – Sparoid 1/6, 2014 at 14:33

There is an upper limit, but finding one and proving that it's an upper limit is tedious. I tried writing an Answer that involved inventing a language strictly less efficient than English at expressing numbers, and couldn't maintain my interest to the end of the proof; I won't inflict it on the world. – Agripinaagrippa 1/6, 2014 at 14:56

@Unihedron I very strongly doubt it will fit on Code Golf, and even if it does, it's probably better suited here. – Loupe 1/6, 2014 at 15:13

@Dukeling particularly if we're not resolute that a solution even exists yet. – Strophanthin 1/6, 2014 at 16:14

@LeoKing BTW, the easiest part of the problem, which is checking if a string is a autogram or not, is quite simple to do :) – Chansoo 1/6, 2014 at 21:44

You are asking two different questions.

"is it possible to write an algorithm which could create an autogram?"

There are algorithms to find autograms. As far as I know, they use randomization, which means that such an algorithm might find a solution for a given start text, but if it doesn't find one, then this doesn't mean that there isn't one. This takes us to the second question.

"I'm curious as to whether the problem is computable in the first place."

Computable would mean that there is an algorithm which for a given start text either outputs a solution, or states that there isn't one. The above-mentioned algorithms can't do that, and an exhaustive search is not workable. Therefore I'd say that this problem is not computable. However, this is rather of academic interest. In practice, the randomized algorithms work well enough.

Jejunum answered 26/6, 2014 at 19:21 Comment(0)

Let's assume for the moment that all counts are less than or equal to some maximum M, with M < 100. As mentioned in the OP's link, this means that we only need to decide counts for the 16 letters that appear in these number words, as counts for the other 10 letters are already determined by the specified prefix text and can't change.

One property that I think is worth exploiting is the fact that, if we take some (possibly incorrect) solution and rearrange the number-words in it, then the total letter counts don't change. IOW, if we ignore the letters spent "naming themselves" (e.g. the c in two c's) then the total letter counts only depend on the multiset of number-words that are actually present in the sentence. What that means is that instead of having to consider all possible ways of assigning one of M number-words to each of the 16 letters, we can enumerate just the (much smaller) set of all multisets of number-words of size 16 or less, having elements taken from the ground set of number-words of size M, and for each multiset, look to see whether we can fit the 16 letters to its elements in a way that uses each multiset element exactly once.

Note that a multiset of numbers can be uniquely represented as a nondecreasing list of numbers, and this makes them easy to enumerate.

What does it mean for a letter to "fit" a multiset? Suppose we have a multiset W of number-words; this determines total letter counts for each of the 16 letters (for each letter, just sum the counts of that letter across all the number-words in W; also add a count of 1 for the letter "S" for each number-word besides "one", to account for the pluralisation). Call these letter counts f["A"] for the frequency of "A", etc. Pretend we have a function etoi() that operates like C's atoi(), but returns the numeric value of a number-word. (This is just conceptual; of course in practice we would always generate the number-word from the integer value (which we would keep around), and never the other way around.) Then a letter x fits a particular number-word w in W if and only if f[x] + 1 = etoi(w), since writing the letter x itself into the sentence will increase its frequency by 1, thereby making the two sides of the equation equal.

This does not yet address the fact that if more than one letter fits a number-word, only one of them can be assigned it. But it turns out that it is easy to determine whether a given multiset W of number-words, represented as a nondecreasing list of integers, simultaneously fits any set of letters:

Calculate the total letter frequencies f[] that W implies.
Sort these frequencies.
Skip past any zero-frequency letters. Suppose there were k of these.
For each remaining letter, check whether its frequency is equal to one less than the numeric value of the number-word in the corresponding position. I.e. check that f[k] + 1 == etoi(W[0]), f[k+1] + 1 == etoi(W[1]), etc.
If and only if all these frequencies agree, we have a winner!

The above approach is naive in that it assumes that we choose words to put in the multiset from a size M ground set. For M > 20 there is a lot of structure in this set that can be exploited, at the cost of slightly complicating the algorithm. In particular, instead of enumerating straight multisets of this ground set of all allowed numbers, it would be much better to enumerate multisets of {"one", "two", ..., "nineteen", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}, and then allow the "fit detection" step to combine the number-words for multiples of 10 with the single-digit number-words.

Through answered 2/6, 2014 at 4:55 Comment(4)

I found your answer very complicated, but very interesting to read; thank you for exploring the problem in such elaborate detail! Though I can't find in your answer a reference to the question, i.e. whether the problem is computable, in general. Do you think the approach you've described can apply to any prefix text? – Strophanthin 2/6, 2014 at 6:7

@LeoKing: My answer is just a more efficient kind of brute-force search that, like the obvious kind, has to stop at some prescribed limits (here, a maximum of M for each letter), so it might miss a solution if M was chosen too low. – Through 2/6, 2014 at 16:44

@LeoKing: One way to establish computability is to design a function u() that tells us, for any given value of n, an upper bound on the total number of letters in any multiset of up to 26 number-words whose numeric values sum to n; and which has the property that for some k, u(i+k) <= i+k. If we can then find a band of k consecutive values i such that u(i) < i for all of the values in the band then we know that no sentence with i or more letters is representable, so we can set M = i. – Through 2/6, 2014 at 16:46

@LeoKing: I'm certain that such a band must exist, because I'm certain that the rate at which the total letter count of a number-word increases as the number it describes increases is less than 1, although there are many "jumps" that need to be handled as special cases (e.g. "twenty" -> "twenty-one": the number being described only increases by 1, but the letter count increases by 3 > 1) so you need a suitably averaged approach. – Through 2/6, 2014 at 16:49

Recommended topics

Hot tags