A theoretical idea for a captcha filter: ask the user a question that the user can answer and that the server can also answer trivially. The shared answer then acts as a kind of shared key known to both the user and the server.
A Stack Overflow-related example:
How many reputation points does user XYZ have?
Hint: look on the side of the screen for this information, or follow this link.
The user could be picked at random from known Stack Overflow users.
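A minimal sketch of how the server side of this question might work, assuming a hypothetical in-memory table `KNOWN_USERS` standing in for the site's real user data. The expected reputation is captured when the challenge is issued (a design assumption), so a reputation change while the user is answering does not invalidate a correct answer.

```python
import random
from dataclasses import dataclass

# Hypothetical snapshot of public user data; a real site would query its own user table.
KNOWN_USERS = {
    "alice": 15230,
    "bob": 842,
    "carol": 101,
}

@dataclass
class Challenge:
    username: str
    prompt: str
    expected: int  # reputation captured at the moment the challenge was issued

def make_challenge(rng: random.Random) -> Challenge:
    """Pick a random known user and ask for their reputation."""
    username = rng.choice(sorted(KNOWN_USERS))
    prompt = f"How many reputation points does user {username} have?"
    return Challenge(username, prompt, KNOWN_USERS[username])

def check_answer(challenge: Challenge, answer: str) -> bool:
    """Accept the answer only if it parses as the captured reputation value."""
    try:
        value = int(answer.replace(",", "").strip())
    except ValueError:
        return False
    return value == challenge.expected
```

Usage: `c = make_challenge(random.Random())`, show `c.prompt` to the user, then call `check_answer(c, user_input)`. Thousands separators such as "15,230" are tolerated.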
A more generic example:
Where do you live?
What were the weather conditions at 9:00 on Saturday where you live?
Hint: Use Yahoo Weather and provide the humidity and general conditions.
Then the user enters their answer:
Seattle
Partly cloudy, 85% humidity
The computer confirms that it was indeed those weather conditions in Seattle at that time.
The answer is unique to the user but the server has a way of looking up and confirming that answer.
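The verification step for the weather question could be sketched like this. `WEATHER_LOG` is a hypothetical hard-coded table standing in for whatever weather data source the server would actually query, and the humidity tolerance of 5 percentage points is an assumption, since different sources may report slightly different readings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Observation:
    conditions: str  # e.g. "partly cloudy"
    humidity: int    # relative humidity in percent

# Hypothetical lookup table keyed by (city, time); a real server would query
# a weather data source instead of a hard-coded dict.
WEATHER_LOG = {
    ("seattle", "Saturday 09:00"): Observation("partly cloudy", 85),
}

def parse_answer(text: str) -> Optional[Tuple[str, int]]:
    """Parse answers shaped like 'Partly cloudy, 85% humidity'."""
    parts = [p.strip().lower() for p in text.split(",")]
    if len(parts) != 2:
        return None
    digits = "".join(ch for ch in parts[1] if ch.isdigit())
    if not digits:
        return None
    return parts[0], int(digits)

def verify(city: str, when: str, answer: str, tolerance: int = 5) -> bool:
    """Check the user's answer against the server's record for that place and time."""
    obs = WEATHER_LOG.get((city.strip().lower(), when))
    parsed = parse_answer(answer)
    if obs is None or parsed is None:
        return False
    conditions, humidity = parsed
    return conditions == obs.conditions and abs(humidity - obs.humidity) <= tolerance
```

Usage: `verify("Seattle", "Saturday 09:00", "Partly cloudy, 85% humidity")` accepts the example answer from above.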
The types of questions could vary, but the idea is that the question combines facts that a human would have to look up and that the server could trivially look up. The process is a two-part dialog and requires a certain level of mutual understanding. It is a kind of reverse Turing test: the human proves they can provide a computable piece of data, but it takes human knowledge to produce that data.
Another possible implementation: What is your name, and when were you born?
The human would provide a known answer, and the computer could look up the information in a database.
Perhaps such a database could be populated by a bot, but the bot would need some intelligence to put the relevant facts together. The database or lookup table on the server side could also be systematically pruned of obviously spam-like entries.
I am sure there are flaws and details to be worked out in the implementation, but the concept seems sound. The user provides a combination of facts that the server can look up, while the server controls which kinds of combinations are asked. The combinations could be randomized, and the server could use a variety of strategies to look up the shared answer. The real benefit is that the user has to reveal a bit of a profile of themselves in the answer, which makes it all the more difficult for bots to be systematic. Suppose a bunch of computers start using the same answers across many servers and captcha forms, such as
I am Robot born 1972 at 3:45 pm.
Then that kind of response can be profiled and shared across a whole network to block the bots, effectively making the automation worthless after a few iterations.
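The profiling idea could start as simply as counting how often each normalized answer is submitted. This is a single-process sketch: the class name `AnswerProfiler` and the `max_repeats` threshold are hypothetical, and sharing the counts "across a whole network" would need a shared store that is not shown here.

```python
from collections import Counter

class AnswerProfiler:
    """Flags answers that are submitted more often than a threshold allows."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    @staticmethod
    def normalize(answer: str) -> str:
        # Collapse case and whitespace so trivial variations count as the same answer.
        return " ".join(answer.lower().split())

    def record(self, answer: str) -> bool:
        """Record a submission; return True if the answer is still allowed."""
        key = self.normalize(answer)
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats

    def is_blocked(self, answer: str) -> bool:
        """True once an answer has been seen more than max_repeats times."""
        return self.seen[self.normalize(answer)] > self.max_repeats
```

The threshold is a tuning choice: set too low, it would also block genuine users who happen to give the same answer, so in practice it would apply to full answer combinations rather than single facts like a city name.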
As I think about this more, it would be interesting to implement a basic reading comprehension test for commenting on blog posts. At the end of a blog post, the writer could pose a question to his or her readers. The question could be unique to each blog post, and it would have the added benefit of requiring users to actually read before commenting. One could write a simple question at the end of a post, with the answers stored server-side, and then add an array of nonsense questions to salt the database.
Did this post talk about purple captcha technology?
Server-side answer (false, no)
Was this a post about captchas?
Server-side answer (true, yes)
Was this a post about Michael Jackson?
Server-side answer (false, no)
It seems useful to present several questions in random order and make the order significant; e.g., the expected answers to the questions above would be no, yes, no. Shuffle the order and mix in nonsense questions with both yes and no answers.
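A minimal sketch of the shuffled quiz, assuming a hypothetical per-post question bank `POST_QUESTIONS` built from the examples above. The expected answer sequence is derived from the shuffled order, so a reply only passes if every answer is right and in the right position.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical per-post question bank: (question, expected yes/no answer).
POST_QUESTIONS = [
    ("Did this post talk about purple captcha technology?", False),
    ("Was this a post about captchas?", True),
    ("Was this a post about Michael Jackson?", False),
]

def build_quiz(questions: List[Tuple[str, bool]], rng: random.Random):
    """Shuffle the questions; the expected answers follow the shuffled order."""
    order = list(questions)
    rng.shuffle(order)
    prompts = [q for q, _ in order]
    expected = [a for _, a in order]
    return prompts, expected

def parse_yes_no(text: str) -> Optional[bool]:
    """Map common yes/no spellings to booleans; anything else is None."""
    t = text.strip().lower()
    if t in ("yes", "y", "true"):
        return True
    if t in ("no", "n", "false"):
        return False
    return None

def grade(expected: List[bool], replies: List[str]) -> bool:
    """Pass only if every reply matches the expected answer at the same position."""
    if len(replies) != len(expected):
        return False
    return all(parse_yes_no(r) == e for r, e in zip(replies, expected))
```

With n yes/no questions, a bot guessing at random passes with probability 2^-n (1 in 8 for the three questions above), so adding a few more nonsense questions raises the bar quickly.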