Reconstructing now-famous 17-year-old's Markov-chain-based information-retrieval algorithm "Apodora"

While we were all twiddling our thumbs, a 17-year-old Canadian boy has apparently devised an information retrieval algorithm that:

a) performs with twice the precision of the current, widely-used vector space model;

b) is 'fairly accurate' at identifying similar words;

c) makes microsearch more accurate.

Here is a good interview.

Unfortunately, there's no published paper that I can find yet, but from what I remember of the graphical models and machine learning classes I took a few years ago, I think we should be able to reconstruct it from his submission abstract and what he says about it in interviews.

From the interview:

Some searches find words that appear in similar contexts. That’s pretty good, but that’s following the relationships to the first degree. My algorithm tries to follow connections further. Connections that are close are deemed more valuable. In theory, it follows connections to an infinite degree.
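
(That "closer connections are more valuable ... to an infinite degree" phrasing maps naturally onto a geometrically discounted sum of powers of a co-occurrence transition matrix, which has a closed form. A minimal sketch of that reading in Python follows; the toy vocabulary and counts are invented, and this is only a guess at the idea, not his actual algorithm.)

```python
import numpy as np

# Hypothetical toy vocabulary and raw co-occurrence counts (invented).
vocab = ["cat", "dog", "pet", "kernel", "linux"]
C = np.array([
    [0, 4, 3, 0, 0],
    [4, 0, 3, 0, 0],
    [3, 3, 0, 0, 1],
    [0, 0, 0, 0, 5],
    [0, 0, 1, 5, 0],
], dtype=float)

# Row-normalize into a Markov-chain-like transition matrix P:
# P[i, j] = probability of "stepping" from word i to word j.
P = C / C.sum(axis=1, keepdims=True)

# Follow connections to every degree at once, discounting a k-step
# connection by alpha**k so that closer connections count more:
#   S = sum_{k>=1} alpha^k P^k = alpha P (I - alpha P)^{-1}
alpha = 0.5
n = P.shape[0]
S = alpha * P @ np.linalg.inv(np.eye(n) - alpha * P)

# Words most related to "cat" under this infinite-degree similarity.
i = vocab.index("cat")
for j in np.argsort(-S[i]):
    print(f"{vocab[i]} -> {vocab[j]}: {S[i, j]:.3f}")
```

The closed form exists because P is row-stochastic, so the spectral radius of alpha\*P stays below 1 and the infinite series converges.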

And the abstract puts it in context:

A novel information retrieval algorithm called "Apodora" is introduced, using limiting powers of Markov chain-like matrices to determine models for the documents and making contextual statistical inferences about the semantics of words. The system is implemented and compared to the vector space model. Especially when the query is short, the novel algorithm gives results with approximately twice the precision and has interesting applications to microsearch.
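
(For contrast, the baseline the abstract measures against is the classic vector space model: TF-IDF vectors ranked by cosine similarity. Here's a minimal self-contained version with an invented toy corpus; note how a one-word query like "cat" can only match documents containing that exact token, which is presumably the weakness on short queries that Apodora targets.)

```python
import numpy as np

# The baseline from the abstract: the vector space model
# (TF-IDF + cosine similarity), on an invented toy corpus.
docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "the linux kernel is open source",
]
query = "cat"  # short queries are where the abstract claims the biggest gain

vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Term-frequency matrix: tf[d, t] = count of term t in document d.
tf = np.zeros((len(docs), len(vocab)))
for d, doc in enumerate(docs):
    for w in doc.split():
        tf[d, idx[w]] += 1

# Smoothed inverse document frequency.
df = (tf > 0).sum(axis=0)
idf = np.log((1 + len(docs)) / (1 + df)) + 1
tfidf = tf * idf

# Query vector in the same space; out-of-vocabulary words contribute nothing.
q = np.zeros(len(vocab))
for w in query.split():
    if w in idx:
        q[idx[w]] += idf[idx[w]]

# Rank documents by cosine similarity.
norms = np.linalg.norm(tfidf, axis=1) * np.linalg.norm(q)
scores = tfidf @ q / np.where(norms == 0, 1.0, norms)
for d in np.argsort(-scores):
    print(f"{scores[d]:.3f}  {docs[d]}")
```

With this query, the "dogs and cats" document scores zero: "cat" and "cats" share no token, so plain VSM misses it entirely. Any higher-order co-occurrence smoothing would presumably recover exactly such matches.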

I feel like someone who knows about Markov-chain-like matrices or information retrieval would immediately recognize what he's doing.

So: what is he doing?

Ova answered 6/8, 2011 at 15:24 Comment(6)
hmmm, I wonder how similar this is to singular value decomposition vs. HMMs or something? Maybe it's a series of consecutive co-occurrence matrices? - Posturize
Maybe on a high level, but I doubt you can know exactly what he's doing. He's probably looking at co-occurrences of higher order than first degree and modeling this with some kind of Markov process. But there may be many ways to do this. - Skydive
Impossible to answer, really. The comment "In theory, it follows connections to an infinite degree." suggests to me an eigendecomposition of a term co-occurrence matrix (i.e. something like Latent Semantic Analysis). - Deaconess
Many algorithms can have the same abstract. Without a paper, we have only hype. - Egyptology
Beyond the algorithmic stuff, I have trouble believing such improved performance without better data/improved features, specifically some sort of semantic model. Machine learning is only as good as the data you feed in; e.g. Google Translate drawing on essentially the entire corpus of multilingual transcripts ever :( - Posturize
Exactly, I don't quite believe it either... that's why I want to take a stab at reconstructing it, so I can test it for myself... and (if it works) reuse it in my other research. - Ova

From the use of words like 'context' and the fact that he's introduced a second-order level of statistical dependency, I suspect he is doing something related to the LDA-HMM method outlined in: Griffiths, T., Steyvers, M., Blei, D., & Tenenbaum, J. (2005). Integrating topics and syntax. Advances in Neural Information Processing Systems. There are some inherent limits to the resolution of the search due to model averaging. However, I'm envious of anyone doing stuff like this at 17, and I hope to heck he's done something independent and at least incrementally better. Even a different direction on the same topic would be pretty cool.
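
One concrete (and deliberately crude) reading of the abstract's "limiting powers of Markov chain-like matrices to determine models for the documents": fit a Markov chain to each document's word sequence, take the chain's limiting power, i.e. its stationary distribution, as the document model, and score queries against that. A hedged sketch, far simpler than the full LDA-HMM, with the toy corpus and every name invented:

```python
import numpy as np

docs = [
    "the cat chased the dog and the dog chased the cat",
    "the kernel schedules processes and the kernel manages memory",
]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

def document_model(doc, smoothing=0.1):
    """Stationary distribution of an add-k smoothed bigram chain."""
    counts = np.full((V, V), smoothing)
    words = doc.split()
    for a, b in zip(words, words[1:]):
        counts[idx[a], idx[b]] += 1
    P = counts / counts.sum(axis=1, keepdims=True)
    # Limiting power: repeatedly square P; smoothing makes the chain
    # irreducible, so P^(2^k) converges to a rank-one matrix whose
    # every row is the stationary distribution.
    for _ in range(50):
        P = P @ P
    return P[0]

models = [document_model(d) for d in docs]

def score(query, pi):
    """Log-probability of the query's in-vocabulary words under the model."""
    return sum(np.log(pi[idx[w]]) for w in query.split() if w in idx)

query = "cat dog"
for d, pi in enumerate(models):
    print(f"doc {d}: log-score {score(query, pi):.2f}")
```

Under this toy model the cat/dog document wins easily, since the other document's chain gives those words only smoothing mass.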

Mescaline answered 8/8, 2011 at 21:6 Comment(1)
I think you're absolutely right - it really does sound like he's modeling each document as a sequence of words generated by some Markov process. Thanks for this spot-on reference! - Ova
