n-gram Questions
1
Solved
A Markov chain is composed of a set of states which can transition to other states with a certain probability.
A Markov chain can be easily represented in Neo4j by creating a node for each state, ...
Fidelia asked 17/5, 2013 at 4:7
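To make the definition in this excerpt concrete, here is a minimal Python sketch of a Markov chain held as a plain dictionary. The state names and probabilities are invented for illustration and do not come from the question. In graph terms, each outer key would be a node and each inner entry a weighted edge.

```python
import random

# Hypothetical states and transition probabilities (illustration only).
# Each row must sum to 1.0.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(state, rng):
    """Pick the next state using the outgoing probabilities of `state`."""
    targets = list(transitions[state])
    weights = [transitions[state][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]

def walk(start, steps, seed=None):
    """Return the list of states visited, starting at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path
```

Calling `walk("sunny", 5, seed=0)` returns a list of six states; passing a seed only makes the walk repeatable.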
1
Solved
I have a list of documents that have already been tokenized:
dat <- list(c("texaco", "canada", "lowered", "contract", "price", "pay",
"crude", "oil", "canadian", "cts", "barrel", "effective", ...
Coatee asked 10/5, 2013 at 19:44
2
Solved
I have code that uses a cyclic polynomial rolling hash (Buzhash) to compute hash values of n-grams of source code. If I use small hash values (7-8 bits), there are some collisions, i.e. differ...
Impeller asked 3/5, 2013 at 18:38
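For readers unfamiliar with the technique named in this excerpt, the following is a rough Python sketch of a cyclic polynomial rolling hash over byte n-grams. It is not the asker's code: the 32-bit width, the seeded random table, and the function names are all assumptions made for illustration.

```python
import random

BITS = 32                      # assumed hash width, not from the question
MASK = (1 << BITS) - 1

# Assumed substitution table: one pseudo-random BITS-wide value per byte.
_rng = random.Random(12345)
TABLE = [_rng.getrandbits(BITS) for _ in range(256)]

def rotl(x, r):
    """Rotate a BITS-wide integer left by r positions."""
    r %= BITS
    return ((x << r) | (x >> (BITS - r))) & MASK

def direct_hash(window):
    """Hash one window from scratch: rotate, then XOR in each byte's value."""
    h = 0
    for b in window:
        h = rotl(h, 1) ^ TABLE[b]
    return h

def rolling_hashes(data, n):
    """Hash every n-byte window of `data` (n >= 1) with O(1) work per step."""
    if n < 1 or len(data) < n:
        return []
    h = direct_hash(data[:n])
    out = [h]
    for i in range(n, len(data)):
        # Rotate the old hash, cancel the outgoing byte (which has now been
        # rotated n times), and XOR in the incoming byte.
        h = rotl(h, 1) ^ rotl(TABLE[data[i - n]], n) ^ TABLE[data[i]]
        out.append(h)
    return out
```

With a width of only 7 or 8 bits there are at most 128 or 256 distinct hash values, so collisions between different n-grams are unavoidable once the input contains more distinct n-grams than that.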
2
Solved
When I use an analyzer with edgengram (min=3, max=7, front) + term_vector=with_positions_offsets
With a document having text = "CouchDB"
When I search for "couc"
My highlight is on "cou" and not "...
Newburg asked 3/7, 2012 at 2:19
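As a reading aid for the settings quoted in this excerpt, here is a small Python sketch of the prefixes a front edge n-gram with min=3 and max=7 would produce. It is plain Python, not the search engine's analyzer, and it says nothing about how highlighting picks offsets. Lowercasing the input first is an assumption about the analysis chain.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the front prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]

# Assumes the text was lowercased before the edge n-gram step.
prefixes = edge_ngrams("couchdb", 3, 7)
# prefixes == ["cou", "couc", "couch", "couchd", "couchdb"]
```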
2
Solved
I've seen tons of documentation all over the web about how the Python NLTK makes it easy to compute bigrams of words.
What about letters?
What I want to do is plug in a dictionary and have it tel...
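One possible way to count letter bigrams, sketched in plain Python with only the standard library. The function names and the sample word list are made up for illustration; the asker's dictionary and desired output are not visible in the excerpt.

```python
from collections import Counter

def letter_bigrams(word):
    """Return the adjacent letter pairs of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]

def bigram_counts(words):
    """Count letter bigrams across an iterable of words."""
    counts = Counter()
    for word in words:
        counts.update(letter_bigrams(word))
    return counts

# Hypothetical two-word "dictionary".
counts = bigram_counts(["tree", "free"])
```

`counts.most_common()` then lists the bigrams from most to least frequent.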
1
Solved
I'm new to Python and need help!
I was practicing text classification with Python NLTK.
Here is the code example I am practicing on:
http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-...
1
I'm working on auto completion search with Solr using EdgeNGrams. If the user is searching for names of employees, then auto completion should be applied. That is, I want the results to be like a G...
Clementclementas asked 1/8, 2012 at 4:38
2
Solved
I have been playing around with ElasticSearch for a new project of mine. I have set the default analyzers to use the ngram tokenfilter. This is my elasticsearch.yml file:
index:
analysis:
analyze...
Slump asked 18/2, 2011 at 17:43
3
I'm working on deduping a database of people. For a first pass, I'm following a basic 2-step process to avoid an O(n^2) operation over the whole database, as described in the literature. First, I "...
Novak asked 5/4, 2012 at 19:34
3
Solved
I want to use ElasticSearch to search filenames (not the file's content). Therefore I need to find a part of the filename (exact match, no fuzzy search).
Example:
I have files with the following n...
Mccoy asked 23/2, 2012 at 21:10
2
Solved
I want to create an ARPA language model file with nearly 50,000 words. I can't generate the language model by passing my text file to the CMU Language Tool. Is any other link available where I can ...
Foucault asked 21/4, 2011 at 11:24
2
Solved
I'm trying to write an algorithm (which I'm assuming will rely on natural language processing techniques) to 'fill out' a list of search terms. There is probably a name for this kind of thing which...
1
Solved
I have been working on a project about sentence similarity. I know this has been asked many times on SO, but I just want to know if my problem can be accomplished by the method I use by the way that ...
Pouncey asked 27/10, 2010 at 19:59
5
Solved
Let's say I have a sentence of text:
$body = 'the quick brown fox jumps over the lazy dog';
and I want to get that sentence into a hash of 'keywords', but I want to allow multi-word keywords; I ...
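The excerpt is written in PHP; the sketch below uses Python to show one way multi-word keywords could be collected, by counting every run of 1 to `max_n` consecutive words. The function name and the `max_n` parameter are assumptions, and the asker's actual requirements are cut off in the excerpt.

```python
def word_ngrams(text, max_n=3):
    """Map each word n-gram (1..max_n words) in `text` to its count."""
    words = text.split()
    grams = {}
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            key = " ".join(words[i:i + n])
            grams[key] = grams.get(key, 0) + 1
    return grams

body = "the quick brown fox jumps over the lazy dog"
keywords = word_ngrams(body, max_n=2)
```

With `max_n=2`, `keywords` holds the single words plus pairs such as "quick brown" and "lazy dog"; "the" is counted twice because it appears twice in the sentence.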
4
Solved
My problem is conceptually similar to solving anagrams, except I can't just use a dictionary lookup. I am trying to find plausible words rather than real words.
I have created an N-gram model (for...
Marrakech asked 16/4, 2010 at 6:12
2
Solved
I want to implement some applications with n-grams (preferably in PHP).
Which type of n-gram is better suited to most purposes: word-level or character-level? How could you impleme...