n-gram Questions

1

Solved

A Markov chain is composed of a set of states, each of which can transition to other states with a certain probability. A Markov chain can easily be represented in Neo4j by creating a node for each state, ...
Fidelia asked 17/5, 2013 at 4:7
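
To make the Markov chain description in the question above concrete, here is a minimal sketch in plain Python rather than Neo4j. Each dictionary key plays the role of a state node and each inner entry the role of a weighted transition. The state names, the probabilities, and the helper names `validate` and `walk` are assumptions made up for illustration; nothing here comes from the original question.

```python
import random

# Hypothetical states and probabilities, chosen only for illustration.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def validate(transitions, tol=1e-9):
    """Raise ValueError if any state's outgoing probabilities do not sum to 1."""
    for state, edges in transitions.items():
        total = sum(edges.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"probabilities for {state!r} sum to {total}")


def walk(transitions, start, steps, rng=None):
    """Return a path of steps + 1 states sampled from the chain."""
    rng = rng or random.Random()
    path = [start]
    for _ in range(steps):
        edges = transitions[path[-1]]
        # Pick the next state in proportion to its transition probability.
        nxt = rng.choices(list(edges), weights=list(edges.values()))[0]
        path.append(nxt)
    return path
```

In a graph database the same structure would presumably map to one node per key and one relationship per inner entry, with the probability stored as a relationship property.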

1

Solved

I have a list of documents that have already been tokenized: dat <- list(c("texaco", "canada", "lowered", "contract", "price", "pay", "crude", "oil", "canadian", "cts", "barrel", "effective", ...
Coatee asked 10/5, 2013 at 19:44

2

Solved

I have code that uses a cyclic polynomial rolling hash (Buzhash) to compute hash values of n-grams of source code. If I use small hash values (7-8 bits), then there are some collisions, i.e. differ...
Impeller asked 3/5, 2013 at 18:38
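
For readers unfamiliar with the technique named in the question above, the following is a rough sketch of a cyclic polynomial rolling hash over byte n-grams. It is not the asker's code. The function names, the seeded random lookup table, and the `bits` parameter are all assumptions for illustration.

```python
import random


def rotl(x, r, bits):
    """Rotate the bits-wide value x left by r positions."""
    r %= bits
    mask = (1 << bits) - 1
    return ((x << r) | (x >> (bits - r))) & mask


def make_table(bits, seed=0):
    """Map each byte value to a pseudo-random bits-wide value (assumed scheme)."""
    rng = random.Random(seed)
    return [rng.getrandbits(bits) for _ in range(256)]


def direct_hash(window, table, bits):
    """Hash one n-gram from scratch: rotate the state, then XOR in each byte."""
    h = 0
    for byte in window:
        h = rotl(h, 1, bits) ^ table[byte]
    return h


def rolling_hashes(data, n, table, bits):
    """Return the hash of every n-byte window of data, updated in O(1) per step."""
    if len(data) < n:
        return []
    h = direct_hash(data[:n], table, bits)
    out = [h]
    for i in range(n, len(data)):
        # Rotate the state, cancel the byte leaving the window (its table
        # value has been rotated n times by now), then XOR in the new byte.
        h = rotl(h, 1, bits) ^ rotl(table[data[i - n]], n, bits) ^ table[data[i]]
        out.append(h)
    return out
```

With `bits=8` there are only 256 possible hash values, so once the input contains more than 256 distinct n-grams some of them must share a value. That is consistent with the collisions the question reports for 7-8 bit hashes.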

2

Solved

When I use an analyzer with edgengram (min=3, max=7, front) + term_vector=with_positions_offsets, with a document having text = "CouchDB", and I search for "couc", my highlight is on "cou" and not "...
Newburg asked 3/7, 2012 at 2:19
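
As a reminder of what front edge n-grams with min=3 and max=7 look like for the text in the question above, here is a small plain-Python sketch. It only mimics the general idea of prefix n-grams; it is not the search engine's tokenizer, it does no lowercasing, and the function name `edge_ngrams` is made up for illustration.

```python
def edge_ngrams(text, min_gram=3, max_gram=7):
    """Return the prefixes of text whose lengths lie in [min_gram, max_gram]."""
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


print(edge_ngrams("CouchDB"))
# ['Cou', 'Couc', 'Couch', 'CouchD', 'CouchDB']
```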

2

Solved

I've seen tons of documentation all over the web about how the Python NLTK makes it easy to compute bigrams of words. What about letters? What I want to do is plug in a dictionary and have it tel...
Armipotent asked 5/1, 2013 at 4:33
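
One way to approach the letter-level case raised in the question above is to slide a two-character window over each word and tally the results. The sketch below uses only the standard library instead of NLTK, and the function names and the lowercasing step are assumptions for illustration.

```python
from collections import Counter


def letter_bigrams(word):
    """Return the overlapping two-letter sequences in word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def bigram_counts(words):
    """Count letter bigrams across an iterable of words."""
    counts = Counter()
    for word in words:
        counts.update(letter_bigrams(word))
    return counts
```

Calling `bigram_counts` on a word list and then `most_common()` on the result gives the bigrams ranked by frequency.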

1

Solved

I'm new to Python and need help! I was practicing with Python NLTK text classification. Here is the code example I am practicing on: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-...
Derbent asked 22/12, 2012 at 13:40

1

I'm working on auto completion search with Solr using EdgeNGrams. If the user is searching for names of employees, then auto completion should be applied. That is, I want the results to be like a G...
Clementclementas asked 1/8, 2012 at 4:38

2

Solved

I have been playing around with ElasticSearch for a new project of mine. I have set the default analyzers to use the ngram tokenfilter. This is my elasticsearch.yml file: index: analysis: analyze...
Slump asked 18/2, 2011 at 17:43

3

I'm working on deduping a database of people. For a first pass, I'm following a basic 2-step process to avoid an O(n^2) operation over the whole database, as described in the literature. First, I "...
Novak asked 5/4, 2012 at 19:34

3

Solved

I want to use ElasticSearch to search filenames (not the file's content). Therefore I need to find a part of the filename (exact match, no fuzzy search). Example: I have files with the following n...
Mccoy asked 23/2, 2012 at 21:10

2

Solved

I want to create an ARPA language model file with nearly 50,000 words. I can't generate the language model by passing my text file to the CMU Language Tool. Is any other link available where I can ...
Foucault asked 21/4, 2011 at 11:24

2

Solved

I'm trying to write an algorithm (which I'm assuming will rely on natural language processing techniques) to 'fill out' a list of search terms. There is probably a name for this kind of thing which...
Gaselier asked 29/9, 2011 at 23:30

1

Solved

I have been working on a project about sentence similarity. I know it has been asked many times on SO, but I just want to know if my problem can be accomplished by the method I use by the way that ...
Pouncey asked 27/10, 2010 at 19:59

5

Solved

Let's say I have a sentence of text: $body = 'the quick brown fox jumps over the lazy dog'; and I want to get that sentence into a hash of 'keywords', but I want to allow multi-word keywords; I ...
Thracophrygian asked 18/8, 2010 at 20:58
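
The multi-word keyword idea in the question above amounts to collecting word n-grams of several lengths into one lookup table. The question uses PHP; the sketch below shows the same idea in Python, with a hypothetical `keyword_counts` helper and a `max_words` limit that are not taken from the original.

```python
from collections import Counter


def keyword_counts(body, max_words=3):
    """Count every run of 1 to max_words consecutive words in body."""
    words = body.split()
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


counts = keyword_counts("the quick brown fox jumps over the lazy dog")
print(counts["the"])          # 2
print(counts["quick brown"])  # 1
```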

4

Solved

My problem is conceptually similar to solving anagrams, except I can't just use a dictionary lookup. I am trying to find plausible words rather than real words. I have created an N-gram model (for...

2

Solved

I want to implement some applications with n-grams (preferably in PHP). Which type of n-grams is more adequate for most purposes? A word level or a character level n-gram? How could you impleme...
Juanajuanita asked 23/6, 2009 at 12:37

© 2022 - 2024 — McMap. All rights reserved.