n-gram Questions

3

First time poster - I am a new Python user with limited programming skills. Ultimately I am trying to identify and compare n-grams across numerous text documents found in the same directory. My ana...
Glean asked 10/12, 2014 at 23:36
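
A minimal sketch of one way to approach the comparison, assuming plain-text files, word bigrams, and Jaccard overlap as the similarity measure; the directory name and file filter are illustrative, not from the question:

    import os
    from itertools import combinations

    def file_ngrams(path, n=2):
        # read one document and return its set of word n-grams
        with open(path, encoding="utf-8") as f:
            words = f.read().lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    directory = "corpus"  # hypothetical folder of .txt files
    grams = {name: file_ngrams(os.path.join(directory, name))
             for name in os.listdir(directory) if name.endswith(".txt")}

    # pairwise Jaccard overlap between documents
    for a, b in combinations(grams, 2):
        union = grams[a] | grams[b]
        print(a, b, len(grams[a] & grams[b]) / len(union) if union else 0.0)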

4

Solved

I have this example and I want to know how to get this result. I have text and I tokenize it, then I collect the bigrams, trigrams and fourgrams like this: import nltk from nltk import word_tokeniz...
Footy asked 22/6, 2014 at 0:16
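
A small sketch of the usual nltk.util.ngrams pattern for getting bigrams, trigrams, and fourgrams out of a tokenized text; the sample sentence is a placeholder:

    import nltk
    from nltk import word_tokenize
    from nltk.util import ngrams

    # nltk.download('punkt') may be needed once for word_tokenize
    text = "this is a rather small example sentence for building n-grams"
    tokens = word_tokenize(text)

    bigrams = list(ngrams(tokens, 2))    # adjacent pairs
    trigrams = list(ngrams(tokens, 3))   # adjacent triples
    fourgrams = list(ngrams(tokens, 4))  # adjacent quadruples
    print(bigrams[:3], trigrams[:3], fourgrams[:3])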

3

Solved

I am currently using uni-grams in my word2vec model as follows. def review_to_sentences( review, tokenizer, remove_stopwords=False ): #Returns a list of sentences, where each sentence is a list o...
Putput asked 9/9, 2017 at 9:49
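
If the aim is to feed multi-word units into word2vec instead of plain unigrams, one common route is gensim's Phrases collocation detector; a hedged sketch where the sentences, thresholds, and vector size are purely illustrative (vector_size is the gensim 4 name, size in gensim 3):

    from gensim.models import Word2Vec
    from gensim.models.phrases import Phrases

    # in the original code these would come from review_to_sentences()
    sentences = [["new", "york", "is", "a", "big", "city"],
                 ["i", "visited", "new", "york", "last", "year"]]

    bigram_detector = Phrases(sentences, min_count=1, threshold=1)  # merges frequent pairs
    phrased = [bigram_detector[s] for s in sentences]               # e.g. "new_york"

    model = Word2Vec(phrased, vector_size=50, min_count=1)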

1

Solved

I have a text which has many sentences. How can I use nltk.ngrams to process it? This is my code: sequence = nltk.tokenize.word_tokenize(raw) bigram = ngrams(sequence,2) freq_dist = nltk.Freq...
Glaikit asked 2/3, 2019 at 20:7
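
One way to handle a multi-sentence text is to split it into sentences first, build the n-grams within each sentence so they never cross sentence boundaries, and pool everything into a single FreqDist; a sketch with n=2 and a placeholder text:

    import nltk
    from nltk.util import ngrams

    raw = "This is one sentence. Here is another one."  # placeholder text
    freq_dist = nltk.FreqDist()
    for sent in nltk.sent_tokenize(raw):
        tokens = nltk.word_tokenize(sent)
        freq_dist.update(ngrams(tokens, 2))  # count bigrams inside each sentence

    print(freq_dist.most_common(5))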

7

Solved

How can I generate the n-grams of a string like: String Input="This is my car." I want to generate n-grams from this input: Input Ngram size = 3 Output should be: This is my car This is is my my ...
Dissuade asked 7/9, 2010 at 7:53
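
The question is in Java, but the underlying idea is just a double loop over window sizes and start positions; a Python sketch of that idea, using the input string and maximum size from the example:

    def word_ngrams(text, max_n):
        words = text.split()
        grams = []
        for n in range(1, max_n + 1):            # sizes 1 .. max_n
            for i in range(len(words) - n + 1):  # every start position
                grams.append(" ".join(words[i:i + n]))
        return grams

    print(word_ngrams("This is my car.", 3))
    # ['This', 'is', 'my', 'car.', 'This is', 'is my', 'my car.', 'This is my', 'is my car.']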

1

Solved

I am working on my first major data science project. I am attempting to match names between a large list of data from one source, to a cleansed dictionary in another. I am using this string matchin...
Kristiekristien asked 18/12, 2018 at 6:14

7

What algorithm is used for finding ngrams? Supposing my input data is an array of words and the size of the ngrams I want to find, what algorithm should I use? I'm asking for code, with preferenc...
Palmira asked 17/11, 2011 at 1:53
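
The standard answer is a sliding window: for every start index, take the next n words. A short sketch (function and variable names are illustrative):

    def ngrams(words, n):
        # slide a window of length n over the word array
        return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    print(ngrams(["the", "quick", "brown", "fox"], 2))
    # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]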

2

Solved

I'm looking for a pythonic interface to load ARPA files (back-off language models) and use them to evaluate some text, e.g. get its log-probability, perplexity etc. I don't need to generate the AR...
Barbee asked 26/5, 2014 at 4:5
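
One frequently used option is the kenlm Python binding, which loads ARPA (or binary) back-off models and scores text; a hedged sketch, assuming kenlm is installed and "model.arpa" is the file in question:

    import kenlm  # pip install kenlm

    model = kenlm.Model("model.arpa")                  # load the back-off LM
    sentence = "this is a test"
    print(model.score(sentence, bos=True, eos=True))   # total log10 probability
    print(model.perplexity(sentence))                  # per-word perplexity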

2

Is it possible to use n-grams in Keras? E.g., the sentences are in an X_train dataframe with a "sentences" column. I use the tokenizer from Keras in the following manner: tokenizer = Tokenizer(lower=True...
Intermission asked 12/9, 2017 at 10:2
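
The Keras Tokenizer itself only produces unigram indices, so one workaround is to append joined n-gram tokens (e.g. "new_york") to each sentence before fitting the tokenizer; a sketch of that preprocessing step, with the underscore-joining convention as an assumption:

    def add_ngram_tokens(sentence, n=2):
        # append underscore-joined n-grams so they survive whitespace tokenization
        words = sentence.lower().split()
        grams = ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        return " ".join(words + grams)

    X_train_aug = [add_ngram_tokens(s) for s in ["I love this film", "not good at all"]]
    print(X_train_aug)
    # the augmented strings can then go to Tokenizer(lower=True, ...).fit_on_texts(X_train_aug)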

2

Solved

edit: The new package text2vec is excellent, and solves this problem (and many others) really well. text2vec on CRAN text2vec on github vignette that illustrates ngram tokenization I have a prett...
Flemming asked 22/7, 2015 at 17:50

3

The input texts are always lists of dish names, each with 1~3 adjectives and a noun. Inputs: thai iced tea, spicy fried chicken, sweet chili pork, thai chicken curry. Outputs: thai tea, iced tea ...
Mizuki asked 31/8, 2016 at 5:53

8

Solved

I needed to compute the unigrams, bigrams and trigrams for a text file containing text like: "Cystic fibrosis affects 30,000 children and young adults in the US alone Inhaling the mists of salt w...
Obedience asked 16/11, 2012 at 20:26
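
A compact way to get all three counts from a file is collections.Counter over zipped word lists; the filename and plain whitespace tokenization are assumptions:

    from collections import Counter

    with open("corpus.txt", encoding="utf-8") as f:  # hypothetical input file
        words = f.read().lower().split()

    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    trigrams = Counter(zip(words, words[1:], words[2:]))

    print(unigrams.most_common(3))
    print(bigrams.most_common(3))
    print(trigrams.most_common(3))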

2

Solved

I am trying to calculate the perplexity for the data I have. The code I am using is: import sys sys.path.append("/usr/local/anaconda/lib/python2.7/site-packages/nltk") from nltk.corpus import ...
Quartile asked 21/10, 2015 at 18:48
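
In recent NLTK versions, perplexity can be computed with the nltk.lm package; a hedged sketch using a Laplace-smoothed bigram model with placeholder training and test sentences (this may differ from whatever the original code intended):

    from nltk.lm import Laplace
    from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
    from nltk.util import ngrams

    train_sents = [["this", "is", "a", "sentence"], ["this", "is", "another", "one"]]
    n = 2
    train_data, vocab = padded_everygram_pipeline(n, train_sents)

    lm = Laplace(n)  # add-one smoothing avoids infinite perplexity on unseen n-grams
    lm.fit(train_data, vocab)

    test = list(ngrams(pad_both_ends(["this", "is", "a", "one"], n=n), n))
    print(lm.perplexity(test))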

1

Solved

I'm thinking about using word n-gram techniques on raw text. But I have a doubt: does it make sense to use word n-grams after applying lemmatization/stemming to the text? If not, why should I use word n-grams o...

4

Solved

I am trying to tokenize strings into ngrams. Strangely, in the documentation for the NGramTokenizer I do not see a method that will return the individual ngrams that were tokenized. In fact I only see ...
Deppy asked 17/11, 2012 at 18:50

3

I am building a language model in R to predict a next word in the sentence based on the previous words. Currently my model is a simple ngram model with Kneser-Ney smoothing. It predicts next word b...
Walker asked 21/4, 2016 at 21:6

5

Solved

I am trying to solve a difficult problem and am getting lost. Here's what I'm supposed to do: INPUT: file OUTPUT: dictionary Return a dictionary whose keys are all the words in the file (broken ...
Healthful asked 23/6, 2017 at 20:22
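
If the values are, say, the words that follow each key (the excerpt is cut off), the core of the task is iterating over adjacent word pairs; a sketch with an assumed filename and assumed output format:

    from collections import defaultdict

    def successors(path):
        # map each word to the list of words that follow it anywhere in the file
        with open(path, encoding="utf-8") as f:
            words = f.read().split()
        follow = defaultdict(list)
        for current, nxt in zip(words, words[1:]):
            follow[current].append(nxt)
        return dict(follow)

    print(successors("input.txt"))  # hypothetical file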

5

Solved

I am just wondering what is the use of n-grams (n>3) (and their occurrence frequency) considering the computational overhead in computing them. Are there any applications where bigrams or trigrams ...
Dawndawna asked 23/4, 2012 at 18:20

1

For an application that we built, we are using a simple statistical model for word prediction (like Google Autocomplete) to guide search. It uses a sequence of ngrams gathered from a large corpus...
Marvelous asked 22/3, 2017 at 20:46

1

Solved

I am trying to make 2 document-term matrices for a corpus, one with unigrams and one with bigrams. However, the bigram matrix is currently just identical to the unigram matrix, and I'm not sure why...
Trichromatic asked 5/3, 2017 at 4:11
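
For reference, in Python's scikit-learn the n-gram range is an explicit vectorizer parameter, which keeps the two matrices distinct; if the original code uses R's tm package, the analogous fix is supplying a bigram tokenizer to DocumentTermMatrix. A hedged Python-side illustration with placeholder documents (get_feature_names_out is get_feature_names in older scikit-learn):

    from sklearn.feature_extraction.text import CountVectorizer

    corpus = ["the cat sat on the mat", "the dog sat on the log"]  # placeholder documents

    uni = CountVectorizer(ngram_range=(1, 1)).fit(corpus)
    bi = CountVectorizer(ngram_range=(2, 2)).fit(corpus)

    print(uni.get_feature_names_out())  # unigram vocabulary
    print(bi.get_feature_names_out())   # bigram vocabulary, clearly different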

1

I'm trying to find k most common n-grams from a large corpus. I've seen lots of places suggesting the naïve approach - simply scanning through the entire corpus and keeping a dictionary of the coun...
Hanhana asked 21/2, 2017 at 17:12
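
For context, the naïve baseline being discussed looks like the sketch below: stream the corpus line by line so only the counts, not the text, stay in memory, then take the top k; approaches that go further (count-min sketches, periodically pruning rare entries) trade exactness for memory. The filename, n, and k are placeholders:

    from collections import Counter

    def top_k_ngrams(path, n=2, k=10):
        counts = Counter()
        with open(path, encoding="utf-8") as f:
            for line in f:  # stream, never hold the whole corpus in memory
                words = line.lower().split()
                counts.update(zip(*(words[i:] for i in range(n))))
        return counts.most_common(k)

    print(top_k_ngrams("big_corpus.txt"))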

3

Solved

To generate word bigrams in Julia, I could simply zip through the original list and a list that drops the first element, e.g.: julia> s = split("the lazy fox jumps over the brown dog") 8-elemen...
Hydrotherapy asked 21/2, 2017 at 7:20

2

Given the big.txt from norvig.com/big.txt, the goal is to count the bigrams really fast (Imagine that I have to repeat this counting 100,000 times). According to Fast/Optimize N-gram implementatio...
Qktp asked 2/11, 2016 at 6:3

5

Solved

A k-skipgram is an ngram which is a superset of all ngrams and each (k-i)-skipgram till (k-i)==0 (which includes 0-skip-grams). So how can these skipgrams be computed efficiently in Python? Following i...
Skillless asked 6/8, 2015 at 5:44
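
A sketch of one common formulation: fix the first token, then choose the remaining n-1 tokens, in order, from the next n-1+k positions so that at most k tokens are skipped in total. Recent NLTK also ships nltk.util.skipgrams, which could be used instead:

    from itertools import combinations

    def skipgrams(tokens, n, k):
        # k-skip-n-grams: n tokens in original order, skipping at most k tokens overall
        grams = []
        for i in range(len(tokens) - n + 1):
            head = tokens[i]
            window = tokens[i + 1:i + n + k]  # room for n-1 picks plus up to k skips
            for tail in combinations(window, n - 1):
                grams.append((head,) + tail)
        return grams

    print(skipgrams("the rain in spain falls mainly".split(), 2, 2))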

2

Solved

I have started learning NLTK and I am following a tutorial from here, where they find conditional probability using bigrams like this. import nltk from nltk.corpus import brown cfreq_brown_2gram =...
Slaughter asked 28/6, 2016 at 6:25
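
For context, the pattern from that tutorial boils down to building a ConditionalFreqDist over bigrams and reading off relative frequencies; a short sketch on the Brown corpus (nltk.download('brown') may be needed once):

    import nltk
    from nltk.corpus import brown

    # condition on the first word of each bigram, count the second
    cfreq_brown_2gram = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))

    # P(word | "my"), estimated by relative frequency
    print(cfreq_brown_2gram["my"].freq("own"))
    print(cfreq_brown_2gram["my"].most_common(5))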
