n-gram Questions
3
First time poster - I am a new Python user with limited programming skills. Ultimately I am trying to identify and compare n-grams across numerous text documents found in the same directory. My ana...
4
Solved
I have this example and I want to know how to get this result. I have text, I tokenize it, and then I collect the bigrams, trigrams and four-grams like this:
import nltk
from nltk import word_tokeniz...
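The snippet above is truncated, but the idea it describes — collecting bigrams, trigrams and four-grams from a token list — can be sketched without NLTK at all. A plain sliding window is what `nltk.util.ngrams` does internally; here `split()` stands in for `nltk.word_tokenize` so the sketch runs without downloading tokenizer data:

```python
# Minimal sketch: collect bigrams, trigrams and four-grams from tokens.
# A sliding window of width n; whitespace split() stands in for word_tokenize.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox jumps over the lazy dog".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
fourgrams = ngrams(tokens, 4)
print(bigrams[0])   # ('the', 'quick')
print(trigrams[0])  # ('the', 'quick', 'brown')
```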
3
Solved
I am currently using uni-grams in my word2vec model as follows.
def review_to_sentences( review, tokenizer, remove_stopwords=False ):
#Returns a list of sentences, where each sentence is a list o...
1
Solved
I have a text which has many sentences. How can I use nltk.ngrams to process it?
This is my code:
sequence = nltk.tokenize.word_tokenize(raw)
bigram = ngrams(sequence,2)
freq_dist = nltk.Freq...
Glaikit asked 2/3, 2019 at 20:7
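For a multi-sentence text, the usual approach is to split into sentences first and count n-grams per sentence, so no bigram spans a sentence boundary. A hedged sketch of that idea — `collections.Counter` stands in for `nltk.FreqDist` (which has a near-identical interface), and a crude split on `"."` stands in for `nltk.sent_tokenize`:

```python
from collections import Counter

raw = "the cat sat on the mat. the cat ran."
freq_dist = Counter()
# Count bigrams within each sentence separately, so pairs like
# ('mat', 'the') across the sentence boundary are never counted.
for sentence in raw.split("."):
    tokens = sentence.split()
    freq_dist.update(zip(tokens, tokens[1:]))

print(freq_dist[("the", "cat")])  # 2 — once in each sentence
```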
7
Solved
How to generate an n-gram of a string like:
String Input="This is my car."
I want to generate n-gram with this input:
Input Ngram size = 3
Output should be:
This
is
my
car
This is
is my
my ...
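The output shown above (all word n-grams from size 1 up to the requested size, smallest first) can be produced with two nested loops. A minimal sketch, assuming the trailing period should be dropped as in the sample output:

```python
def word_ngrams(text, max_n):
    # All word n-grams from size 1 up to max_n, smallest first,
    # matching the output order shown in the question.
    words = text.rstrip(".").split()
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out

print(word_ngrams("This is my car.", 3))
# ['This', 'is', 'my', 'car', 'This is', 'is my', 'my car',
#  'This is my', 'is my car']
```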
1
Solved
I am working on my first major data science project. I am attempting to match names between a large list of data from one source, to a cleansed dictionary in another. I am using this string matchin...
Kristiekristien asked 18/12, 2018 at 6:14
7
What algorithm is used for finding ngrams?
Supposing my input data is an array of words and the size of the n-grams I want to find, which algorithm should I use?
I'm asking for code, with preferenc...
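The standard answer is a sliding window: walk the word array once and emit each run of n consecutive words, which takes O(len(words) × n) time. In Python this is often written with the zip-of-shifted-views idiom:

```python
def find_ngrams(words, n):
    # Slide a window of width n over the list; zip(*...) pairs up the
    # n shifted views so each tuple is one n-gram.
    return list(zip(*(words[i:] for i in range(n))))

print(find_ngrams(["a", "b", "c", "d"], 2))
# [('a', 'b'), ('b', 'c'), ('c', 'd')]
```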
2
Solved
I'm looking for a pythonic interface to load ARPA files (back-off language models) and use them to evaluate some text, e.g. get its log-probability, perplexity etc.
I don't need to generate the AR...
Barbee asked 26/5, 2014 at 4:5
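Independent of which ARPA loader is used, the evaluation itself is a back-off computation: use the stored n-gram log-probability when the n-gram is present, otherwise back off to the shorter history scaled by its back-off weight. A toy illustration with made-up bigram probabilities (not a real ARPA parser):

```python
import math

# Made-up log10 probabilities and back-off weights for a toy bigram model.
log_uni = {"the": math.log10(0.4), "cat": math.log10(0.3), "sat": math.log10(0.3)}
log_bi = {("the", "cat"): math.log10(0.5)}
backoff = {"the": math.log10(0.8), "cat": math.log10(1.0), "sat": math.log10(1.0)}

def logprob(history, word):
    # Back-off rule: stored bigram if present, else backoff(history) + unigram.
    if (history, word) in log_bi:
        return log_bi[(history, word)]
    return backoff[history] + log_uni[word]

sentence = ["the", "cat", "sat"]
total = log_uni[sentence[0]] + sum(
    logprob(h, w) for h, w in zip(sentence, sentence[1:]))
perplexity = 10 ** (-total / len(sentence))
print(perplexity)
```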
2
Is it possible to use n-grams in Keras?
E.g., the sentences are contained in the X_train dataframe, in a "sentences" column.
I use tokenizer from Keras in the following manner:
tokenizer = Tokenizer(lower=True...
Intermission asked 12/9, 2017 at 10:2
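Keras's `Tokenizer` only produces unigram indices, so a common workaround is to inject n-grams as extra pseudo-tokens (e.g. `"good_movie"`) into each sentence before fitting the tokenizer. A hedged sketch of that preprocessing step — plain `split()` stands in here for the Tokenizer's own text filtering:

```python
# Append joined bigram pseudo-tokens to each sentence so a unigram-only
# tokenizer (like Keras's Tokenizer) still sees bigram features.
def add_bigram_tokens(sentence):
    words = sentence.lower().split()
    bigrams = ["_".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams

print(add_bigram_tokens("Good movie overall"))
# ['good', 'movie', 'overall', 'good_movie', 'movie_overall']
```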
2
Solved
edit: The new package text2vec is excellent, and solves this problem (and many others) really well.
text2vec on CRAN
text2vec on github
vignette that illustrates ngram tokenization
I have a prett...
Flemming asked 22/7, 2015 at 17:50
3
8
Solved
I needed to compute the unigrams, bigrams and trigrams for a text file containing text like:
"Cystic fibrosis affects 30,000 children and young adults in the US alone
Inhaling the mists of salt w...
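Counting all three orders at once is one pass per order with `collections.Counter`. A minimal sketch over the first line of the quoted text (lowercased; real use would tokenize the whole file):

```python
from collections import Counter

text = "Cystic fibrosis affects 30,000 children and young adults in the US alone"
tokens = text.lower().split()

# One Counter per n-gram order; zip of shifted views yields the windows.
counts = {n: Counter(zip(*(tokens[i:] for i in range(n)))) for n in (1, 2, 3)}
print(counts[2][("young", "adults")])  # 1
```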
2
Solved
I am trying to calculate the perplexity for the data I have. The code I am using is:
import sys
sys.path.append("/usr/local/anaconda/lib/python2.7/site-packages/nltk")
from nltk.corpus import ...
Quartile asked 21/10, 2015 at 18:48
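Whatever model supplies the per-token probabilities, perplexity itself is just the exponentiated average negative log-probability. A self-contained sketch of the formula (toy probabilities, not the asker's data):

```python
import math

# Perplexity = exp(-(1/N) * sum(log p_i)): the exponentiated average
# negative log-probability the model assigns to each token.
def perplexity(token_probs):
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# Sanity check: a uniform model over a 50-word vocabulary has perplexity 50.
print(perplexity([1 / 50] * 10))
```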
1
Solved
I'm thinking about using word n-gram techniques on raw text, but I have a doubt:
does it make sense to use word n-grams after applying lemmatization/stemming to the text? If not, why should I use word n-grams o...
Olivares asked 10/11, 2017 at 9:22
4
Solved
I am trying to tokenize strings into ngrams. Strangely, in the documentation for the NGramTokenizer I do not see a method that will return the individual ngrams that were tokenized. In fact I only see ...
3
I am building a language model in R to predict a next word in the sentence based on the previous words. Currently my model is a simple ngram model with Kneser-Ney smoothing. It predicts next word b...
5
Solved
I am trying to solve a difficult problem and am getting lost.
Here's what I'm supposed to do:
INPUT: file
OUTPUT: dictionary
Return a dictionary whose keys are all the words in the file (broken ...
Healthful asked 23/6, 2017 at 20:22
5
Solved
I am just wondering what is the use of n-grams (n>3) (and their occurrence frequency) considering the computational overhead in computing them. Are there any applications where bigrams or trigrams ...
Dawndawna asked 23/4, 2012 at 18:20
1
For an application that we built, we are using a simple statistical model for word prediction (like Google Autocomplete) to guide search.
It uses a sequence of ngrams gathered from a large corpus...
Marvelous asked 22/3, 2017 at 20:46
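The backbone of such a predictor is a map from each context to a frequency table of the words observed after it; prediction is then the most common successor. A minimal bigram sketch of that idea (a real system would use longer contexts and smoothing):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Map each word to a Counter of the words observed immediately after it.
successors = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    successors[w1][w2] += 1

def predict(word):
    # Most frequently observed successor of `word`.
    return successors[word].most_common(1)[0][0]

print(predict("the"))  # 'cat' — seen twice after 'the', vs 'mat' once
```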
1
Solved
I am trying to make 2 document-term matrices for a corpus, one with unigrams and one with bigrams. However, the bigram matrix is currently just identical to the unigram matrix, and I'm not sure why...
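As a reference point for what the two matrices should look like: each row holds one document's n-gram counts, and the bigram rows should plainly differ from the unigram rows. A pure-Python sketch (with scikit-learn the equivalent is `CountVectorizer(ngram_range=(n, n))`):

```python
from collections import Counter

docs = ["the cat sat", "the cat ran fast"]

def doc_term_matrix(docs, n):
    # One Counter of n-gram counts per document.
    rows = []
    for doc in docs:
        t = doc.split()
        rows.append(Counter(zip(*(t[i:] for i in range(n)))))
    return rows

uni = doc_term_matrix(docs, 1)
bi = doc_term_matrix(docs, 2)
print(bi[0])  # bigram keys, distinct from the unigram keys in uni[0]
```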
1
I'm trying to find k most common n-grams from a large corpus. I've seen lots of places suggesting the naïve approach - simply scanning through the entire corpus and keeping a dictionary of the coun...
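The naive approach can at least be made streaming, so the corpus is never held in memory at once: feed lazily generated n-grams straight into a `Counter`. A sketch using the itertools `consume` idiom to shift the tee'd iterators:

```python
from collections import Counter
from itertools import islice, tee

def stream_ngrams(token_iter, n):
    # Lazily yield n-grams from a token stream without materialising it:
    # make n copies of the iterator and advance the i-th copy by i tokens.
    iters = tee(token_iter, n)
    for i, it in enumerate(iters):
        next(islice(it, i, i), None)  # itertools "consume" recipe
    return zip(*iters)

tokens = iter("a b a b c".split())
top = Counter(stream_ngrams(tokens, 2)).most_common(1)
print(top)  # [(('a', 'b'), 2)]
```

For truly huge corpora this is still the memory bottleneck; approximate structures (e.g. a count-min sketch) or a prune-as-you-go pass are the usual next step.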
3
Solved
To generate word bigrams in Julia, I could simply zip through the original list and a list that drops the first element, e.g.:
julia> s = split("the lazy fox jumps over the brown dog")
8-elemen...
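For comparison, the same zip-against-a-shifted-copy idea is equally concise in Python:

```python
# Bigrams by zipping the token list against itself shifted by one.
s = "the lazy fox jumps over the brown dog".split()
bigrams = list(zip(s, s[1:]))
print(bigrams[:2])  # [('the', 'lazy'), ('lazy', 'fox')]
```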
2
Given the big.txt from norvig.com/big.txt, the goal is to count the bigrams really fast (Imagine that I have to repeat this counting 100,000 times).
According to Fast/Optimize N-gram implementatio...
Qktp asked 2/11, 2016 at 6:3
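One of the faster pure-Python variants keeps the whole loop in C: `Counter` consuming a `zip` of the token sequence against a shifted view, with `islice` avoiding a second list copy. A sketch on a tiny input (for big.txt, `tokens` would come from reading and splitting the file once):

```python
from collections import Counter
from itertools import islice

def count_bigrams(tokens):
    # Counter + zip keeps the counting loop in C; islice avoids copying
    # the list, which matters when this is repeated many times.
    return Counter(zip(tokens, islice(tokens, 1, None)))

counts = count_bigrams("to be or not to be".split())
print(counts[("to", "be")])  # 2
```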
5
Solved
A k-skip-gram is an n-gram that is a superset of all n-grams and each (k-i)-skip-gram until (k-i)==0 (which includes 0-skip-grams). So how can I efficiently compute these skipgrams in Python?
Following i...
Skillless asked 6/8, 2015 at 5:44
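Under one common definition (n-grams allowing up to k skipped tokens in total), skipgrams fall out of `itertools.combinations`: anchor each gram at a position, then choose the remaining n-1 items from the next n-1+k tokens. A sketch of that construction (NLTK also ships an `nltk.util.skipgrams` helper, whose edge handling may differ slightly):

```python
from itertools import combinations

def skipgrams(tokens, n, k):
    # Anchor each gram at position i, then pick the remaining n-1 items
    # from the following n-1+k tokens, allowing up to k skips in total.
    grams = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i + 1 : i + n + k]
        for tail in combinations(window, n - 1):
            grams.append((tokens[i],) + tail)
    return grams

print(skipgrams(["a", "b", "c", "d"], 2, 1))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```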
2
Solved
I have started learning NLTK and I am following a tutorial from here, where they find conditional probability using bigrams like this.
import nltk
from nltk.corpus import brown
cfreq_brown_2gram =...
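The quantity the tutorial computes with a `ConditionalFreqDist` is the maximum-likelihood estimate P(w2 | w1) = count(w1, w2) / count(w1). A self-contained sketch of the same calculation on a toy corpus instead of Brown:

```python
from collections import Counter

words = "the cat sat on the mat".split()
unigram = Counter(words)
bigram = Counter(zip(words, words[1:]))

# MLE conditional probability: fraction of w1's occurrences followed by w2.
def cond_prob(w1, w2):
    return bigram[(w1, w2)] / unigram[w1]

print(cond_prob("the", "cat"))  # 0.5: 'the' occurs twice, once before 'cat'
```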