n-gram Questions

3

First time poster - I am a new Python user with limited programming skills. Ultimately I am trying to identify and compare n-grams across numerous text documents found in the same directory. My ana...
Glean asked 10/12, 2014 at 23:36
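
A minimal sketch of one way to approach the comparison, assuming plain-text files, word bigrams, and Jaccard overlap as the similarity measure; the directory name and file filter are illustrative, not from the question:

    import os
    from itertools import combinations

    def file_ngrams(path, n=2):
        # read one document and return its set of word n-grams
        with open(path, encoding="utf-8") as f:
            words = f.read().lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    directory = "corpus"  # hypothetical folder of .txt files
    grams = {name: file_ngrams(os.path.join(directory, name))
             for name in os.listdir(directory) if name.endswith(".txt")}

    # pairwise Jaccard overlap between documents
    for a, b in combinations(grams, 2):
        union = grams[a] | grams[b]
        print(a, b, len(grams[a] & grams[b]) / len(union) if union else 0.0)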

4

Solved

I have this example and I want to know how to get this result. I have text and I tokenize it, then I collect the bigrams, trigrams and fourgrams like this: import nltk from nltk import word_tokeniz...
Footy asked 22/6, 2014 at 0:16
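
A small sketch of the usual nltk.util.ngrams pattern for getting bigrams, trigrams, and fourgrams out of a tokenized text; the sample sentence is a placeholder:

    import nltk
    from nltk import word_tokenize
    from nltk.util import ngrams

    # nltk.download('punkt') may be needed once for word_tokenize
    text = "this is a rather small example sentence for building n-grams"
    tokens = word_tokenize(text)

    bigrams = list(ngrams(tokens, 2))    # adjacent pairs
    trigrams = list(ngrams(tokens, 3))   # adjacent triples
    fourgrams = list(ngrams(tokens, 4))  # adjacent quadruples
    print(bigrams[:3], trigrams[:3], fourgrams[:3])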

3

Solved

I am currently using uni-grams in my word2vec model as follows. def review_to_sentences( review, tokenizer, remove_stopwords=False ): #Returns a list of sentences, where each sentence is a list o...
Putput asked 9/9, 2017 at 9:49
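
If the aim is to feed multi-word units into word2vec instead of plain unigrams, one common route is gensim's Phrases collocation detector; a hedged sketch where the sentences, thresholds, and vector size are purely illustrative (vector_size is the gensim 4 name, size in gensim 3):

    from gensim.models import Word2Vec
    from gensim.models.phrases import Phrases

    # in the original code these would come from review_to_sentences()
    sentences = [["new", "york", "is", "a", "big", "city"],
                 ["i", "visited", "new", "york", "last", "year"]]

    bigram_detector = Phrases(sentences, min_count=1, threshold=1)  # merges frequent pairs
    phrased = [bigram_detector[s] for s in sentences]               # e.g. "new_york"

    model = Word2Vec(phrased, vector_size=50, min_count=1)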

1

Solved

I have a text which has many sentences. How can I use nltk.ngrams to process it? This is my code: sequence = nltk.tokenize.word_tokenize(raw) bigram = ngrams(sequence,2) freq_dist = nltk.Freq...
Glaikit asked 2/3, 2019 at 20:7
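
One way to handle a multi-sentence text is to split it into sentences first, build the n-grams within each sentence so they never cross sentence boundaries, and pool everything into a single FreqDist; a sketch with n=2 and a placeholder text:

    import nltk
    from nltk.util import ngrams

    raw = "This is one sentence. Here is another one."  # placeholder text
    freq_dist = nltk.FreqDist()
    for sent in nltk.sent_tokenize(raw):
        tokens = nltk.word_tokenize(sent)
        freq_dist.update(ngrams(tokens, 2))  # count bigrams inside each sentence

    print(freq_dist.most_common(5))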

7

Solved

How can I generate the n-grams of a string like: String Input="This is my car." I want to generate n-grams from this input: Input Ngram size = 3 Output should be: This is my car This is is my my ...
Dissuade asked 7/9, 2010 at 7:53
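
The question is in Java, but the underlying idea is just a double loop over window sizes and start positions; a Python sketch of that idea, using the input string and maximum size from the example:

    def word_ngrams(text, max_n):
        words = text.split()
        grams = []
        for n in range(1, max_n + 1):            # sizes 1 .. max_n
            for i in range(len(words) - n + 1):  # every start position
                grams.append(" ".join(words[i:i + n]))
        return grams

    print(word_ngrams("This is my car.", 3))
    # ['This', 'is', 'my', 'car.', 'This is', 'is my', 'my car.', 'This is my', 'is my car.']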

1

Solved

I am working on my first major data science project. I am attempting to match names between a large list of data from one source, to a cleansed dictionary in another. I am using this string matchin...
Kristiekristien asked 18/12, 2018 at 6:14

7

What algorithm is used for finding ngrams? Supposing my input data is an array of words and the size of the ngrams I want to find, what algorithm should I use? I'm asking for code, with preferenc...
Palmira asked 17/11, 2011 at 1:53
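
The standard answer is a sliding window: for every start index, take the next n words. A short sketch (function and variable names are illustrative):

    def ngrams(words, n):
        # slide a window of length n over the word array
        return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    print(ngrams(["the", "quick", "brown", "fox"], 2))
    # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]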

2

Solved

I'm looking for a pythonic interface to load ARPA files (back-off language models) and use them to evaluate some text, e.g. get its log-probability, perplexity etc. I don't need to generate the AR...
Barbee asked 26/5, 2014 at 4:5
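
One frequently used option is the kenlm Python binding, which loads ARPA (or binary) back-off models and scores text; a hedged sketch, assuming kenlm is installed and "model.arpa" is the file in question:

    import kenlm  # pip install kenlm

    model = kenlm.Model("model.arpa")                  # load the back-off LM
    sentence = "this is a test"
    print(model.score(sentence, bos=True, eos=True))   # total log10 probability
    print(model.perplexity(sentence))                  # per-word perplexity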

2

Is it possible to use n-grams in Keras? E.g., the sentences are in an X_train dataframe with a "sentences" column. I use the tokenizer from Keras in the following manner: tokenizer = Tokenizer(lower=True...
Intermission asked 12/9, 2017 at 10:2
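
The Keras Tokenizer itself only produces unigram indices, so one workaround is to append joined n-gram tokens (e.g. "new_york") to each sentence before fitting the tokenizer; a sketch of that preprocessing step, with the underscore-joining convention as an assumption:

    def add_ngram_tokens(sentence, n=2):
        # append underscore-joined n-grams so they survive whitespace tokenization
        words = sentence.lower().split()
        grams = ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        return " ".join(words + grams)

    X_train_aug = [add_ngram_tokens(s) for s in ["I love this film", "not good at all"]]
    print(X_train_aug)
    # the augmented strings can then go to Tokenizer(lower=True, ...).fit_on_texts(X_train_aug)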

2

Solved

edit: The new package text2vec is excellent, and solves this problem (and many others) really well. text2vec on CRAN text2vec on github vignette that illustrates ngram tokenization I have a prett...
Flemming asked 22/7, 2015 at 17:50

3

The input texts are always lists of dish names, each with 1~3 adjectives and a noun. Inputs: thai iced tea, spicy fried chicken, sweet chili pork, thai chicken curry. Outputs: thai tea, iced tea ...
Mizuki asked 31/8, 2016 at 5:53

8

Solved

I needed to compute the unigrams, bigrams and trigrams for a text file containing text like: "Cystic fibrosis affects 30,000 children and young adults in the US alone Inhaling the mists of salt w...
Obedience asked 16/11, 2012 at 20:26
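
A compact way to get all three counts from a file is collections.Counter over zipped word lists; the filename and plain whitespace tokenization are assumptions:

    from collections import Counter

    with open("corpus.txt", encoding="utf-8") as f:  # hypothetical input file
        words = f.read().lower().split()

    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    trigrams = Counter(zip(words, words[1:], words[2:]))

    print(unigrams.most_common(3))
    print(bigrams.most_common(3))
    print(trigrams.most_common(3))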

2

Solved

I am trying to calculate the perplexity for the data I have. The code I am using is: import sys sys.path.append("/usr/local/anaconda/lib/python2.7/site-packages/nltk") from nltk.corpus import ...
Quartile asked 21/10, 2015 at 18:48
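
In recent NLTK versions, perplexity can be computed with the nltk.lm package; a hedged sketch using a Laplace-smoothed bigram model with placeholder training and test sentences (this may differ from whatever the original code intended):

    from nltk.lm import Laplace
    from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
    from nltk.util import ngrams

    train_sents = [["this", "is", "a", "sentence"], ["this", "is", "another", "one"]]
    n = 2
    train_data, vocab = padded_everygram_pipeline(n, train_sents)

    lm = Laplace(n)  # add-one smoothing avoids infinite perplexity on unseen n-grams
    lm.fit(train_data, vocab)

    test = list(ngrams(pad_both_ends(["this", "is", "a", "one"], n=n), n))
    print(lm.perplexity(test))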

1

Solved

I'm thinking about using word n-gram techniques on raw text. But I have a doubt: does it make sense to use word n-grams after applying lemmatization/stemming to the text? If not, why should I use word n-grams o...

4

Solved

I am trying to tokenize strings into ngrams. Strangely, in the documentation for the NGramTokenizer I do not see a method that will return the individual ngrams that were tokenized. In fact I only see ...
Deppy asked 17/11, 2012 at 18:50

3

I am building a language model in R to predict a next word in the sentence based on the previous words. Currently my model is a simple ngram model with Kneser-Ney smoothing. It predicts next word b...
Walker asked 21/4, 2016 at 21:6

5

Solved

I am trying to solve a difficult problem and am getting lost. Here's what I'm supposed to do: INPUT: file OUTPUT: dictionary Return a dictionary whose keys are all the words in the file (broken ...
Healthful asked 23/6, 2017 at 20:22
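
If the values are, say, the words that follow each key (the excerpt is cut off), the core of the task is iterating over adjacent word pairs; a sketch with an assumed filename and assumed output format:

    from collections import defaultdict

    def successors(path):
        # map each word to the list of words that follow it anywhere in the file
        with open(path, encoding="utf-8") as f:
            words = f.read().split()
        follow = defaultdict(list)
        for current, nxt in zip(words, words[1:]):
            follow[current].append(nxt)
        return dict(follow)

    print(successors("input.txt"))  # hypothetical file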

5

Solved

I am just wondering what is the use of n-grams (n>3) (and their occurrence frequency) considering the computational overhead in computing them. Are there any applications where bigrams or trigrams ...
Dawndawna asked 23/4, 2012 at 18:20

1

For an application that we built, we are using a simple statistical model for word prediction (like Google Autocomplete) to guide search. It uses a sequence of ngrams gathered from a large corpus...
Marvelous asked 22/3, 2017 at 20:46

1

Solved

I am trying to make 2 document-term matrices for a corpus, one with unigrams and one with bigrams. However, the bigram matrix is currently just identical to the unigram matrix, and I'm not sure why...
Trichromatic asked 5/3, 2017 at 4:11
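
For reference, in Python's scikit-learn the n-gram range is an explicit vectorizer parameter, which keeps the two matrices distinct; if the original code uses R's tm package, the analogous fix is supplying a bigram tokenizer to DocumentTermMatrix. A hedged Python-side illustration with placeholder documents (get_feature_names_out is get_feature_names in older scikit-learn):

    from sklearn.feature_extraction.text import CountVectorizer

    corpus = ["the cat sat on the mat", "the dog sat on the log"]  # placeholder documents

    uni = CountVectorizer(ngram_range=(1, 1)).fit(corpus)
    bi = CountVectorizer(ngram_range=(2, 2)).fit(corpus)

    print(uni.get_feature_names_out())  # unigram vocabulary
    print(bi.get_feature_names_out())   # bigram vocabulary, clearly different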

1

I'm trying to find k most common n-grams from a large corpus. I've seen lots of places suggesting the naïve approach - simply scanning through the entire corpus and keeping a dictionary of the coun...
Hanhana asked 21/2, 2017 at 17:12
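
For context, the naïve baseline being discussed looks like the sketch below: stream the corpus line by line so only the counts, not the text, stay in memory, then take the top k; approaches that go further (count-min sketches, periodically pruning rare entries) trade exactness for memory. The filename, n, and k are placeholders:

    from collections import Counter

    def top_k_ngrams(path, n=2, k=10):
        counts = Counter()
        with open(path, encoding="utf-8") as f:
            for line in f:  # stream, never hold the whole corpus in memory
                words = line.lower().split()
                counts.update(zip(*(words[i:] for i in range(n))))
        return counts.most_common(k)

    print(top_k_ngrams("big_corpus.txt"))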

3

Solved

To generate word bigrams in Julia, I could simply zip through the original list and a list that drops the first element, e.g.: julia> s = split("the lazy fox jumps over the brown dog") 8-elemen...
Hydrotherapy asked 21/2, 2017 at 7:20

2

Given the big.txt from norvig.com/big.txt, the goal is to count the bigrams really fast (Imagine that I have to repeat this counting 100,000 times). According to Fast/Optimize N-gram implementatio...
Qktp asked 2/11, 2016 at 6:3

5

Solved

A k-skipgram is an ngram which is a superset of all ngrams and each (k-i)-skipgram till (k-i)==0 (which includes 0-skip-grams). So how can these skipgrams be computed efficiently in Python? Following i...
Skillless asked 6/8, 2015 at 5:44
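
A sketch of one common formulation: fix the first token, then choose the remaining n-1 tokens, in order, from the next n-1+k positions so that at most k tokens are skipped in total. Recent NLTK also ships nltk.util.skipgrams, which could be used instead:

    from itertools import combinations

    def skipgrams(tokens, n, k):
        # k-skip-n-grams: n tokens in original order, skipping at most k tokens overall
        grams = []
        for i in range(len(tokens) - n + 1):
            head = tokens[i]
            window = tokens[i + 1:i + n + k]  # room for n-1 picks plus up to k skips
            for tail in combinations(window, n - 1):
                grams.append((head,) + tail)
        return grams

    print(skipgrams("the rain in spain falls mainly".split(), 2, 2))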

2

Solved

I have started learning NLTK and I am following a tutorial from here, where they find conditional probability using bigrams like this. import nltk from nltk.corpus import brown cfreq_brown_2gram =...
Slaughter asked 28/6, 2016 at 6:25
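
For context, the pattern from that tutorial boils down to building a ConditionalFreqDist over bigrams and reading off relative frequencies; a short sketch on the Brown corpus (nltk.download('brown') may be needed once):

    import nltk
    from nltk.corpus import brown

    # condition on the first word of each bigram, count the second
    cfreq_brown_2gram = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))

    # P(word | "my"), estimated by relative frequency
    print(cfreq_brown_2gram["my"].freq("own"))
    print(cfreq_brown_2gram["my"].most_common(5))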
