n-gram - 3 - McMap

5

Find the most frequently occurring words in a text in R

Can someone help me with how to find the most frequently used two and three words in a text using R? My text is... text <- c("There is a difference between the common use of the term phrase an...

r n-gram

Conversable asked 18/5, 2016 at 6:38

2

Solved

How to extract the verbs and all corresponding adverbs from a text?

Using ngram in Python, my aim is to find verbs and their corresponding adverbs in an input text. What I have done: Input text: "He is talking weirdly. A horse can run fast. A big tree is ther...

python nltk n-gram

Snuffbox asked 27/1, 2016 at 6:10

3

Solved

How to remove stopwords efficiently from a list of ngram tokens in R

Here's an appeal for a better way to do something that I can already do inefficiently: filter a series of n-gram tokens using "stop words" so that the occurrence of any stop word term in an n-gram ...

r performance n-gram stop-words text-analysis

Rakes asked 12/10, 2015 at 0:9

5

Solved

Why does the ngrams() function give distinct bigrams?

I am writing an R script and am using library(ngram). Suppose I have a string, "good qualiti dog food bought sever vital can dog food product found good qualiti product look like stew process mea...

r nlp n-gram

Webfoot asked 29/9, 2015 at 17:25

5

Solved

Detecting random keyboard hits considering QWERTY keyboard layout

The winner of a recent Wikipedia vandalism detection competition suggests that detection could be improved by "detecting random keyboard hits considering QWERTY keyboard layout". Example: woijf qo...

algorithm n-gram qwerty text-classification

Douglassdougy asked 27/9, 2010 at 8:41

1

Solved

How to use sklearn's CountVectorizer() to get ngrams that include any punctuation as separate tokens?

I use sklearn.feature_extraction.text.CountVectorizer to compute n-grams. Example: import sklearn.feature_extraction.text # FYI http://scikit-learn.org/stable/install.html ngram_size = 4 string = ...

python nlp scikit-learn tokenize n-gram

Territorialize asked 20/8, 2015 at 21:35

1

Solved

Token pattern for n-gram in TfidfVectorizer in python

Does TfidfVectorizer identify n-grams using python regular expressions? This issue arises while reading the documentation for scikit-learn TfidfVectorizer, I see that the pattern to recognize n-gr...

python regex scikit-learn n-gram

Episiotomy asked 26/3, 2015 at 23:51

1

Solved

Difference between NGramFilterFactory and EdgeNGramFilterFactory

I am a beginner in Solr. In my project, NGramFilterFactory and EdgeNGramFilterFactory, both are being used for a field. My understanding as per the document is EdgeNGramFilterFactory is used for "s...

solr search-engine tokenize n-gram filterfactory

Obligatory asked 18/5, 2015 at 9:14

3

Solved

How to create a bigram from a text file with frequency count in Spark/Scala?

I want to take a text file and create a bigram of all words not separated by a dot ".", removing any special characters. I'm trying to do this using Spark and Scala. This text: Hello my Friend. H...

scala apache-spark n-gram

Cupping asked 18/4, 2015 at 3:28

0

Getting most likely documents of the query using phonetic filter in solr

I am using solr for spell checking/ query correction. I have added solr.PhoneticFilterFactory and solr.NGramFilterFactory in fieldType to perform spell checking. It is working fine but here the pro...

solr n-gram search-suggestion error-correction phonetics

Milklivered asked 15/12, 2014 at 12:39

6

Solved

Python: Reducing memory usage of dictionary

I'm trying to load a couple of files into memory. The files have one of the following 3 formats: string TAB int string TAB float int TAB float. Indeed, they are ngram statistics files, i...

python memory dictionary compression n-gram

Splice asked 22/4, 2012 at 3:3

2

How do people use n-grams for sentiment analysis, considering that as n increases, the memory requirement also increases rapidly?

I am trying to do Sentiment Analysis on Tweets using Python. To begin with, I've implemented an n-grams model. So, let's say our training data is I am a good kid He is a good kid, but he didn't g...

python nlp scikit-learn sentiment-analysis n-gram

Alithea asked 9/11, 2014 at 3:45

3

Effective 1-5 grams extraction with python

I have a huge file of 3,000,000 lines, and each line has 20-40 words. I have to extract 1 to 5 ngrams from the corpus. My input files are tokenized plain text, e.g.: This is a foo bar sentence . ...

python nlp nltk information-retrieval n-gram

Staphylorrhaphy asked 13/10, 2014 at 13:45
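
A minimal sketch of the task this question describes, assuming the input is already whitespace-tokenized with one sentence per line, as in the example above. The helper names `line_ngrams` and `count_ngrams` are hypothetical and are not taken from the question or its answers; n-grams are not allowed to cross line boundaries, which is also an assumption.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def line_ngrams(tokens: List[str], min_n: int = 1, max_n: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield every n-gram of length min_n..max_n from one tokenized line."""
    for n in range(min_n, max_n + 1):
        # range() is empty when the line is shorter than n, so short lines are safe.
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_ngrams(lines: Iterable[str], min_n: int = 1, max_n: int = 5) -> Counter:
    """Count n-grams over an iterable of lines, reading one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line_ngrams(line.split(), min_n, max_n))
    return counts


counts = count_ngrams(["This is a foo bar sentence ."])
```

Because `count_ngrams` accepts any iterable of strings, an open file handle can be passed directly, so the corpus itself is never held in memory; the counter, however, still grows with the number of distinct n-grams.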

1

Is there an alternate for the now removed module 'nltk.model.NGramModel'?

I've been trying to find out an alternative for two straight days now, and couldn't find anything relevant. I'm basically trying to get a probabilistic score of a synthesized sentence (synthesized...

python nltk n-gram

Whitening asked 18/10, 2014 at 18:24

4

Finding ngrams in R and comparing ngrams across corpora

I'm getting started with the tm package in R, so please bear with me and apologies for the big ol' wall of text. I have created a fairly large corpus of Socialist/Communist propaganda and would lik...

r text-mining n-gram tm

Bendix asked 27/10, 2013 at 6:8

1

Ngram model and perplexity in NLTK

To put my question in context, I would like to train and test/compare several (neural) language models. In order to focus on the models rather than data preparation I chose to use the Brown corpus ...

python nltk n-gram

Poock asked 12/5, 2013 at 16:40

2

Solved

find all two word phrases that appear in more than one row in a dataset

We would like to run a query that returns two-word phrases that appear in more than one row. For example, take the string "Data Ninja". Since it appears in more than one row in our dataset, the quer...

data-mining bigdata google-bigquery data-analysis n-gram

Subassembly asked 10/9, 2013 at 1:46

1

Solved

Understanding the `ngram_range` argument in a CountVectorizer in sklearn

I'm a little confused about how to use ngrams in the scikit-learn library in Python, specifically, how the ngram_range argument works in a CountVectorizer. Running this code: from sklearn.featu...

python scikit-learn n-gram feature-selection

Bullnecked asked 3/6, 2014 at 1:27

5

Solved

Simple implementation of N-Gram, tf-idf and Cosine similarity in Python

I need to compare documents stored in a DB and come up with a similarity score between 0 and 1. The method I need to use has to be very simple. Implementing a vanilla version of n-grams (where it...

python document n-gram tf-idf vsm

Laky asked 4/3, 2010 at 15:22

1

Solved

How to extract character ngram from sentences? - python

The following word2ngrams function extracts character 3grams from a word: >>> x = 'foobar' >>> n = 3 >>> [x[i:i+n] for i in range(len(x)-n+1)] ['foo', 'oob', 'oba', 'bar...

python regex string nlp n-gram

Status asked 15/3, 2014 at 18:32
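
The list comprehension quoted in this question can be wrapped in a function and applied to each word of a sentence. This is only a sketch: the names `char_ngrams` and `sentence_char_ngrams` are hypothetical, and extracting n-grams per word (rather than across spaces) is an assumption about what the asker wants.

```python
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return all character n-grams of text, using the slicing from the question."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def sentence_char_ngrams(sentence: str, n: int = 3) -> List[str]:
    """Split on whitespace and collect character n-grams word by word."""
    # Words shorter than n contribute nothing, because range() is empty for them.
    return [gram for word in sentence.split() for gram in char_ngrams(word, n)]


print(char_ngrams("foobar", 3))  # ['foo', 'oob', 'oba', 'bar']
```

To let n-grams span word boundaries instead, `char_ngrams` can be called on the whole sentence string directly.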

3

Solved

Bytes vs Characters vs Words - which granularity for n-grams?

At least 3 types of n-grams can be considered for representing text documents: byte-level n-grams character-level n-grams word-level n-grams It's unclear to me which one should be used for a g...

nlp data-mining text-mining n-gram

Renovate asked 9/2, 2014 at 8:18

3

How to generate n-grams in scala?

I am trying to code dissociated press algorithm based on n-gram in scala. How to generate an n-gram for a large files: For example, for the file containing "the bee is the bee of the bees". Fir...

scala n-gram

Boice asked 24/11, 2011 at 14:55

2

Solved

N-grams vs other classifiers in text categorization

I'm new to text categorization techniques. I want to know the difference between the N-gram approach for text categorization and other classifier-based (decision tree, KNN, SVM) text categorization...

machine-learning data-mining classification n-gram text-classification

Matthiew asked 1/12, 2013 at 18:54

2

Solved

How to get n-gram collocations and association in python nltk?

In this documentation, there is an example using nltk.collocations.BigramAssocMeasures(), BigramCollocationFinder, nltk.collocations.TrigramAssocMeasures(), and TrigramCollocationFinder. There is exam...

python nlp nltk n-gram collocation

Shirtmaker asked 7/9, 2013 at 9:58

4

Can Drupal's search module search for a substring? (Partial Search)

Drupal's core search module only searches for keywords, e.g. "sandwich". Can I make it search with a substring, e.g. "sandw", and return my sandwich results? Maybe there is a plugin that does that?...

search drupal partial n-gram

Mossback asked 16/4, 2010 at 15:17
