n-gram - McMap

2

Solved

Split a string in consecutive substrings of size n in R in an efficient way

# Input
n <- 2
"abcd"
# Output
c("ab", "bc", "cd")
I don't want to use a for loop or sapply.

r string n-gram

Lenorelenox asked 21/9, 2023 at 8:27

4

Solved

Find best substring match

I'm looking for a library, or a method using existing libraries (difflib, fuzzywuzzy, python-levenshtein), to find the closest match of a string (query) in a text (corpus). I've developed a method b...

python match distance n-gram

Endosteum asked 15/3, 2016 at 13:54

3

Overcoming MemoryError / Slow Runtime in Ashton String task

In the Ashton String task, the goal is to: Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the Kth character of the concatenated stri...

python string out-of-memory n-gram

Jene asked 30/12, 2015 at 14:08

3

Sequence prediction of characters?

I am new to machine learning, so please go easy in case the problem is trivial. I have been given a sequence of observed characters say, ABABBABBB..... (n characters). My goal is to predict the ne...

text machine-learning neural-network lstm n-gram

Pontus asked 12/3, 2017 at 13:30

3

Solved

Extract keyphrases from text (1-4 word ngrams)

What's the best way to extract keyphrases from a block of text? I'm writing a tool to do keyword extraction: something like this. I've found a few libraries for Python and Perl to extract n-grams, ...

javascript keyword n-gram

Cult asked 16/8, 2011 at 21:47

1

Train NGramModel in Python

I am using Python 3.5, installed and managed with Anaconda. I want to train an NGramModel (from nltk) using some text. My installation does not find the module nltk.model. There are some possible a...

python python-3.x nltk n-gram

Icken asked 28/5, 2016 at 22:41

0

Mysql ngram fulltext index doesn't work with utf8mb4_bin

I'm using utf8mb4_bin for the title column, so I expected its full-text search to be case-sensitive. But the query actually returns an empty result. CREATE TABLE `test_table` ( `id` int NOT NULL AUTO_INCREMENT, `title` ...

mysql full-text-search uppercase case-sensitive n-gram

Reconstructionism asked 26/5, 2022 at 8:03

17

Solved

n-grams in python, four, five, six grams?

I'm looking for a way to split a text into n-grams. Normally I would do something like: import nltk from nltk import bigrams string = "I really like python, it's pretty awesome." string_bigrams = ...

python string nltk n-gram

Expropriate asked 8/7, 2013 at 16:35
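The question wants n-grams beyond bigrams (four-, five-, six-grams). A minimal pure-Python sketch, not NLTK's own implementation, that works for any n, assuming plain whitespace tokenization (so punctuation stays attached to words):

```python
def word_ngrams(text, n):
    """Return the word n-grams of text as a list of tuples.

    Assumes whitespace tokenization; "python," keeps its comma.
    """
    tokens = text.split()
    # A window starting at index i covers tokens[i:i + n]; there are
    # len(tokens) - n + 1 windows, and none if the text is shorter than n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "I really like python, it's pretty awesome."
for n in (4, 5, 6):
    print(n, word_ngrams(sentence, n))
```

Changing n is the only thing needed to go from four-grams to six-grams; a real tokenizer could replace `text.split()` without touching the windowing logic.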

3

Solved

Generate bigrams with NLTK

I am trying to produce a bigram list of a given sentence. For example, if I type "To be or not to be", I want the program to generate "to be, be or, or not, not to, to be". I tried the following...

python nltk n-gram

Labroid asked 6/6, 2016 at 6:44
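For the "To be or not to be" example, a small sketch without NLTK pairs each word with its successor via `zip`. Lowercasing is an assumption inferred from the desired output ("to be", not "To be"):

```python
def bigram_phrases(sentence):
    """Return adjacent word pairs as space-joined, lowercased strings."""
    words = sentence.lower().split()
    # zip stops at the shorter input, so the last word is never
    # paired with a missing successor.
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


print(", ".join(bigram_phrases("To be or not to be")))
# prints: to be, be or, or not, not to, to be
```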

6

Solved

Counting bigrams (pair of two words) in a file using Python

I want to count the number of occurrences of all bigrams (pair of adjacent words) in a file using python. Here, I am dealing with very large files, so I am looking for an efficient way. I tried usi...

python regex n-gram

Expensive asked 19/9, 2012 at 4:44
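For very large files, one option is to stream the file line by line and update a `collections.Counter`, so the text itself is never held in memory (the counter still grows with the number of distinct bigrams). This sketch assumes whitespace-separated words and, as a design choice, counts pairs that span a line break; drop the carry-over of `prev` if that is not wanted:

```python
from collections import Counter
import io


def count_bigrams(lines):
    """Count adjacent word pairs over an iterable of lines.

    The last word of one line is carried over, so a pair may span
    a line break.
    """
    counts = Counter()
    prev = None  # last word seen so far
    for line in lines:
        for word in line.split():
            if prev is not None:
                counts[(prev, word)] += 1
            prev = word
    return counts


# A file object also iterates line by line; StringIO stands in for one here.
sample = io.StringIO("the cat sat\non the mat\nthe cat\n")
print(count_bigrams(sample).most_common(2))
```

With a real file, the same function would be called as `count_bigrams(f)` inside `with open(path, encoding="utf-8") as f:`, where `path` is a placeholder for your file name.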

4

Solved

Quick implementation of character n-grams for word

I wrote the following code for computing character bigrams, and the output is right below. My question is: how do I get an output that excludes the last character (i.e. t)? And is there a quicker and ...

python-2.7 n-gram

Broadbent asked 6/9, 2013 at 12:39
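The asker's code is not shown, but a common way to end up with a lone final character is slicing from every index up to `len(word)`. A sketch that stops the range early, so every slice has exactly n characters ("test" is just an example word):

```python
def char_ngrams(word, n=2):
    """Return the overlapping character n-grams of word.

    The range stops at len(word) - n + 1, so no slice shorter than n
    (such as the final letter on its own) is produced.
    """
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams("test"))       # ['te', 'es', 'st']
print(char_ngrams("test", n=3))  # ['tes', 'est']
```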

4

Solved

Fast n-gram calculation

I'm using NLTK to search for n-grams in a corpus but it's taking a very long time in some cases. I've noticed calculating n-grams isn't an uncommon feature in other packages (apparently Haystack ha...

python nlp nltk n-gram

Dorweiler asked 29/9, 2011 at 0:49

2

Solved

Trying to set the max_gram and min_gram in Elasticsearch

I'm trying to deploy a Ruby on Rails app on an Ubuntu 16.04 EC2 server, but it gives an error about the difference between max_gram and min_gram in Elasticsearch. I don't have any experience with Elas...

elasticsearch n-gram elasticsearch-analyzers

Russo asked 7/8, 2019 at 13:44

8

The n-gram that is the most frequent one among all the words

I came across the following programming interview problem: Challenge 1: N-grams An N-gram is a sequence of N consecutive characters from a given word. For the word "pilot" there are three 3-grams...

c algorithm n-gram

Histogen asked 4/9, 2014 at 0:27
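The challenge text is truncated, so the exact rules are unknown. Under two stated assumptions (every occurrence counts, including repeats within a word, and ties go to the alphabetically smallest n-gram), a Python sketch of the counting step looks like this; the question itself is tagged C:

```python
from collections import Counter


def most_frequent_ngram(words, n):
    """Return the character n-gram occurring most often across words.

    Assumptions: all occurrences are counted, and ties are broken
    alphabetically. Returns None if no word has at least n characters.
    """
    counts = Counter(
        word[i:i + n]
        for word in words
        for i in range(len(word) - n + 1)
    )
    if not counts:
        return None
    # Highest count first; among equal counts, the smallest string wins.
    return min(counts, key=lambda gram: (-counts[gram], gram))


# "pilot" alone has three 3-grams: pil, ilo, lot.
print(most_frequent_ngram(["pilot", "pillow", "pillar"], 3))
```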

4

Solved

Is there a bi gram or tri gram feature in Spacy?

The below code breaks the sentence into individual tokens and the output is as below "cloud" "computing" "is" "benefiting" " major" "manufacturing" "companies" import en_core_web_sm nlp = en_c...

python-3.x nlp tokenize spacy n-gram

Anderson asked 3/12, 2018 at 16:50

3

Solved

Elasticsearch: Find substring match

I want to perform both exact word match and partial word/substring match. For example, if I search for "men's shaver", then I should be able to find "men's shaver" in the result. But in case I s...

autocomplete elasticsearch substring stringtokenizer n-gram

Consecration asked 23/4, 2014 at 12:11

2

Solved

TF-IDF vectorizer to extract ngrams

How can I use TF-IDF vectorizer from the scikit-learn library to extract unigrams and bigrams of tweets? I want to train a classifier with the output. This is the code from scikit-learn: from sklea...

python scikit-learn n-gram tfidfvectorizer

Meridethmeridian asked 28/10, 2020 at 8:10

2

ElasticSearch use "best match" of ngram terms instead of "synonym"?

Is it possible to tell ElasticSearch to use "best match" of all grams instead of using grams as synonyms? By default ElasticSearch uses grams as synonyms and returns poorly matching documents. It'...

elasticsearch n-gram trigram

Glede asked 9/12, 2017 at 13:17

2

Bag of Words (BOW) vs N-gram (sklearn CountVectorizer) - text documents classification

As far as I know, in the Bag of Words method, features are a set of words and their frequency counts in a document. On the other hand, N-grams, for example unigrams, do exactly the same, but it does not...

python scikit-learn feature-extraction feature-selection n-gram

Lesko asked 31/7, 2018 at 20:10

3

Solved

most common 2-grams using python

Given the string "this is a test this is", how can I find the top-n most common 2-grams? In the string above, all 2-grams are: {this is, is a, a test, test this, this is}. As you can notice, the 2-gram th...

python python-2.7 pyspark n-gram python-collections

Mumble asked 18/4, 2017 at 13:33
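In plain Python (not PySpark), `Counter.most_common` gives the top-n directly once the 2-grams are built with `zip`. A sketch assuming whitespace tokenization:

```python
from collections import Counter


def top_bigrams(text, k):
    """Return the k most common word bigrams in text with their counts."""
    words = text.split()
    return Counter(zip(words, words[1:])).most_common(k)


print(top_bigrams("this is a test this is", 1))
# [(('this', 'is'), 2)]
```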

3

Solved

Fast/Optimize N-gram implementations in python

Which ngram implementation is fastest in python? I've tried to profile nltk's vs scott's zip (http://locallyoptimal.com/blog/2013/01/20/elegant-n-gram-generation-in-python/): from nltk.util impor...

python nlp nltk information-retrieval n-gram

Tour asked 19/2, 2014 at 14:16
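Besides NLTK's version and the zip idiom the question compares, a third variant (not from the question) works on any iterable, including generators, without first building a list. It uses `itertools.tee` plus the "advance an iterator k steps" idiom from the itertools recipes. It has not been benchmarked here, so no claim is made that it is faster; its advantage is that it works on streams:

```python
from itertools import islice, tee


def iter_ngrams(iterable, n):
    """Return an iterator of n-gram tuples over any iterable.

    tee() makes n independent copies of the input; copy k is advanced
    by k items, and zip() then walks all copies in lockstep.
    """
    iterators = tee(iterable, n)
    for k, it in enumerate(iterators):
        # islice(it, k, k) consumes the first k items of copy k
        next(islice(it, k, k), None)
    return zip(*iterators)


words = (w for w in "to be or not to be".split())  # a generator, not a list
print(list(iter_ngrams(words, 3)))
```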

3

Solved

How to find the most common bi-grams with BigQuery?

I want to find the most common bi-grams (pair of words) in my table. How can I do this with BigQuery?

google-bigquery n-gram

Bulger asked 11/6, 2014 at 21:37

4

Solved

counting n-gram frequency in python nltk

I have the following code. I know that I can use apply_freq_filter function to filter out collocations that are less than a frequency count. However, I don't know how to get the frequencies of all ...

python nltk n-gram

Tenterhook asked 16/1, 2013 at 18:00

2

Creating N-Grams with tm & RWeka - works with VCorpus but not Corpus

Following the many guides to creating biGrams using the 'tm' and 'RWeka' packages, I was getting frustrated that only 1-Grams were being returned in the tdm. Through much trial and error I discover...

r tm n-gram term-document-matrix rweka

Unicuspid asked 13/3, 2017 at 5:33

1

ElasticSearch Edge NGram vs Prefix query

Let's say we have a relatively short text field, at most 10 characters, that is saved as a keyword. I want my users to be able to prefix-search this field (not autocomplete / search...

elasticsearch prefix n-gram

Erleneerlewine asked 23/10, 2017 at 14:10
