n-gram Questions

2

Solved

Input: n <- 2 and "abcd". Output: c("ab", "bc", "cd"). I don't want to use a for loop or sapply.
Lenorelenox asked 21/9, 2023 at 8:27

4

Solved

I'm looking for a library or a method using existing libraries (difflib, fuzzywuzzy, python-levenshtein) to find the closest match of a string (query) in a text (corpus). I've developed a method b...
Endosteum asked 15/3, 2016 at 13:54

3

In the Ashton String task, the goal is to: Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the Kth character of the concatenated stri...
Jene asked 30/12, 2015 at 14:08

3

I am new to machine learning, so please go easy in case the problem is trivial. I have been given a sequence of observed characters, say ABABBABBB... (n characters). My goal is to predict the ne...
Pontus asked 12/3, 2017 at 13:30

3

Solved

What's the best way to extract keyphrases from a block of text? I'm writing a tool to do keyword extraction: something like this. I've found a few libraries for Python and Perl to extract n-grams, ...
Cult asked 16/8, 2011 at 21:47

1

I am using Python 3.5, installed and managed with Anaconda. I want to train an NGramModel (from nltk) using some text. My installation does not find the module nltk.model. There are some possible a...
Icken asked 28/5, 2016 at 22:41

0

I'm using utf8mb4_bin for the title column, so I expected full-text search on it to be case-sensitive. But the query actually returns an empty result. CREATE TABLE `test_table` ( `id` int NOT NULL AUTO_INCREMENT, `title` ...
Reconstructionism asked 26/5, 2022 at 8:03

17

Solved

I'm looking for a way to split a text into n-grams. Normally I would do something like: import nltk from nltk import bigrams string = "I really like python, it's pretty awesome." string_bigrams = ...
Expropriate asked 8/7, 2013 at 16:35

3

Solved

I am trying to produce a bigram list of a given sentence. For example, if I type "To be or not to be", I want the program to generate: to be, be or, or not, not to, to be. I tried the following...
Labroid asked 6/6, 2016 at 6:44
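
A minimal sketch of the word-bigram listing described in this question, assuming plain whitespace tokenisation and lowercasing. The helper name `word_bigrams` is hypothetical and is not taken from the question or from any library.

```python
def word_bigrams(sentence):
    """Return adjacent word pairs as strings.

    Assumption: words are separated by whitespace and case is ignored.
    """
    words = sentence.lower().split()
    # zip(words, words[1:]) pairs each word with its successor.
    return [" ".join(pair) for pair in zip(words, words[1:])]


print(word_bigrams("To be or not to be"))
# ['to be', 'be or', 'or not', 'not to', 'to be']
```

A sentence with fewer than two words yields an empty list, since there is no adjacent pair to form.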

6

Solved

I want to count the number of occurrences of all bigrams (pairs of adjacent words) in a file using Python. I am dealing with very large files, so I am looking for an efficient way. I tried usi...
Expensive asked 19/9, 2012 at 4:44
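
One possible way to count word bigrams without loading a whole file into memory is to stream it line by line and carry the last word across line boundaries. This is only a sketch under the assumption of whitespace tokenisation. The name `count_bigrams` is hypothetical, and this is not necessarily the approach taken in the accepted answer.

```python
from collections import Counter


def count_bigrams(lines):
    """Count adjacent word pairs over an iterable of lines.

    Assumption: words are separated by whitespace; a bigram may span
    a line break, so the previous word is remembered between lines.
    """
    counts = Counter()
    prev = None
    for line in lines:
        for word in line.split():
            if prev is not None:
                counts[(prev, word)] += 1
            prev = word
    return counts


# Small in-memory demonstration; any iterable of lines works the same way.
demo = count_bigrams(["a b", "c a b"])
print(demo[("a", "b")])
```

Because an open file object is itself an iterable of lines, passing it directly to `count_bigrams` keeps only one line in memory at a time, although the counter itself still grows with the number of distinct bigrams.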

4

Solved

I wrote the following code for computing character bigrams and the output is right below. My question is, how do I get an output that excludes the last character (i.e. t)? And is there a quicker and ...
Broadbent asked 6/9, 2013 at 12:39

4

Solved

I'm using NLTK to search for n-grams in a corpus but it's taking a very long time in some cases. I've noticed calculating n-grams isn't an uncommon feature in other packages (apparently Haystack ha...
Dorweiler asked 29/9, 2011 at 0:49

2

Solved

I'm trying to deploy a Ruby on Rails app on an Ubuntu 16.04 EC2 server, but it is giving an error about the difference between max_gram and min_gram on Elasticsearch. I don't have any experience with Elas...
Russo asked 7/8, 2019 at 13:44

8

I came across the following programming interview problem: Challenge 1: N-grams An N-gram is a sequence of N consecutive characters from a given word. For the word "pilot" there are three 3-grams...
Histogen asked 4/9, 2014 at 0:27
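
The definition in this question (N consecutive characters from a word) can be sketched with string slicing. The function name `char_ngrams` is a hypothetical choice for illustration. The rest of the interview problem is truncated in the excerpt, so only the definition itself is shown.

```python
def char_ngrams(word, n):
    """Return every run of n consecutive characters in word."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A word of length L has L - n + 1 n-grams (none if L < n).
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams("pilot", 3))
# ['pil', 'ilo', 'lot']
```

This matches the count given in the excerpt: "pilot" has 5 characters, so there are 5 - 3 + 1 = 3 three-grams.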

4

Solved

The below code breaks the sentence into individual tokens and the output is as below "cloud" "computing" "is" "benefiting" " major" "manufacturing" "companies" import en_core_web_sm nlp = en_c...
Anderson asked 3/12, 2018 at 16:50

3

Solved

I want to perform both exact word match and partial word/substring match. For example, if I search for "men's shaver" then I should be able to find "men's shaver" in the result. But in case I s...
Consecration asked 23/4, 2014 at 12:11

2

Solved

How can I use TF-IDF vectorizer from the scikit-learn library to extract unigrams and bigrams of tweets? I want to train a classifier with the output. This is the code from scikit-learn: from sklea...
Meridethmeridian asked 28/10, 2020 at 8:10

2

Is it possible to tell ElasticSearch to use "best match" of all grams instead of using grams as synonyms? By default ElasticSearch uses grams as synonyms and returns poorly matching documents. It'...
Glede asked 9/12, 2017 at 13:17

2

As far as I know, in the bag-of-words method, features are a set of words and their frequency counts in a document. On the other hand, N-grams, for example unigrams, do exactly the same, but it does not...

3

Solved

Given a string: "this is a test this is". How can I find the top-n most common 2-grams? In the string above, all 2-grams are: {this is, is a, a test, test this, this is} As you can notice, the 2-gram th...
Mumble asked 18/4, 2017 at 13:33
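
A short sketch of one way to rank 2-grams by frequency, using `collections.Counter` from the Python standard library. It assumes whitespace tokenisation, and `top_bigrams` is a hypothetical helper name rather than anything from the question.

```python
from collections import Counter


def top_bigrams(text, n):
    """Return the n most frequent adjacent word pairs with their counts."""
    words = text.split()
    counts = Counter(zip(words, words[1:]))
    return counts.most_common(n)


print(top_bigrams("this is a test this is", 1))
# [(('this', 'is'), 2)]
```

For the example string, "this is" is the only 2-gram that occurs twice, so it ranks first.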

3

Solved

Which ngram implementation is fastest in python? I've tried to profile nltk's vs scott's zip (http://locallyoptimal.com/blog/2013/01/20/elegant-n-gram-generation-in-python/): from nltk.util impor...
Tour asked 19/2, 2014 at 14:16

3

Solved

I want to find the most common bi-grams (pair of words) in my table. How can I do this with BigQuery?
Bulger asked 11/6, 2014 at 21:37

4

Solved

I have the following code. I know that I can use apply_freq_filter function to filter out collocations that are less than a frequency count. However, I don't know how to get the frequencies of all ...
Tenterhook asked 16/1, 2013 at 18:00

2

Following the many guides to creating biGrams using the 'tm' and 'RWeka' packages, I was getting frustrated that only 1-Grams were being returned in the tdm. Through much trial and error I discover...
Unicuspid asked 13/3, 2017 at 5:33

1

Let's say we have a relatively short text field, at most 10 characters, that is saved as a keyword. I want my users to be able to prefix-search this field (not autocomplete / search...
Erleneerlewine asked 23/10, 2017 at 14:10
