n-gram Questions
2
Solved
# Input
n <- 2
"abcd"
# Output
c("ab", "bc", "cd")
I don't want to use a for loop or sapply
4
Solved
I'm looking for a library or a method using existing libraries (difflib, fuzzywuzzy, python-levenshtein) to find the closest match of a string (query) in a text (corpus).
I've developed a method b...
3
In the Ashton String task, the goal is to:
Arrange all the distinct substrings of a given string in
lexicographical order and concatenate them. Print the Kth character of
the concatenated stri...
Jene asked 30/12, 2015 at 14:8
3
I am new to machine learning, so please go easy in case the problem is trivial.
I have been given a sequence of observed characters, say ABABBABBB..... (n characters). My goal is to predict the ne...
Pontus asked 12/3, 2017 at 13:30
3
Solved
What's the best way to extract keyphrases from a block of text? I'm writing a tool to do keyword extraction: something like this. I've found a few libraries for Python and Perl to extract n-grams, ...
Cult asked 16/8, 2011 at 21:47
1
I am using Python 3.5, installed and managed with Anaconda. I want to train an NGramModel (from nltk) using some text. My installation does not find the module nltk.model.
There are some possible a...
Icken asked 28/5, 2016 at 22:41
0
I'm using utf8mb4_bin for the title column, so I expected the fulltext search to be case-sensitive. But the query actually returns an empty result.
CREATE TABLE `test_table` (
`id` int NOT NULL AUTO_INCREMENT,
`title` ...
Reconstructionism asked 26/5, 2022 at 8:3
17
Solved
I'm looking for a way to split a text into n-grams.
Normally I would do something like:
import nltk
from nltk import bigrams
string = "I really like python, it's pretty awesome."
string_bigrams = ...
3
Solved
I am trying to produce a bigram list of a given sentence. For example, if I type,
To be or not to be
I want the program to generate
to be, be or, or not, not to, to be
I tried the following...
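One possible sketch for the bigram question above, in plain Python with no third-party libraries. The function name `word_bigrams` is hypothetical, and lowercasing plus whitespace splitting are assumptions made only to match the expected output shown in the excerpt.

```python
def word_bigrams(sentence):
    """Return adjacent word pairs from a sentence as 'w1 w2' strings."""
    # Assumption: lowercase and split on whitespace; real text may need a tokenizer.
    words = sentence.lower().split()
    # zip pairs each word with its successor, so the last word starts no pair.
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


print(", ".join(word_bigrams("To be or not to be")))
# prints: to be, be or, or not, not to, to be
```

Punctuation stays attached to words under this splitting rule, so input with commas or periods would likely need cleaning first.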
6
Solved
I want to count the number of occurrences of all bigrams (pairs of adjacent words) in a file using Python. I am dealing with very large files, so I am looking for an efficient way. I tried usi...
4
Solved
I wrote the following code for computing character bigrams and the output is right below. My question is: how do I get an output that excludes the last character (i.e. t)? And is there a quicker and ...
Broadbent asked 6/9, 2013 at 12:39
4
Solved
I'm using NLTK to search for n-grams in a corpus but it's taking a very long time in some cases. I've noticed calculating n-grams isn't an uncommon feature in other packages (apparently Haystack ha...
2
Solved
I'm trying to deploy a Ruby on Rails app on an Ubuntu 16.04 EC2 server, but it is giving an error about the difference between max_gram and min_gram on Elasticsearch. I don't have any experience with Elas...
Russo asked 7/8, 2019 at 13:44
8
I came across the following programming interview problem:
Challenge 1: N-grams
An N-gram is a sequence of N consecutive characters from a given word. For the word "pilot" there are three 3-grams...
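The definition in this excerpt (N consecutive characters from a word) can be sketched with a single slice per start position. This is a minimal illustration, not the solution from the original thread; the name `char_ngrams` is hypothetical.

```python
def char_ngrams(word, n):
    """Return every run of n consecutive characters in word, in order."""
    # A word of length L has max(L - n + 1, 0) n-grams.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams("pilot", 3))  # ['pil', 'ilo', 'lot']
```

When `n` is larger than the word, the range is empty and the function returns an empty list.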
4
Solved
The below code breaks the sentence into individual tokens and the output is as below
"cloud" "computing" "is" "benefiting" " major" "manufacturing" "companies"
import en_core_web_sm
nlp = en_c...
Anderson asked 3/12, 2018 at 16:50
3
Solved
I want to perform both exact word match and partial word/substring match. For example if I search for "men's shaver" then I should be able to find "men's shaver" in the result. But in case I s...
Consecration asked 23/4, 2014 at 12:11
2
Solved
How can I use TF-IDF vectorizer from the scikit-learn library to extract unigrams and bigrams of tweets? I want to train a classifier with the output.
This is the code from scikit-learn:
from sklea...
Meridethmeridian asked 28/10, 2020 at 8:10
2
Is it possible to tell ElasticSearch to use "best match" of all grams instead of using grams as synonyms?
By default ElasticSearch uses grams as synonyms and returns poorly matching documents. It'...
Glede asked 9/12, 2017 at 13:17
2
As far as I know, in the Bag of Words method, features are a set of words and their frequency counts in a document. On the other hand, N-grams, for example unigrams, do exactly the same, but it does not...
Lesko asked 31/7, 2018 at 20:10
3
Solved
Given a string:
this is a test this is
How can I find the top-n most common 2-grams? In the string above, all 2-grams are:
{this is, is a, test this, this is}
As you can notice, the 2-gram th...
Mumble asked 18/4, 2017 at 13:33
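A minimal sketch of one way to count 2-grams for the question above, assuming whitespace tokenization and using `collections.Counter` from the standard library. The function name `top_bigrams` is hypothetical and this is not necessarily the accepted answer's approach.

```python
from collections import Counter


def top_bigrams(text, n):
    """Return the n most frequent adjacent word pairs with their counts."""
    words = text.split()
    # Counting tuples keeps repeated pairs, unlike collecting them in a set.
    counts = Counter(zip(words, words[1:]))
    return counts.most_common(n)


print(top_bigrams("this is a test this is", 1))  # [(('this', 'is'), 2)]
```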
3
Solved
Which ngram implementation is fastest in Python?
I've tried to profile nltk's vs scott's zip (http://locallyoptimal.com/blog/2013/01/20/elegant-n-gram-generation-in-python/):
from nltk.util impor...
Tour asked 19/2, 2014 at 14:16
3
Solved
I want to find the most common bi-grams (pair of words) in my table. How can I do this with BigQuery?
Bulger asked 11/6, 2014 at 21:37
4
Solved
I have the following code. I know that I can use apply_freq_filter function to filter out collocations that are less than a frequency count. However, I don't know how to get the frequencies of all ...
2
Following the many guides to creating biGrams using the 'tm' and 'RWeka' packages, I was getting frustrated that only 1-Grams were being returned in the tdm. Through much trial and error I discover...
Unicuspid asked 13/3, 2017 at 5:33
1
Let's say we have a relatively short text field, at most 10 characters, that is saved as a keyword.
I want my users to be able to prefix-search this field (not autocomplete / search...
Erleneerlewine asked 23/10, 2017 at 14:10