tokenize Questions

4

Solved

I have an index with the following settings and mapping: { "settings":{ "index":{ "analysis":{ "analyzer":{ "analyzer_keyword":{ "tokenizer":"keyword", "filter":"lowercase" } } } } }, "ma...
Vested asked 16/1, 2014 at 11:53
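The settings shown pair the keyword tokenizer with a lowercase filter, so the whole field value is emitted as one lowercased token. A minimal sketch of that configuration as a Python dict (the mappings block and field name are my assumptions, since the excerpt cuts off at "ma..."):

    settings = {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "analyzer_keyword": {
                            "tokenizer": "keyword",  # whole value as a single token
                            "filter": "lowercase",   # ...then lowercased
                        }
                    }
                }
            }
        },
        # Hypothetical mapping wiring a field to the analyzer above.
        "mappings": {
            "properties": {
                "name": {"type": "text", "analyzer": "analyzer_keyword"}
            }
        },
    }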

5

Solved

From something like this: print(get_indentation_level()) print(get_indentation_level()) print(get_indentation_level()) I would like to get something like this: 1 2 3 Can the code read its...
Calbert asked 26/8, 2016 at 18:7
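One commonly suggested approach (a sketch, assuming uniform 4-space indents): inspect the caller's frame and count the leading whitespace on the source line that made the call.

    import inspect

    def get_indentation_level(indent_width=4):
        # Read the source line that called us and count its leading spaces.
        caller = inspect.stack()[1]
        line = caller.code_context[0] if caller.code_context else ""
        leading = len(line) - len(line.lstrip())
        return leading // indent_width + 1  # 1-based; assumes uniform indents

    print(get_indentation_level())          # 1
    if True:
        print(get_indentation_level())      # 2
        if True:
            print(get_indentation_level())  # 3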

4

Solved

I'm building a JavaScript chat bot for something, and I ran into an issue: I use string.split() to tokenize my input like this: tokens = message.split(" "); Now my problem is that I need 4 tokens...
Lipfert asked 27/8, 2016 at 19:16
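JavaScript's split(separator, limit) discards everything past the limit, which is the usual sticking point here. For contrast, this is the behavior the asker seems to want, sketched in Python, where maxsplit keeps the remainder intact in the final token:

    message = "cmd arg1 arg2 the rest of the message"  # made-up input
    tokens = message.split(" ", 3)  # at most 4 tokens; remainder stays whole
    print(tokens)  # ['cmd', 'arg1', 'arg2', 'the rest of the message']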

2

Solved

I have started learning NLTK and am following this tutorial. First we use the built-in tokenizer by using sent_tokenize, and later we use PunktSentenceTokenizer. The tutorial mentions that PunktSentenc...
Titanium asked 22/6, 2016 at 4:49
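As I read the tutorial's distinction: sent_tokenize is a convenience wrapper around a pre-trained Punkt model, while PunktSentenceTokenizer is the underlying class, which you can train on your own corpus. A sketch (the training text is a placeholder; requires nltk.download('punkt')):

    from nltk.tokenize import sent_tokenize, PunktSentenceTokenizer

    text = "Mr. Smith arrived. He was late."

    # Pre-trained Punkt model under the hood.
    print(sent_tokenize(text))

    # Same algorithm, trained on your own text instead.
    train_text = "A domain-specific corpus would go here."  # placeholder
    custom = PunktSentenceTokenizer(train_text)
    print(custom.tokenize(text))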

3

Solved

Searching for names (text) with spaces in them is causing me problems. I have a mapping similar to "{"user":{"properties":{"name":{"type":"string"}}}}". Ideally what it should return and rank result...
Mcclimans asked 23/5, 2013 at 8:17
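A common reading of this problem: the field is analyzed, so names are split on whitespace and multi-word matches rank oddly. The fix usually suggested is a second, unanalyzed view of the field to match and rank against. A sketch of such a mapping as a Python dict, using the legacy string/not_analyzed syntax (the sub-field name is my choice):

    mapping = {
        "user": {
            "properties": {
                "name": {
                    "type": "string",  # analyzed: full-text search
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed",  # exact whole-name matches
                        }
                    },
                }
            }
        }
    }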

10

Solved

I just learned about Java's Scanner class and now I'm wondering how it compares/competes with StringTokenizer and String.split. I know that StringTokenizer and String.split only work on Str...
Noctiluca asked 27/3, 2009 at 19:29

5

Solved

How can I insert a character into a string after every character? I need to insert '|' into the string between every pair of characters. In other words (C++): "Tokens all around!" turns into: "T|o...
Trapezium asked 14/1, 2015 at 20:42
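Reading the example output as inserting '|' after every character, the transformation is just "join the characters with a separator". The question is C++, but for comparison it is a one-liner in Python:

    text = "Tokens all around!"
    print("|".join(text))  # T|o|k|e|n|s| |a|l|l| |a|r|o|u|n|d|!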

4

Solved

I have a string whose correct syntax is the regex ^([0-9]+[abc])+$. So examples of valid strings would be: '1a2b' or '00333b1119a555a0c' For clarity, the string is a list of (value, letter) pairs ...
Compensatory asked 25/3, 2016 at 8:50
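One way to both validate and tokenize such a string (a sketch; the function name is mine): check the whole input with re.fullmatch, then pull the pairs out with capture groups.

    import re

    def parse_pairs(s):
        # Validate the overall grammar first...
        if not re.fullmatch(r"(?:[0-9]+[abc])+", s):
            raise ValueError("invalid syntax")
        # ...then extract each (value, letter) pair.
        return [(int(v), l) for v, l in re.findall(r"([0-9]+)([abc])", s)]

    print(parse_pairs("00333b1119a555a0c"))
    # [(333, 'b'), (1119, 'a'), (555, 'a'), (0, 'c')]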

8

I have been trying to tokenize a string using SPACE as the delimiter, but it doesn't work. Does anyone have a suggestion as to why it doesn't work? Edit: tokenizing using: strtok(string, " "); The code ...
Sectarian asked 5/11, 2008 at 19:46

1

Solved

I was going through this question. I'm just wondering whether NLTK would be faster than regex in word/sentence tokenization.
Lazulite asked 11/2, 2016 at 17:11
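This is easy to measure rather than guess at. A rough timeit sketch (corpus and pattern are placeholders; word_tokenize does far more work than \w+, so treat the numbers as indicative only):

    import re
    import timeit

    from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

    text = "This is a sample sentence, with punctuation! " * 200

    t_re = timeit.timeit(lambda: re.findall(r"\w+", text), number=50)
    t_nltk = timeit.timeit(lambda: word_tokenize(text), number=50)
    print(f"regex: {t_re:.3f}s  nltk: {t_nltk:.3f}s")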

1

Solved

I'm currently using NLTK for language processing, but I have encountered a problem with sentence tokenizing. Here's the problem: assume I have a sentence: "Fig. 2 shows a U.S.A. map." When I use pun...
Flaw asked 15/1, 2016 at 7:1
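A commonly cited remedy is to give Punkt an explicit abbreviation list via PunktParameters, so "Fig." and "U.S.A." stop ending sentences. A sketch (the abbreviation set is an assumption; entries are lowercase, without the trailing period):

    from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

    params = PunktParameters()
    params.abbrev_types = {"fig", "u.s.a"}  # assumed abbreviation list

    tokenizer = PunktSentenceTokenizer(params)
    print(tokenizer.tokenize("Fig. 2 shows a U.S.A. map."))
    # expected: ['Fig. 2 shows a U.S.A. map.']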

3

Solved

I am using Facet Terms to get all the unique values and their counts for a field, but I am getting wrong results. term: web Count: 1191979 term: misc Count: 1191979 term: passwd Count: 119197...
Ellaelladine asked 10/4, 2012 at 17:36
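Identical counts across terms like this usually point at faceting over an analyzed field, where individual tokens are counted rather than whole field values. The standard suggestion is a not_analyzed version of the field to facet on; a sketch as a Python dict (the field name is hypothetical):

    mapping = {
        "properties": {
            "category": {                 # hypothetical field
                "type": "string",
                "index": "not_analyzed",  # facet on whole values, not tokens
            }
        }
    }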

1

Solved

I have a dataframe in Python. One of its columns is labelled time, which is a timestamp. Using the following code, I have converted the timestamp to datetime: milestone['datetime'] = milesto...
Mccormack asked 20/11, 2015 at 19:35

8

I want to split a string into tokens. I ripped off another Stack Overflow question - Equivalent to StringTokenizer with multiple characters delimiters - but I want to know if this can be done with ...
Busywork asked 31/10, 2015 at 17:7
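If regular expressions are acceptable, multi-character delimiters tokenize cleanly by joining the escaped delimiters into one alternation. A sketch (delimiters and input are made up):

    import re

    def tokenize(text, delimiters):
        # Longest delimiters first, so a short one cannot shadow a longer one.
        pattern = "|".join(
            map(re.escape, sorted(delimiters, key=len, reverse=True))
        )
        return [tok for tok in re.split(pattern, text) if tok]

    print(tokenize("a<=>b==c<=>d", ["<=>", "=="]))  # ['a', 'b', 'c', 'd']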

3

Solved

Like the title says: can we use ...USING fts3(tokenizer icu th_TH, ...)? If we can, does anyone know which locales are supported, and whether it varies by platform version?
Colson asked 15/8, 2011 at 20:9

4

Solved

Is there any available solution for (re-)generating PHP code from the Parser Tokens returned by token_get_all? Other solutions for generating PHP code are welcome as well, preferably with the assoc...
Steward asked 21/2, 2011 at 16:11

1

In my JavaEE application, I'm using the Atom-based Google Sites API to retrieve content from a non-public Google Site. In essence, we're using the Google Site as a lightweight CMS, and from within ...
Interdenominational asked 11/11, 2014 at 5:0

2

Solved

What I want to do is, given an input string whose size and number of tokens I won't know, print its last token. E.g.: char* s = "some/very/big/string"; char* token; cons...
Kan asked 28/9, 2015 at 12:27
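In C this is typically a strtok loop that keeps the last non-NULL result, or a single strrchr(s, '/'). For comparison, the same idea in Python:

    s = "some/very/big/string"
    print(s.rsplit("/", 1)[-1])  # string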

1

Solved

I was wondering which characters Elasticsearch's standard tokenizer uses to delimit a string.
Kovno asked 23/9, 2015 at 14:23
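The standard tokenizer splits on Unicode text-segmentation word boundaries (UAX #29) rather than on a fixed character list, so the most reliable way to see what it does is to ask the _analyze API. A sketch using requests against a hypothetical local node:

    import requests

    resp = requests.get(
        "http://localhost:9200/_analyze",  # assumes a local Elasticsearch node
        json={"tokenizer": "standard", "text": "john.doe@example.com foo-bar 3.14"},
    )
    print([t["token"] for t in resp.json()["tokens"]])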

2

Solved

First of all, I am new to Python/NLTK, so my apologies if the question is too basic. I have a large file that I am trying to tokenize; I get memory errors. One solution I've read about is to read ...
Poirier asked 24/3, 2012 at 16:12
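The line-by-line approach mentioned in the question looks roughly like this: stream the file instead of calling read() on the whole thing, tokenizing as you go (the filename is a placeholder). The trade-off is that sentences spanning line breaks get cut, so chunking on blank lines is a common compromise.

    from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

    tokens = []
    with open("big_corpus.txt", encoding="utf-8") as f:  # placeholder file
        for line in f:  # stream line by line; never hold the whole file
            tokens.extend(word_tokenize(line))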

1

Solved

I use sklearn.feature_extraction.text.CountVectorizer to compute n-grams. Example: import sklearn.feature_extraction.text # FYI http://scikit-learn.org/stable/install.html ngram_size = 4 string = ...
Territorialize asked 20/8, 2015 at 21:35
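For reference, a minimal working version of that setup (the sample document is mine): ngram_range=(n, n) restricts the vocabulary to exactly n-grams.

    from sklearn.feature_extraction.text import CountVectorizer

    ngram_size = 4
    docs = ["a sample sentence to extract four grams from today"]  # placeholder

    vect = CountVectorizer(ngram_range=(ngram_size, ngram_size))
    vect.fit(docs)
    print(vect.get_feature_names_out())  # sklearn >= 1.0; older: get_feature_names()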

1

Code: from nltk.tokenize import sent_tokenize pprint(sent_tokenize(unidecode(text))) Output: [After Du died of suffocation, her boyfriend posted a heartbreaking message online: "Losing conscio...
Babysitter asked 14/8, 2015 at 6:3

1

Solved

I have a custom tokenizer function with some keyword arguments: def tokenizer(text, stem=True, lemmatize=False, char_lower_limit=2, char_upper_limit=30): do things... return tokens Now, how ca...
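CountVectorizer's tokenizer argument expects a callable taking a single string, so the usual trick is to bind the keyword arguments first, e.g. with functools.partial (a lambda works too). A sketch with a stand-in tokenizer body:

    from functools import partial
    from sklearn.feature_extraction.text import CountVectorizer

    def tokenizer(text, stem=True, lemmatize=False,
                  char_lower_limit=2, char_upper_limit=30):
        # Stand-in body: keep words within the length limits.
        return [w for w in text.split()
                if char_lower_limit <= len(w) <= char_upper_limit]

    vect = CountVectorizer(tokenizer=partial(tokenizer, stem=False))
    X = vect.fit_transform(["some sample text for the sketch"])
    print(vect.get_feature_names_out())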

8

Solved

I have a CSV string 100.01,200.02,300.03 which I need to pass to a PL/SQL stored procedure in Oracle. Inside the proc, I need to insert these values into a Number column in the table. For this, I got a w...
Messing asked 23/10, 2010 at 14:9

3

Solved

Trying to access the analyzed/tokenized text in my Elasticsearch documents. I know you can use the Analyze API to analyze arbitrary text according to your analysis modules. So I could copy and paste ...
Lowbrow asked 15/11, 2012 at 19:28
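Besides re-analyzing text with the Analyze API, the term vectors API returns the tokens actually indexed for a specific document's fields. A sketch with requests (index name, document id, and field are assumptions; very old clusters used the singular _termvector endpoint):

    import requests

    resp = requests.get(
        "http://localhost:9200/my-index/_termvectors/1",  # assumed index/doc id
        json={"fields": ["body"]},                        # assumed field name
    )
    terms = resp.json()["term_vectors"]["body"]["terms"]
    print(sorted(terms))  # the analyzed tokens stored for the field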
