tokenize - 5

4

Solved

I am trying to tokenize strings into ngrams. Strangely, in the documentation for the NGramTokenizer I do not see a method that will return the individual ngrams that were tokenized. In fact I only see ...

java lucene tokenize n-gram

Deppy asked 17/11, 2012 at 18:50

2

Solved

Tokenize() in nltk.TweetTokenizer returning integers by splitting

Tokenize() in nltk.TweetTokenizer is splitting 32-bit integers into their digits. It only happens with certain numbers, and I don't see any reason why. >>> from nltk.t...

python nltk tokenize

Morbidity asked 31/7, 2017 at 21:59

3

Splitting chinese document into sentences [closed]

I have to split Chinese text into multiple sentences. I tried the Stanford DocumentPreProcessor. It worked quite well for English but not for Chinese. Please can you let me know any good sen...

nlp tokenize stanford-nlp sentence

Parulis asked 12/12, 2014 at 10:4

2

Regex to extract value between a single quote and parenthesis using boost token iterator

Suppose I have a string: s = "server ('m1.labs.teradata.com') username ('u\'se)r_*5') password('uer 5') dbname ('default')"; I need to extract token1 : 'm1.labs.ter...

c++ regex tokenize

Defalcate asked 19/7, 2017 at 13:37

0

extracting whitespaces using regex in cpp

I have the following string : s = "server ('m1.labs.terada')ta.com') username ('user5') password('use r5') dbname ('default')"; I have defined a regex for extracting the values between the paran...

c++ regex tokenize

Scum asked 19/7, 2017 at 4:16

1

Solved

Reloading Keras Tokenizer during Testing

I followed the tutorial here: (https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html) However, I modified the code to be able to save the generated model through h5py. Thus...

tensorflow keras tokenize text-classification word-embedding

Dunkle asked 26/6, 2017 at 13:31

1

Solved

NLTK tokenizer and Stanford corenlp tokenizer cannot distinguish 2 sentences without space at period (.)

I have 2 sentences in my dataset: w1 = I am Pusheen the cat.I am so cute. # no space after period w2 = I am Pusheen the cat. I am so cute. # with space after period When I use NLTK tokenizer (bo...

python nlp nltk stanford-nlp tokenize

Ozzy asked 1/7, 2017 at 8:4
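
For the two sample sentences in this question, a rough regex heuristic (not NLTK or CoreNLP, and not taken from the accepted answer) can split on sentence-ending punctuation even when the following space is missing. The function name `split_sentences` is made up for illustration, and the rule "punctuation followed by an uppercase letter" is an assumption that will wrongly split abbreviations such as "U.S". It also relies on Python 3.7 or later, where `re.split` accepts patterns that can match the empty string.

```python
import re

# Split after '.', '!' or '?' when the next non-space character is an
# uppercase letter. The space between sentences is optional.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s*(?=[A-Z])")

def split_sentences(text):
    """Naive sentence splitter; a sketch, not a replacement for a trained model."""
    return [part for part in SENTENCE_BOUNDARY.split(text) if part]

w1 = "I am Pusheen the cat.I am so cute."   # no space after period
w2 = "I am Pusheen the cat. I am so cute."  # with space after period

print(split_sentences(w1))
print(split_sentences(w2))
```

Both inputs should come out as the same two sentences, which is the behaviour the question is after.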

6

Solved

how to get data between quotes in java?

I have lines of text in which the number of quoted values can vary, for example: Here just one "comillas" But I also could have more "mas" values in "comillas" and that "is" the "trick" I was thinking of a meth...

java quotes tokenize

Marco asked 24/9, 2009 at 17:43

1

Solved

How to apply NLTK word_tokenize library on a Pandas dataframe for Twitter data?

This is the code that I am using for semantic analysis of Twitter: import pandas as pd import datetime import numpy as np import re from nltk.tokenize import word_tokenize from nltk.corpus import...

python pandas twitter nltk tokenize

Sibella asked 25/5, 2017 at 6:21

2

Solved

Tokenize by using regular expressions (parenthesis)

I have the following text: I don't like to eat Cici's food (it is true) I need to tokenize it to ['i', 'don't', 'like', 'to', 'eat', 'Cici's', 'food', '(', 'it', 'is', 'true', ')'] I have fo...

regex string split tokenize

Fetus asked 29/3, 2017 at 12:2
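
One way to get the token list described in this question is a single alternation: runs of word characters and apostrophes, or a lone parenthesis. The sketch below uses Python's `re` module as an assumption, since the question does not name a language, and the function name `tokenize` is illustrative. It keeps the original casing rather than lowercasing.

```python
import re

# Either a run of word characters/apostrophes (so "don't" stays whole),
# or a single opening/closing parenthesis as its own token.
TOKEN_PATTERN = re.compile(r"[\w']+|[()]")

def tokenize(text):
    """Return words and parentheses as separate tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("I don't like to eat Cici's food (it is true)"))
# ['I', "don't", 'like', 'to', 'eat', "Cici's", 'food', '(', 'it', 'is', 'true', ')']
```

Other punctuation is silently dropped by this pattern; add it to the second character class if it should become a token too.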

1

Solved

Document-term matrix in R - bigram tokenizer not working

I am trying to make 2 document-term matrices for a corpus, one with unigrams and one with bigrams. However, the bigram matrix is currently just identical to the unigram matrix, and I'm not sure why...

r tokenize tm n-gram rweka

Trichromatic asked 5/3, 2017 at 4:11

2

Solved

How to use stringstream to separate comma separated strings [duplicate]

I've got the following code: std::string str = "abc def,ghi"; std::stringstream ss(str); string token; while (ss >> token) { printf("%s\n", token.c_str()); } The...

c++ tokenize stringstream

Undercroft asked 30/7, 2012 at 10:21

4

How to build a parse tree of a mathematical expression?

I'm learning how to write tokenizers and parsers, and as an exercise I'm writing a calculator in JavaScript. I'm using a parse tree approach (I hope I got this term right) to build my calculator. I'm ...

parsing tokenize evaluation

Press asked 1/7, 2014 at 23:45

4

Solved

Looking for a clear definition of what a "tokenizer", "parser" and "lexers" are and how they are related to each other and used?

I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice versa)? I need to create a pr...

parsing lexer tokenize

Solvency asked 19/12, 2008 at 9:14
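
To make the relationship concrete, here is a minimal sketch in Python (chosen only for brevity) of a tokenizer feeding a parser. The tokenizer, which is the same thing as a lexer, turns characters into tokens; the parser consumes those tokens and builds a tree. The grammar (integers, `+ - * /`, parentheses), the tuple-based tree shape and all function names are assumptions made for illustration.

```python
import re

def tokenize(source):
    """Tokenizer/lexer: characters in, (kind, text) tokens out."""
    tokens = []
    for text in re.findall(r"\d+|[-+*/()]", source):
        kind = "NUM" if text.isdigit() else "OP"
        tokens.append((kind, text))
    return tokens

def parse(tokens):
    """Recursive-descent parser: tokens in, nested tuples (op, left, right) out."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        # factor := NUM | '(' expr ')'
        kind, text = advance()
        if kind == "NUM":
            return int(text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        # term := factor (('*' | '/') factor)*
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        # expr := term (('+' | '-') term)*
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree

print(parse(tokenize("1 + 2 * (3 - 4)")))
# ('+', 1, ('*', 2, ('-', 3, 4)))
```

So the parser uses the tokenizer, not the other way round: the parser never looks at raw characters, only at the token stream.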

2

XML / Java: Precise line and character positions whilst parsing tags and attributes?

I’m trying to find a way to precisely determine the line number and character position of both tags and attributes whilst parsing an XML document. I want to do this so that I can report accurately ...

java xml parsing tokenize sax

Hunger asked 31/1, 2017 at 22:2

5

Solved

How does a parser (for example, HTML) work?

For argument's sake let's assume an HTML parser. I've read that it tokenizes everything first, and then parses it. What does tokenize mean? Does the parser read every character, building up a...

html browser parsing html-parsing tokenize

Lizethlizette asked 30/6, 2010 at 14:36

11

Solved

How to split a file into words on the unix command line?

I'm running quick tests for a naive boolean information retrieval system, and I would like to use awk, grep, egrep, sed or something similar with pipes to split a text file into words and save them into ...

unix command-line awk tokenize

Microfiche asked 19/3, 2013 at 14:3

4

Solved

Tokenize problem in Java with separator ". "

I need to split a text using the separator ". ". For example I want this string : Washington is the U.S Capital. Barack is living there. To be cut into two parts: Washington is the U.S Capital....

java string tokenize stringtokenizer

Pardon asked 4/6, 2010 at 7:23

1

Solved

How can I prevent spacy's tokenizer from splitting a specific substring when tokenizing a string?

How can I prevent spacy's tokenizer from splitting a specific substring when tokenizing a string? More specifically, I have this sentence: Once unregistered, the folder went away from the shell...

python nlp tokenize spacy

Confessional asked 26/1, 2017 at 3:26

3

Solved

Using Boost Tokenizer escaped_list_separator with different parameters

Hello, I have been trying to get a tokenizer to work using the Boost library's tokenizer class. I found this tutorial in the Boost documentation: http://www.boost.org/doc/libs/1_36_0/libs/tokenizer/escap...

c++ string boost tokenize

Sardanapalus asked 12/2, 2009 at 14:44

11

Solved

Tokenize a string with a space in java

I want to tokenize a string like this: String line = "a=b c='123 456' d=777 e='uij yyy'"; I cannot simply split it like this: String [] words = line.split(" "); Any idea how I can split so that I ...

java tokenize

Managerial asked 1/10, 2009 at 0:21

3

Solved

Python: Tokenizing with phrases

I have blocks of text I want to tokenize, but I don't want to tokenize on whitespace and punctuation, as seems to be the standard with tools like NLTK. There are particular phrases that I want to b...

python nlp tokenize nltk

Tugboat asked 3/4, 2011 at 20:42
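
One possible approach, sketched here with the standard `re` module rather than NLTK, is to try the known phrases before falling back to single words. The function name `tokenize_with_phrases` and the example phrase list are hypothetical; matching is case-sensitive and punctuation is dropped, which may or may not suit the asker's data.

```python
import re

def tokenize_with_phrases(text, phrases):
    """Keep each known phrase as one token; split everything else into words."""
    # Longest phrases first, so "New York City" wins over "New York".
    ordered = sorted(phrases, key=len, reverse=True)
    if ordered:
        phrase_part = "|".join(re.escape(p) for p in ordered)
        pattern = rf"\b(?:{phrase_part})\b|\w+"
    else:
        pattern = r"\w+"
    return re.findall(pattern, text)

print(tokenize_with_phrases("I live in New York City now",
                            ["New York", "New York City"]))
# ['I', 'live', 'in', 'New York City', 'now']
```

For a large phrase list, a trie or a dedicated multi-word tokenizer is likely to scale better than one big alternation.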

2

Solved

elasticsearch custom tokenizer - split token by length

I am using elasticsearch version 1.2.1. I have a use case in which I would like to create a custom tokenizer that will break the tokens by their length up to a certain minimum length. For example, ...

elasticsearch lucene tokenize stringtokenizer analyzer

Trackless asked 8/2, 2015 at 16:55

1

Solved

What is the best way to write a syntax tokenizer/parser in C? [closed]

Background Information: I want to create a programming language. I know which tools can be used to do so, but I don't have any good examples of how to use them. I really do not want to use Flex or Biso...

c string parsing tokenize

Infrasonic asked 11/11, 2016 at 0:33

3

Indexing and Querying URLs in Solr

I have a database of URLs that I would like to search. Because URLs are not always written the same way (they may or may not have www), I am looking for the correct way to index and query URLs. I've tried a...

url indexing solr tokenize querying

Sociable asked 13/1, 2011 at 18:59
