text-analysis Questions

3

I am using the Twitter API to do sentiment analysis, and I am trying to generate a word cloud based on the tweets. Here is my code to generate a word cloud: wordcloud(clean.tweets, random.order=F,max.words=80,...
Melodize asked 28/11, 2017 at 5:31

3

I am trying to get the score of the best match using difflib.get_close_matches: import difflib best_match = difflib.get_close_matches(str,str_list,1)[0] I know of the option to add 'cutoff' par...
Coleen asked 29/3, 2016 at 11:47
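
get_close_matches returns only the matching strings, not their scores. A minimal sketch of one way to recover the score, recomputing the same ratio with difflib.SequenceMatcher (the query string and candidate list below are made up):

    import difflib

    def best_match_with_score(query, candidates, cutoff=0.6):
        """Return (best_match, score), or (None, 0.0) if nothing clears the cutoff."""
        matches = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
        if not matches:
            return None, 0.0
        best = matches[0]
        # SequenceMatcher.ratio() is the same similarity measure that
        # get_close_matches uses internally to rank candidates.
        score = difflib.SequenceMatcher(None, query, best).ratio()
        return best, score

    print(best_match_with_score("appel", ["ape", "apple", "peach"]))  # ('apple', 0.8)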

4

If I have a text containing, for example, a newspaper article in Catalan, how could I find all the cities mentioned in that text? I have been looking at the nltk package for Python and I have dow...
Groupie asked 10/5, 2015 at 10:0
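
For reference, the basic nltk pattern for pulling place names out of a text is sketched below. Note that nltk's bundled tokenizer, tagger and NE chunker are trained on English, so for Catalan text this is only a starting point; a gazetteer of Catalan cities or a language-specific NER model would do better.

    import nltk

    # One-time downloads: punkt, averaged_perceptron_tagger, maxent_ne_chunker, words.

    def extract_places(text):
        """Return the entities that nltk's chunker labels GPE (geo-political entity)."""
        places = []
        for sentence in nltk.sent_tokenize(text):
            tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))
            for subtree in tree:
                if hasattr(subtree, 'label') and subtree.label() == 'GPE':
                    places.append(' '.join(token for token, tag in subtree))
        return places

    print(extract_places("The mayor of Barcelona met officials from Girona and Tarragona."))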

5

Hey, I have a CSV with multilingual text. All I want is a column appended with the detected language. So I coded as below: from langdetect import detect import csv with open('C:\\Users\\dell\\Do...
Mackle asked 24/11, 2016 at 10:6
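
A minimal sketch of the usual shape of that script, assuming the text sits in the first column and using made-up file names; langdetect raises an exception on empty cells, so that case is caught explicitly:

    import csv
    from langdetect import detect, DetectorFactory
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # langdetect is non-deterministic without a fixed seed

    with open('input.csv', newline='', encoding='utf-8') as src, \
         open('output.csv', 'w', newline='', encoding='utf-8') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            try:
                language = detect(row[0])
            except LangDetectException:  # empty or undetectable text
                language = 'unknown'
            writer.writerow(row + [language])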

4

I don't understand type conversion. I know this isn't right; all I get is a bunch of hieroglyphs. f, _ := os.Open("test.pdf") defer f.Close() io.Copy(os.Stdout, f) I want to work with the strings...
Janijania asked 2/10, 2016 at 4:33

2

Solved

I have a classic NLP problem: I have to classify a news article as fake or real. I have created two sets of features: A) Bigram Term Frequency-Inverse Document Frequency B) Approximately 20 features ass...
Vindicate asked 1/2, 2018 at 23:2
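
The usual way to combine a sparse bigram TF-IDF matrix with a small block of hand-crafted features is to stack them column-wise with scipy.sparse.hstack and feed the result to a single classifier. A sketch with placeholder documents and features (LogisticRegression is just one possible classifier, not necessarily the asker's):

    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = ["the president signed the bill", "aliens built the pyramids"]  # placeholders
    extra_features = np.array([[0.1, 3.0, 12.0], [0.9, 7.0, 4.0]])          # the ~20 hand-crafted features
    labels = [0, 1]

    tfidf = TfidfVectorizer(ngram_range=(2, 2))   # feature set A: bigram TF-IDF
    X_text = tfidf.fit_transform(texts)

    # Column-wise concatenation of the sparse text matrix and the dense feature block.
    X = hstack([X_text, csr_matrix(extra_features)])

    clf = LogisticRegression().fit(X, labels)

Scaling the hand-crafted block so it is on a comparable footing with the TF-IDF weights usually helps.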

4

Solved

Natural Language Processing (NLP), especially for English, has evolved to the point where stemming would become an archaic technology if "perfect" lemmatizers existed. It's because stemmers change ...
Mathews asked 26/6, 2013 at 10:19
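
The difference the question is pointing at can be seen directly in nltk, where the Porter stemmer strips suffixes blindly while the WordNet lemmatizer looks words up (and needs a part-of-speech hint to do well):

    from nltk.stem import PorterStemmer, WordNetLemmatizer
    # one-time: nltk.download('wordnet')

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word, pos in [("studies", "v"), ("studying", "v"), ("was", "v"), ("better", "a")]:
        print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos=pos))

    # The stemmer yields truncated stems such as "studi" and "wa", while the
    # lemmatizer returns dictionary forms such as "study", "be" and "good",
    # but only when it is given the right part of speech.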

2

I am facing the below error while working with the tm package in R. library("tm") Loading required package: NLP Warning messages: 1: package ‘tm’ was built under R version 3.4.2 2: package ‘NLP’...
Sphenoid asked 21/11, 2017 at 6:27

3

Solved

I'm doing text analysis over reddit comments, and I want to calculate the TF-IDF within BigQuery.
Ferous asked 31/10, 2017 at 5:42

2

Solved

I have a huge dataset which is similar to the columns posted below NameofEmployee <- c(x, y, z, a) Region <- c("Pune", "Orissa", "Orisa", "Poone") As you can see, in the Region colum...
Rapture asked 24/7, 2018 at 6:28

2

I have been using JJ Allaire's guide to using word embeddings in neural network model for text processing (https://jjallaire.github.io/deep-learning-with-r-notebooks/notebooks/6.1-using-word-embedd...
Watcher asked 4/5, 2018 at 23:32

3

Solved

I'm trying to do some text analysis to determine if a given string is... talking about politics. I'm thinking I could create a neural network where the input is either a string or a list of words (...
Blinders asked 5/5, 2016 at 6:10
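
A hedged baseline sketch (not the asker's architecture): turn each string into a TF-IDF vector and feed it to a small feed-forward network, here scikit-learn's MLPClassifier; the toy texts and labels below are invented, and a real model would need far more labelled data.

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier

    texts = [
        "the senate passed the budget bill",
        "the prime minister called an election",
        "our team won the match last night",
        "this recipe needs two cups of flour",
    ]
    labels = [1, 1, 0, 0]  # 1 = politics, 0 = not politics

    # TF-IDF gives each string a fixed-length vector, which the network can classify.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )
    model.fit(texts, labels)
    print(model.predict(["parliament debates the new tax law"]))  # prints the predicted label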

2

Solved

If I use the TfidfVectorizer from sklearn to generate feature vectors as: features = TfidfVectorizer(min_df=0.2, ngram_range=(1,3)).fit_transform(myDocuments) How would I then generate feature ve...
Lionel asked 18/10, 2016 at 15:32
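
The key is to keep the fitted vectorizer object and call transform() (not fit_transform()) on the new documents, so they are projected onto the same vocabulary and IDF weights; a small sketch with placeholder documents:

    from sklearn.feature_extraction.text import TfidfVectorizer

    train_docs = ["the cat sat on the mat", "dogs and cats living together"]  # placeholders
    new_docs = ["a new document about cats"]

    vectorizer = TfidfVectorizer(ngram_range=(1, 3))
    X_train = vectorizer.fit_transform(train_docs)  # learns vocabulary and IDF weights

    # Reuse the fitted vectorizer: unseen terms in the new documents are ignored,
    # and the columns line up with X_train.
    X_new = vectorizer.transform(new_docs)
    print(X_train.shape[1] == X_new.shape[1])  # True

Persisting the fitted vectorizer (for example with joblib) is what makes this work across sessions.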

1

Solved

I'm thinking about using word n-gram techniques on raw text. But I have a doubt: does it make sense to use word n-grams after applying lemmatization/stemming to the text? If not, why should I use word n-grams o...

1

I'm a newbie to AI and want to perform the exercise below. Can you please suggest a way to achieve it using Python? Scenario: I have a list of businesses of some companies, as below: 1. AI...

2

Solved

I am using the MALLET topic modelling sample code and, though it runs fine, I would like to know what the parameters of this statement actually mean: instances.addThruPipe(new CsvIterator(new FileReade...

3

I'm trying to model Twitter stream data with topic models. Gensim, being an easy-to-use solution, is impressive in its simplicity. It has a truly online implementation for LSI, but not for LDA. Fo...
Quinine asked 18/3, 2014 at 2:52
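
For what it's worth, gensim's LdaModel can at least be fed new batches through update() (its online variational Bayes mode); the harder part for a stream is the fixed vocabulary, since doc2bow silently drops words the original Dictionary has never seen. A sketch with made-up tweet tokens:

    from gensim import corpora
    from gensim.models import LdaModel

    batch1 = [["obama", "election", "vote"], ["pizza", "dinner", "cheese"]]
    dictionary = corpora.Dictionary(batch1)
    corpus1 = [dictionary.doc2bow(doc) for doc in batch1]

    lda = LdaModel(corpus1, id2word=dictionary, num_topics=2)

    # Later, when the next batch arrives from the stream: words outside the
    # original vocabulary are dropped by doc2bow, the rest update the topics.
    batch2 = [["senate", "vote", "law"], ["goal", "match", "football"]]
    corpus2 = [dictionary.doc2bow(doc) for doc in batch2]
    lda.update(corpus2)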

1

Solved

I want to convert this matrix into a pandas DataFrame. csc_matrix The first number in the bracket should be the index, the second number the column, and the last number the data. I ...
Apnea asked 13/4, 2016 at 2:53
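
A sketch of one way to do it, using a tiny csc_matrix as a stand-in: converting to COO format exposes exactly those three pieces, the row index, the column index and the value.

    import numpy as np
    import pandas as pd
    from scipy.sparse import csc_matrix

    m = csc_matrix(np.array([[0, 2], [3, 0]]))  # stand-in for the real matrix

    # COO format holds the (row, column, value) triples directly.
    coo = m.tocoo()
    df_long = pd.DataFrame({"index": coo.row, "column": coo.col, "data": coo.data})
    print(df_long)

    # Alternatively, a dense DataFrame with rows and columns laid out as in the matrix:
    df_dense = pd.DataFrame(m.toarray())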

1

Solved

Here is my code: from sklearn.svm import SVC from sklearn.grid_search import GridSearchCV from sklearn.cross_validation import KFold from sklearn.feature_extraction.text import TfidfVectorizer fro...
Corridor asked 13/2, 2016 at 11:18
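
In current scikit-learn those pieces live in sklearn.model_selection rather than sklearn.grid_search / sklearn.cross_validation; a hedged sketch of the usual arrangement, with the vectorizer inside a Pipeline so each cross-validation fold is vectorised only from its own training documents (the toy texts and parameter grid are placeholders):

    from sklearn.svm import SVC
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV
    from sklearn.feature_extraction.text import TfidfVectorizer

    texts = ["good movie", "bad movie", "great film", "terrible film"]  # placeholders
    labels = [1, 0, 1, 0]

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("svc", SVC()),
    ])
    param_grid = {
        "svc__C": [0.1, 1, 10],
        "svc__kernel": ["linear", "rbf"],
    }
    search = GridSearchCV(pipeline, param_grid, cv=2)  # stratified 2-fold CV for classifiers
    search.fit(texts, labels)
    print(search.best_params_, search.best_score_)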

3

Solved

Here's an appeal for a better way to do something that I can already do inefficiently: filter a series of n-gram tokens using "stop words" so that the occurrence of any stop word term in an n-gram ...
Rakes asked 12/10, 2015 at 0:9
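
In Python terms (the original may well be using a different toolkit), the straightforward version is to hold the stop words in a set and drop every n-gram that contains any of them; a sketch:

    from nltk.corpus import stopwords
    from nltk.util import ngrams

    # one-time: nltk.download('stopwords')
    stop = set(stopwords.words('english'))
    tokens = "the quick brown fox jumps over the lazy dog".split()

    def filtered_ngrams(tokens, n, stop):
        """Keep only the n-grams in which no term is a stop word."""
        return [g for g in ngrams(tokens, n) if not any(t in stop for t in g)]

    print(filtered_ngrams(tokens, 2, stop))
    # keeps ('quick', 'brown'), drops ('over', 'the'), ('the', 'lazy'), ...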

1

Solved

I am doing some text analysis work in Python. Unfortunately, I need to switch to R in order to use a particular package (unfortunately, the package cannot be replicated in Python easily). Current...
Suppletory asked 5/6, 2015 at 21:15
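
One hedged option, rather than switching wholesale, is to call the R package from Python through rpy2; a minimal sketch (the R expression and the imported package are placeholders):

    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr

    # Evaluate an R expression and pull the result back into Python.
    result = robjects.r('mean(c(1, 2, 3, 4))')
    print(list(result))  # [2.5]

    # Load an R package and call its functions from Python; 'utils' stands in
    # for whichever package actually motivated the switch.
    utils = importr('utils')

The simpler alternative is to keep the two stages separate: write intermediate results to disk from Python and run the R part as a standalone script.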

1

Solved

I have been breaking my head over this one over the last few days. I searched all the SO archives and tried the suggested solutions but just can't seem to get this to work. I have sets of txt docum...
Mclain asked 9/11, 2014 at 23:30

3

Solved

I have a PDF file with valuable textual information. The problem is that I cannot extract the text, all I get is a bunch of garbled symbols. The same happens if I copy and paste the text fro...
Yeta asked 29/8, 2012 at 18:30

3

This is a Homework question. I have a huge document full of words. My challenge is to classify these words into different groups/clusters that adequately represent the words. My strategy to deal wi...
Litter asked 7/12, 2012 at 18:53

2

Solved

How do I use sklearn CountVectorizer with both 'word' and 'char' analyzers? http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html I could extract the...
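
A single CountVectorizer takes one analyzer, so the usual pattern is two vectorizers whose outputs are concatenated column-wise with a FeatureUnion; a sketch with placeholder documents and n-gram ranges:

    from sklearn.pipeline import FeatureUnion
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["text analysis is fun", "character n-grams catch typos"]  # placeholders

    combined = FeatureUnion([
        ("word", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ])
    X = combined.fit_transform(docs)
    print(X.shape)  # columns = word features followed by character features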
