text-classification Questions

2

I am working on a binary classification problem in Weka with a highly imbalanced data set (90% in one category and 10% in the other). I first applied SMOTE (http://www.cs.cmu.edu/afs/cs/project/jai...
Cartel asked 6/8, 2015 at 12:52

1

Solved

I'm trying to build a text classification model with Python and TextBlob. The script is running on my server, and in the future the idea is that users will be able to submit their text and it will be...
Darees asked 24/11, 2015 at 1:35

5

Solved

The winner of a recent Wikipedia vandalism detection competition suggests that detection could be improved by "detecting random keyboard hits considering QWERTY keyboard layout". Example: woijf qo...
Douglassdougy asked 27/9, 2010 at 8:41
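One way to turn the "random keyboard hits" idea into a feature is to measure how often consecutive letters in a token sit on neighbouring QWERTY keys. The sketch below is an assumption, not the competition winner's method: the row strings, the helper names `are_neighbors` and `keyboard_adjacency_ratio`, and the choice to ignore the physical row stagger are all hypothetical simplifications.

```python
# Letter rows of a standard US QWERTY layout (assumption: letters only, no stagger).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def _key_positions():
    """Map each letter to an approximate (row, column) grid position."""
    positions = {}
    for row, keys in enumerate(QWERTY_ROWS):
        for col, key in enumerate(keys):
            positions[key] = (row, col)
    return positions


POSITIONS = _key_positions()


def are_neighbors(a, b):
    """True if two letters are on the same key or on adjacent grid cells."""
    if a not in POSITIONS or b not in POSITIONS:
        return False
    (r1, c1), (r2, c2) = POSITIONS[a], POSITIONS[b]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


def keyboard_adjacency_ratio(token):
    """Fraction of consecutive letter pairs in `token` that are neighbouring keys."""
    letters = [ch for ch in token.lower() if ch in POSITIONS]
    pairs = list(zip(letters, letters[1:]))
    if not pairs:
        return 0.0
    return sum(are_neighbors(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    for word in ["asdf", "qwerty", "woijf"]:
        print(word, keyboard_adjacency_ratio(word))
```

On its own this is a weak signal (an ordinary word such as "hello" scores 0.5 here, the same as "woijf"), so it would most likely serve as one feature among many, for example alongside character n-gram or dictionary-lookup features.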

2

Can anyone point me to a large corpus that I can use for classification? By large I don't mean Reuters or 20 Newsgroups; I'm talking about a corpus of GB size, not 20 MB or something like that. ...
Coherent asked 27/8, 2015 at 10:17

1

Solved

I want to convert text documents into feature vectors using tf-idf, and then train a naive bayes algorithm to classify them. I can easily load my text files without the labels and use HashingTF() ...

2

Solved

The following code runs a Naive Bayes movie review classifier. The code generates a list of the most informative features. Note: the **movie review** folder is in NLTK. from itertools import chain ...
Leroi asked 27/3, 2015 at 13:34

1

Solved

I have two classes of sentences. Each has a reasonably distinct POS-tag sequence. How can I train a Naive Bayes classifier with the POS-tag sequence as a feature? Does Stanford CoreNLP/NLTK (Java or Pyth...

1

Solved

I have just started to work on a classification problem. It's a two-class problem: my trained (machine learning) model will have to decide/predict whether to allow a URL or block it. My question is v...

1

Solved

I have a large corpus of opinions (2500) in raw text. I would like to use the scikit-learn library to split them into test/train sets. What could be the best approach to solve this task with scikit-lear...
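A minimal sketch of a train/test split of raw-text opinions with scikit-learn's `train_test_split`. The `opinions` and `labels` lists below are hypothetical placeholders for the 2500 documents; the 80/20 ratio, `random_state=42`, and the use of `TfidfVectorizer` are illustrative assumptions, not requirements.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for 2500 raw-text opinions and their (binary) labels.
opinions = [f"opinion text {i}" for i in range(2500)]
labels = [i % 2 for i in range(2500)]

# Split the raw strings first; stratify keeps the class ratio equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    opinions, labels, test_size=0.2, random_state=42, stratify=labels
)

# Fit the vectorizer on the training texts only, then reuse its vocabulary
# for the test texts so no information leaks from the test set.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

print(len(X_train), len(X_test))  # 2000 500
```

Splitting before vectorizing is the design choice worth noting: calling `fit_transform` on the whole corpus would let test-set vocabulary and document frequencies influence the training features.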

3

Solved

I am using scikit-learn Multinomial Naive Bayes classifier for binary text classification (classifier tells me whether the document belongs to the category X or not). I use a balanced dataset to tr...

1

I am new to Python and to Stack Overflow (please be gentle) and am trying to learn how to do sentiment analysis. I am using a combination of code I found in a tutorial and here: Python - AttributeE...
Vespertine asked 23/5, 2014 at 23:26

2

Solved

There are a few dictionaries available for natural language processing, like positive and negative word dictionaries. Is there any dictionary available that contains a list of synonyms for all dict...

3

I've seen a few questions on class imbalance in a multiclass setting. However, I have a multi-label problem, so how would you deal with it in this case? I have a set of around 300k text examples. ...

1

Solved

I am struggling to use Random Forest in Python with scikit-learn. My problem is that I use it for text classification (in 3 classes: positive/negative/neutral) and the features that I extract are ...

1

Solved

I use Weka to successfully build a classifier. I would now like to evaluate how effective or important my features are. For this I use AttributeSelection, but I don't know how to output the differen...

2

Solved

I'm new to text categorization techniques. I want to know the difference between the N-gram approach to text categorization and classifier-based (decision tree, KNN, SVM) text categorization...

1

I started using sklearn.naive_bayes.GaussianNB for text classification, and have been getting fine initial results. I want to use the probability returned by the classifier as a measure of confiden...
Barbaresi asked 5/8, 2013 at 14:5
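A hedged sketch of reading a confidence score from `GaussianNB.predict_proba`. Naive Bayes probabilities are often pushed toward 0 or 1 because of the independence assumption, so the example also wraps the model in scikit-learn's `CalibratedClassifierCV` (sigmoid calibration) as one possible remedy. The synthetic Gaussian data, `cv=5`, and taking the maximum class probability as "confidence" are assumptions for illustration only.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Hypothetical dense feature vectors for two classes (GaussianNB expects dense arrays).
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(1.5, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

# Uncalibrated model: predict_proba returns one probability per class, per row.
raw = GaussianNB().fit(X, y)

# Calibrated model: sigmoid (Platt) scaling fitted with 5-fold cross-validation.
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5).fit(X, y)

x_new = X[:3]
raw_proba = raw.predict_proba(x_new)
cal_proba = calibrated.predict_proba(x_new)

# One simple confidence measure: the probability of the predicted class.
confidence = cal_proba.max(axis=1)
print(raw_proba.round(3))
print(confidence.round(3))
```

For sparse term-count features, `MultinomialNB` is the more usual Naive Bayes variant; the same `predict_proba` and calibration pattern applies to it.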

1

Solved

So I've been working on a natural language processing project in which I need to classify different styles of writing. Assuming that semantic features from texts have already been extracted for me,...
Juicy asked 29/5, 2013 at 20:54

© 2022 - 2024 — McMap. All rights reserved.