text-classification - 2

1

How to use Hugging Face Transformers library in Tensorflow for text classification on custom data?

I am trying to do binary text classification on custom data (which is in csv format) using different transformer architectures that Hugging Face 'Transformers' library offers. I am using this Tenso...

python tensorflow text-classification huggingface-transformers

Flowerdeluce asked 30/1, 2020 at 4:10

3

Adding words to stop_words list in TfidfVectorizer in sklearn

I want to add a few more words to stop_words in TfidfVectorizer. I followed the solution in "Adding words to scikit-learn's CountVectorizer's stop list". My stop word list now contains both ...

python scikit-learn classification stop-words text-classification

Monohydric asked 9/11, 2014 at 7:24

2

Solved

No batch_size while making inference with BERT model

I am working on a binary classification problem with the TensorFlow BERT language model. Here is the link to Google Colab. After training, saving, and reloading the model, I get an error while doing the pre...

python tensorflow machine-learning deep-learning text-classification

Ommatophore asked 2/7, 2019 at 6:6

1

Generating dictionaries to categorize tweets into pre-defined categories using NLTK

I have a list of Twitter users (screen_names) and I need to categorise them into 7 pre-defined categories - Education, Art, Sports, Business, Politics, Automobiles, Technology based on their intere...

python machine-learning nlp nltk text-classification

Representative asked 23/2, 2020 at 6:5

1

Why take the first hidden state for sequence classification (DistilBertForSequenceClassification) by HuggingFace

In the last few layers of sequence classification by HuggingFace, they took the first hidden state of the sequence length of the transformer output to be used for classification. hidden_state = d...

time-series sequence tensorflow2.0 text-classification huggingface-transformers

Hobbism asked 6/2, 2020 at 4:10

2

InvalidArgumentError: 2 root error(s) found. Incompatible shapes in Tensorflow text-classification model

I am trying to get code working from the following repo, which is based off this paper. It had a lot of errors, but I mostly got it working. However, I keep getting the same problem and I really do...

python tensorflow deep-learning nlp text-classification

Freeboard asked 22/1, 2020 at 20:42

1

BERT binary text classification gets different results every run

I do binary text classification with BERT from Simple Transformers. I work in Colab with a GPU runtime. I generated the train and test sets with the sklearn StratifiedKFold method. I have t...

python-3.x text-classification transformer-model bert-language-model

Evidently asked 26/12, 2019 at 9:49

1

Unable to train my Keras model (Data cardinality is ambiguous)

I am using the bert-for-tf2 library for a multi-class classification problem. I created the model, but training throws the following error: -------------------------------------------------------...

machine-learning nlp text-classification tensorflow2.0 tf.keras

Garnetgarnett asked 3/12, 2019 at 11:4

2

Solved

Best machine learning approach to automate text/fuzzy matching

I'm reasonably new to machine learning; I've done a few projects in Python. I'm looking for advice on how to approach the problem below, which I believe could be automated. A user in a data quality...

machine-learning text-classification fuzzy-comparison record-linkage

Haymo asked 16/2, 2017 at 16:40

1

Keras Embedding Layer: keep zero-padded values as zeros

I've been thinking about 0-padding of word sequence and how that 0-padding is then converted to the Embedding layer. At first glance, one would think that you want to keep the embeddings = 0.0 as w...

machine-learning keras text-classification word-embedding zero-padding

Borosilicate asked 27/6, 2019 at 20:51

3

Select top n TFIDF features for a given document

I am working with TFIDF sparse matrices for document classification and want to retain only the top n (say 50) terms for each document (ranked by TFIDF score). See EDIT below. import numpy as np i...

python scikit-learn sparse-matrix text-classification tf-idf

Baelbeer asked 24/10, 2018 at 15:7

1

Finetuning BERT on custom data

I want to train a 21-class text classification model using BERT. But I have very little training data, so I downloaded a similar dataset with 5 classes and 2 million samples. And finetuned downl...

tensorflow deep-learning nlp text-classification bert-language-model

Well asked 4/5, 2019 at 5:40

2

Solved

Naive Bayes in Quanteda vs caret: wildly different results

I'm trying to use the packages quanteda and caret together to classify text based on a trained sample. As a test run, I wanted to compare the built-in naive Bayes classifier of quanteda with the on...

r r-caret text-classification supervised-learning quanteda

Invalidism asked 29/1, 2019 at 17:57

4

How to deal with length variations for text classification using CNN (Keras)

It has been shown that CNNs (convolutional neural networks) are quite useful for text/document classification. I wonder how to deal with the length differences as the lengths of articles are differen...

nlp deep-learning text-classification keras

Secondary asked 2/6, 2016 at 1:40

3

Solved

Text classification beyond the keyword dependency and inferring the actual meaning

I am trying to develop a text classifier that will classify a piece of text as Private or Public. Take medical or health information as an example domain. A typical classifier that I can think of c...

python text-classification nlp

Kristikristian asked 4/3, 2019 at 22:0

1

Solved

How to recognize entities in text that is the output of optical character recognition (OCR)?

I am trying to do multi-class classification with textual data. The problem I am facing is that I have unstructured textual data. I'll explain the problem with an example. Consider this image for example:...

nlp recurrent-neural-network text-classification named-entity-recognition named-entity-extraction

Condone asked 3/3, 2019 at 10:52

1

Solved

Intent classification with large number of intent classes

I am working on a data set of approximately 3000 questions and I want to perform intent classification. The data set is not labelled yet, but from the business perspective, there's a requirement of...

python tensorflow nlp text-classification

Insurrectionary asked 24/2, 2019 at 9:50

3

Solved

Which algorithms to use for one class classification?

I have over 15000 text docs of a specific topic. I would like to build a language model based on the former so that I can present to this model new random text documents of various topics and the a...

scikit-learn text-classification

Nipa asked 23/10, 2013 at 20:40

1

Solved

How to show topics of reuters dataset in Keras?

I use the Reuters dataset in Keras, and I want to know the names of its 46 topics. How can I show the topics of the Reuters dataset in Keras? https://keras.io/datasets/#reuters-newswire-topics-classification

deep-learning keras text-classification

Burnoose asked 17/7, 2017 at 7:27

1

Solved

How to resample text (imbalanced groups) in a pipeline?

I'm trying to do some text classification using MultinomialNB, but I'm running into problems because my data is unbalanced. (Below is some sample data for simplicity. In actuality, mine is much lar...

python pipeline text-classification resampling oversampling

Hamitosemitic asked 9/1, 2019 at 20:45

1

Solved

How can I get around Keras pad_sequences() rounding float values to zero?

So I have a text classification model built with Keras. I've been trying to pad my varying length sequences but the Keras function pad_sequences() has just returned zeros. I've figured out that if...

python numpy keras lstm text-classification

Atone asked 3/1, 2019 at 23:21

1

Solved

How to handle text classification problems when multiple features are involved

I am working on a text classification problem with multiple text features and need to build a model to predict salary range. Please refer to the sample dataset. Most of the resources/tutorials deal wi...

python nlp feature-extraction text-classification

Darton asked 26/12, 2018 at 7:56

2

Solved

GridSearchCV: How to specify test set?

I have a question regarding GridSearchCV: by using this: gs_clf = GridSearchCV(pipeline, parameters, n_jobs=-1, cv=6, scoring="f1") I specify that k-fold cross-validation should be used with 6 ...

python scikit-learn cross-validation text-classification

Pathological asked 11/11, 2016 at 10:37

2

How to do Text classification using word2vec

I want to perform text classification using word2vec. I got vectors of words. ls = [] sentences = lines.split(".") for i in sentences: ls.append(i.split()) model = Word2Vec(ls, min_count=1, size ...

python-3.x word2vec gensim text-classification

Vandyke asked 4/4, 2018 at 6:10

4

How can a machine learning model handle unseen data and unseen label?

I am trying to solve a text classification problem. I have a limited number of labels that capture the category of my text data. If the incoming text data doesn't fit any label, it is tagged as 'Ot...

machine-learning scikit-learn nlp text-classification naivebayes

Empathy asked 17/9, 2018 at 16:15
