Training a basic spaCy text classification model

I am trying to train a basic text classification model using spaCy. I have a list of texts and I want to build a model that classifies each text as either outcome1 or outcome2. Let's say my data looks like this:

texts = ["This is the first example text",
         "This is the second example text",
         "This is yet another text"]
y = ["outcome2", "outcome1", "outcome1"]

My problem is, I have trouble even processing the texts into docs:

import spacy

nlp = spacy.blank("en")

textcat = nlp.create_pipe("textcat")
textcat.add_label("outcome1")
textcat.add_label("outcome2")
textcat = nlp.add_pipe("textcat", last = True)

nlp.pipe_names
>>> ['textcat']

But when I try to process any text I get an error:

doc = nlp("This is a sentence")
>>> ValueError: Cannot get dimension 'nO' for model 'sparse_linear': value unset

I've tried to follow this tutorial (which is a bit outdated) and to set up a project using the spaCy quickstart widget, but I keep running into errors when initialising the config file. What am I missing?

Malmo answered 23/7, 2021 at 13:40 Comment(2)
Instead, for spaCy v3 try this example project: github.com/explosion/projects/tree/v3/pipelines/textcat_demo . How to get started with a project: spacy.io/usage/projects , or if you're coming from the v2 examples: github.com/explosion/spaCy/tree/master/examples - Kerin
Look at this example on Kaggle: kaggle.com/poonaml/text-classification-using-spacy . Search for TextCategorization once you have opened the article. - Skittish

I found a very similar discussion that covers exactly what this question is asking: https://github.com/explosion/spaCy/discussions/9732

The discussion states that you must specify the labels, initialize the model and train it before it is usable. Additionally, from version 3 onwards it is not recommended to train with your own training loop; instead, use the new config system and let spaCy handle the training for you. See: https://spacy.io/usage/training
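
For a quick experiment, the "labels, then initialize, then train" order can be sketched roughly as below. This is only a minimal sketch assuming spaCy v3 is installed; the helper names `one_hot_cats` and `train_textcat`, the epoch count and the seed are my own illustrative choices, not something taken from the linked discussion. It does use a small hand-written loop, so treat it as a way to see why the error goes away rather than as the recommended training setup.

```python
import random


def one_hot_cats(gold, labels):
    """Map one gold label to the {label: score} dict used for textcat annotations."""
    return {label: 1.0 if label == gold else 0.0 for label in labels}


def train_textcat(texts, y, n_epochs=10, seed=0):
    """Build, initialize and briefly train a blank English textcat pipeline."""
    # Imported here so that one_hot_cats stays usable without spaCy installed.
    import spacy
    from spacy.training import Example

    labels = sorted(set(y))
    nlp = spacy.blank("en")

    # In v3, add_pipe takes the component name and returns the component,
    # so the labels are added to the component that is actually in the pipeline.
    textcat = nlp.add_pipe("textcat")
    for label in labels:
        textcat.add_label(label)

    # Each training example pairs an unannotated Doc with its gold categories.
    examples = [
        Example.from_dict(nlp.make_doc(text), {"cats": one_hot_cats(gold, labels)})
        for text, gold in zip(texts, y)
    ]

    # initialize() fills in the model's unset dimensions (such as nO)
    # and returns an optimizer for the update steps.
    optimizer = nlp.initialize(lambda: examples)

    random.seed(seed)
    for _ in range(n_epochs):
        random.shuffle(examples)
        nlp.update(examples, sgd=optimizer)
    return nlp
```

With the `texts` and `y` from the question, `nlp = train_textcat(texts, y)` followed by `nlp("This is a sentence").cats` should return a score per label instead of raising the `nO` error. Three examples are far too few to learn anything meaningful, of course. For real training, the config route would be something like `python -m spacy init config config.cfg --lang en --pipeline textcat` followed by `python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy`, where the file names are placeholders.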

Felicidad answered 28/11, 2021 at 12:48 Comment(0)
