I am trying to train a basic text classification model using spaCy. I have a list of texts and I want to build a model that classifies each text as either outcome1 or outcome2. Let's say my data looks like this:
texts = ["This is the first example text",
"This is the second example text",
"This is yet another text"]
y = ["outcome2", "outcome1", "outcome1"]
My problem is, I have trouble even processing the texts into docs:
import spacy
nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("outcome1")
textcat.add_label("outcome2")
textcat = nlp.add_pipe("textcat", last = True)
nlp.pipe_names
>>> ['textcat']
But when I try to process any text I get an error:
doc = nlp("This is a sentence")
>>> ValueError: Cannot get dimension 'nO' for model 'sparse_linear': value unset
I've tried to follow this tutorial (which is a bit outdated) and set up a project using the spaCy quickstart widget, but I keep running into errors when initialising the config file. What am I missing?
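For reference, here is a minimal sketch of what the setup might look like, assuming spaCy v3. Two things differ from the code above, and both are assumptions about the cause rather than a confirmed diagnosis: the labels are added to the component returned by nlp.add_pipe (in v3, add_pipe builds its own component from the string name, so labels set on a separate create_pipe object are likely not carried over), and nlp.initialize is called before the pipeline is used, since the 'nO' error suggests the model's output dimension was never set. The train_examples name and the 10-pass loop are illustrative only.

```python
import spacy
from spacy.training import Example

texts = ["This is the first example text",
         "This is the second example text",
         "This is yet another text"]
y = ["outcome2", "outcome1", "outcome1"]
labels = ["outcome1", "outcome2"]

nlp = spacy.blank("en")
# In v3, add_pipe takes the component name and returns the component it created.
textcat = nlp.add_pipe("textcat", last=True)
for label in labels:
    textcat.add_label(label)

# Wrap each text and its gold category scores in an Example object.
train_examples = []
for text, gold in zip(texts, y):
    cats = {label: 1.0 if label == gold else 0.0 for label in labels}
    train_examples.append(Example.from_dict(nlp.make_doc(text), {"cats": cats}))

# initialize() sets up the model's dimensions and returns an optimizer.
optimizer = nlp.initialize(lambda: train_examples)

# A few passes over the toy data (illustrative; real training needs far more data).
for _ in range(10):
    losses = {}
    nlp.update(train_examples, sgd=optimizer, losses=losses)

doc = nlp("This is a sentence")
print(doc.cats)  # dict mapping each label to a score
```

With only three training texts the scores will not be meaningful; the point is only that the pipeline can process a text once it has been initialized.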