I am trying to customize spaCy's NER to identify Indian names. I am following this guide: https://spacy.io/usage/training and this is the dataset I am using: https://gist.githubusercontent.com/mbejda/9b93c7545c9dd93060bd/raw/b582593330765df3ccaae6f641f8cddc16f1e879/Indian-Female-Names.csv
According to the code in the guide, I am supposed to provide training data in the following format:
TRAIN_DATA = [
    ('Shivani', {
        'entities': [(0, 7, 'PERSON')]  # (start, end, label); end offset is exclusive
    }),
    ('Isha', {
        'entities': [(0, 4, 'PERSON')]
    })
]
How do I provide training data to spaCy for ~12,000 names? Manually specifying each entity would be a chore. Is there a tool available to tag all the names?
Use csv.reader to read each row, create a tuple with (name, {'entities': [(x, y, 'PERSON')]}) or whatever the values are, and append it to TRAIN_DATA. There's nothing particularly complicated here, but if you try it and get stuck somewhere, you can show us your code and where it's doing something wrong. – Nacre
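A minimal sketch of what the comment describes. It assumes the name sits in the first CSV column and that the file has a header row; `build_train_data`, `name_column`, and `has_header` are hypothetical names used for illustration, and the sample rows are made up rather than taken from the linked dataset. Since each text is only the name itself, the entity span runs from 0 to `len(name)` (the end offset is exclusive).

```python
import csv
import io


def build_train_data(csv_file, name_column=0, has_header=True, label="PERSON"):
    """Build spaCy-style (text, annotations) tuples from a CSV of names.

    csv_file is any iterable of lines, e.g. an open file object.
    name_column and has_header are assumptions about the CSV layout.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    train_data = []
    for row in reader:
        if len(row) <= name_column:
            continue  # skip empty or malformed rows
        name = row[name_column].strip()
        if not name:
            continue  # skip rows with a blank name
        # The whole text is the name, so the span covers 0..len(name).
        train_data.append((name, {"entities": [(0, len(name), label)]}))
    return train_data


# Made-up sample rows standing in for the real file.
sample = io.StringIO("name,gender\nShivani,f\nIsha ,f\n")
TRAIN_DATA = build_train_data(sample)
print(TRAIN_DATA)
```

For the real dataset, the same function could be given a file object instead, for example `open("Indian-Female-Names.csv", newline="", encoding="utf-8")`, after checking which column actually holds the name.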