Getting this error: AttributeError: 'GPT2Tokenizer' object has no attribute 'train_new_from_iterator'
My code is very similar to the example in the Hugging Face documentation; the only thing I changed is the input, which shouldn't affect this. It worked once. When I came back to it two hours later it failed, and nothing at all had changed. The documentation states that `train_new_from_iterator` only works with "fast" tokenizers and that `AutoTokenizer` picks a fast tokenizer by default, so my best guess is that it is having trouble picking the fast one. I also tried downgrading and reinstalling `transformers`, with no success. `df` is just one column of text.
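The error names `GPT2Tokenizer`, which is the slow (pure-Python) class. The fast, Rust-backed one is `GPT2TokenizerFast`, and per the documentation only fast tokenizers have `train_new_from_iterator`. A quick way to see which class actually got loaded, and whether the `tokenizers` package that the fast classes depend on is importable in the current environment, is a small check like this. It is a sketch using only the standard library; `tokenizer_backend_report` is a hypothetical helper, not part of `transformers`.

```python
import importlib.util
from importlib import metadata


def _installed_version(pkg):
    """Return the installed version of pkg, or None if it cannot be imported."""
    if importlib.util.find_spec(pkg) is None:
        return None
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        # importable (e.g. from a local folder) but no distribution metadata
        return "unknown"


def tokenizer_backend_report(tokenizer=None):
    """Collect the facts that decide whether a fast tokenizer can be used."""
    report = {pkg: _installed_version(pkg) for pkg in ("transformers", "tokenizers")}
    if tokenizer is not None:
        report["class"] = type(tokenizer).__name__
        # Hugging Face tokenizers expose is_fast: True for the Rust-backed
        # *Fast classes, False for the pure-Python ones
        report["is_fast"] = bool(getattr(tokenizer, "is_fast", False))
    return report


if __name__ == "__main__":
    # environment only; pass a loaded tokenizer to also see its class
    print(tokenizer_backend_report())
```

With the setup in this question it would be called as `tokenizer_backend_report(AutoTokenizer.from_pretrained('roberta', use_fast=True))`. `use_fast=True` is already the default, so passing it only makes the intent explicit. If the report shows `tokenizers` as `None`, that would plausibly explain a fallback to the slow class.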
import pandas as pd
import tokenizers
from transformers import AutoTokenizer

# cmx_uat is a database connection defined elsewhere in the script (not shown)

def batch_iterator(batch_size=10, size=5000):
    for i in range(100):  # 2264
        query = f"select note_text from cmx_uat.note where id > {i * size} limit 50;"
        df = pd.read_sql(sql=query, con=cmx_uat)
        texts = df['note_text'].tolist()
        # step over the rows actually returned so no batch is empty
        for x in range(0, len(texts), batch_size):
            yield texts[x:x + batch_size]

old_tokenizer = AutoTokenizer.from_pretrained('roberta')
training_corpus = batch_iterator()  # generator of lists of strings
new_tokenizer = old_tokenizer.train_new_from_iterator(training_corpus, vocab_size=32000)
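Separately from the error, the batching logic can be checked without a database. Stepping through `range(0, size, batch_size)` with `size=5000` while the query returns at most 50 rows yields mostly empty lists; stepping over the rows actually returned avoids that. Below is a minimal stdlib-only sketch. `fake_fetch` is a hypothetical stand-in for the `pd.read_sql` call, and each yielded item is a list of strings, which is the batch shape `train_new_from_iterator` expects.

```python
from typing import Callable, Iterator, List


def batch_iterator(fetch_chunk: Callable[[int], List[str]],
                   n_chunks: int = 100,
                   batch_size: int = 10) -> Iterator[List[str]]:
    """Yield lists of at most batch_size texts, never an empty list."""
    for i in range(n_chunks):
        texts = fetch_chunk(i)  # e.g. df['note_text'].tolist() for chunk i
        # step over the rows that were actually returned, not a fixed size
        for start in range(0, len(texts), batch_size):
            yield texts[start:start + batch_size]


# hypothetical stand-in for the SQL query: every chunk returns 25 notes
def fake_fetch(i: int) -> List[str]:
    return [f"note {i}-{j}" for j in range(25)]


batches = list(batch_iterator(fake_fetch, n_chunks=3, batch_size=10))
print(len(batches))                       # → 9
print([len(b) for b in batches[:3]])      # → [10, 10, 5]
```

Passing the fetch step in as a function keeps the generator testable; in the real script it would wrap the query and `pd.read_sql` call.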