I am trying to use the Helsinki-NLP models from huggingface
, but
I cannot find any instructions on how to do it.
The README files are computer generated and do not contain explanations.
Can some one point me to a getting started guide, or show an example of how to run a model like opus-mt-en-es?
How to run huggingface Helsinki-NLP models
Asked Answered
On the model's page here there's a Use in Transformers link that you can use to see the code to load it in their transformers
package as shown below:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")
then use it as you would any transformer model:
inp = "Me llamo Wolfgang y vivo en Berlin"
input_ids = tokenizer(inp, return_tensors="pt").input_ids
outputs = model.generate(input_ids=input_ids, num_beams=5, num_return_sequences=3)
print("Generated:", tokenizer.batch_decode(outputs, skip_special_tokens=True))
Output:
Generated: ['My name is Wolfgang and I live in Berlin', 'My name is Wolfgang and I live in Berlin.', "My name's Wolfgang and I live in Berlin."]
The fastest way to run the Helsinki-NLP
models is with ctranslate2 library, as shown here. This is much faster than using the transformers
library since ctranslate2
is optimized for speed.
Downloading the model:
ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de
Running in python:
import ctranslate2
import transformers
translator = ctranslate2.Translator("opus-mt-en-de")
tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world!"))
results = translator.translate_batch([source])
target = results[0].hypotheses[0]
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
To use on the fly, you can check the huggingFace course here. They provide pipelines that help you run this on the fly, consider:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
translator("your-text-to-translate-here")
© 2022 - 2024 — McMap. All rights reserved.