How to run huggingface Helsinki-NLP models

I am trying to use the Helsinki-NLP models from Hugging Face, but I cannot find any instructions on how to do so. The README files are computer-generated and contain no explanations. Can someone point me to a getting-started guide, or show an example of how to run a model like opus-mt-en-es?

Putrescible answered 20/11, 2021 at 5:27

On the model's page there is a "Use in Transformers" link that shows the code for loading the model with the transformers package:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")

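Then use it as you would any other transformers model (this checkpoint translates Spanish to English; for English to Spanish, load Helsinki-NLP/opus-mt-en-es instead):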

inp = "Me llamo Wolfgang y vivo en Berlin"
input_ids = tokenizer(inp, return_tensors="pt").input_ids
outputs = model.generate(input_ids=input_ids, num_beams=5, num_return_sequences=3)
print("Generated:", tokenizer.batch_decode(outputs, skip_special_tokens=True))

Output:

Generated: ['My name is Wolfgang and I live in Berlin', 'My name is Wolfgang and I live in Berlin.', "My name's Wolfgang and I live in Berlin."]
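
For the English-to-Spanish direction the question asks about, and for more than one sentence at a time, the same calls can be wrapped in two small helpers. This is a minimal sketch, not an official recipe: `opus_mt_name`, `load`, and `translate` are hypothetical names, and the sketch assumes the Helsinki-NLP/opus-mt-{src}-{tgt} naming pattern holds for the language pair you need.

```python
def opus_mt_name(src, tgt):
    """Build a checkpoint name such as 'Helsinki-NLP/opus-mt-en-es'."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def load(src, tgt):
    """Load the tokenizer and model for one language pair."""
    # Imported lazily so the other helpers work without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    name = opus_mt_name(src, tgt)
    return AutoTokenizer.from_pretrained(name), AutoModelForSeq2SeqLM.from_pretrained(name)


def translate(texts, tokenizer, model, batch_size=8, **generate_kwargs):
    """Translate a list of strings in batches; returns one string per input."""
    translations = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # padding=True lets sentences of different lengths share one tensor.
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**encoded, **generate_kwargs)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations
```

Calling `tokenizer, model = load("en", "es")` followed by `translate(["My name is Wolfgang and I live in Berlin"], tokenizer, model, num_beams=5)` should return a list with one Spanish string. Extra keyword arguments are passed straight through to `model.generate`.
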
Sudderth answered 20/11, 2021 at 7:43

A faster way to run the Helsinki-NLP models is with the CTranslate2 library, which is optimized for inference speed and typically runs these models considerably faster than the transformers library.

Downloading the model and converting it to the CTranslate2 format:

ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de

Running in Python:

import ctranslate2
import transformers

translator = ctranslate2.Translator("opus-mt-en-de")
tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world!"))
results = translator.translate_batch([source])
target = results[0].hypotheses[0]

print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
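
The same token round trip extends to several sentences at once, since `translate_batch` takes a list of token lists. A rough sketch, assuming the `translator` and `tokenizer` objects created above; `ct2_translate` is a hypothetical helper name and not part of either library:

```python
def ct2_translate(texts, translator, tokenizer):
    """Translate a list of strings with a ctranslate2.Translator."""
    # CTranslate2 works on token strings rather than ids, hence the conversion.
    sources = [tokenizer.convert_ids_to_tokens(tokenizer.encode(text)) for text in texts]
    results = translator.translate_batch(sources)
    translations = []
    for result in results:
        best = result.hypotheses[0]  # first hypothesis, as in the snippet above
        translations.append(tokenizer.decode(tokenizer.convert_tokens_to_ids(best)))
    return translations
```

For example, `ct2_translate(["Hello world!", "How are you?"], translator, tokenizer)` would return one German string per input sentence.
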
Favorable answered 24/1 at 15:51

For quick use, see the Hugging Face course, which introduces pipelines that wrap the tokenizer and the model in a single call. For example:

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
translator("your-text-to-translate-here")
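
The pipeline returns a list of dictionaries rather than plain strings. A minimal sketch of unpacking that result, assuming the usual `translation_text` key of the translation pipeline; `translate_texts` and `build_translator` are hypothetical helper names:

```python
def translate_texts(translator, texts):
    """Run a translation pipeline on a list of strings and return plain strings."""
    results = translator(list(texts))
    # Each result is a dict such as {"translation_text": "..."}.
    return [result["translation_text"] for result in results]


def build_translator(model_name="Helsinki-NLP/opus-mt-es-en"):
    """Create the translation pipeline for a given checkpoint."""
    # Imported lazily so translate_texts can be used without transformers installed.
    from transformers import pipeline
    return pipeline("translation", model=model_name)
```

With these, `translate_texts(build_translator(), ["Me llamo Wolfgang", "Vivo en Berlin"])` would return one English string per input.
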
Sputum answered 12/4, 2022 at 11:36