What is the difference between MarianMT and OpusMT?
I'm currently comparing various pre-trained NMT models and can't help but wonder what the difference between MarianMT and OpusMT is. According to OpusMT's GitHub, it is based on MarianMT. However, in the Huggingface Transformers implementation, all pretrained MarianMT models start with "Helsinki-NLP/opus-mt". So I thought they were the same, but even though they're roughly the same size, they yield different translation results.

If someone could please shed some light on what the differences are I would be very thankful.

Tillandsia asked 15/12, 2021 at 17:23

Marian is an open-source tool for training and serving neural machine translation models, developed mainly at the University of Edinburgh, Adam Mickiewicz University in Poznań, and Microsoft. It is implemented in C++ and heavily optimized for MT, unlike the PyTorch-based Huggingface Transformers, which aims for generality rather than efficiency in one specific use case.

The NLP group at the University of Helsinki used Marian to train many translation models on parallel data collected in OPUS, and open-sourced those models. Later, they also converted the trained models into the Huggingface Transformers format and made them available via the Huggingface Hub.

MarianMT is a class in Huggingface Transformers for imported Marian models; you can train a model in Marian and convert it yourself. OpusMT models are the Marian models trained in Helsinki on the OPUS data and converted to PyTorch. If you search the Huggingface Hub for Marian, you will find MarianMT models other than those from Helsinki.
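For illustration, here is a minimal sketch of loading one of the converted OPUS-MT checkpoints through the MarianMT classes (Helsinki-NLP/opus-mt-en-de is one of the published English-to-German checkpoints; any other opus-mt language pair works the same way):

    from transformers import MarianMTModel, MarianTokenizer

    # One of the Helsinki-NLP OPUS-MT checkpoints on the Huggingface Hub
    model_name = "Helsinki-NLP/opus-mt-en-de"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize a source sentence, translate, and decode the output ids
    batch = tokenizer(["I am comparing pre-trained NMT models."], return_tensors="pt")
    generated = model.generate(**batch)
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))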

Beckman answered 16/12, 2021 at 9:00
Thank you for your response, this is also what I understood. But then how come the model from the Transformers package yields different translation results than the model on Helsinki-NLP's GitHub? If they're both MarianMT and both trained on the same OPUS data, they should give the same results, shouldn't they? – Tillandsia
Beam search is probably implemented differently in Marian and in Transformers, and the two likely use different default beam-search parameters. The easiest way to check whether the models are the same is to try greedy decoding (beam size 1, no length normalization). – Bonham
Hm, that might very well be the reason. As far as I can see, though, there is no way to set the decoding in the Huggingface Transformers package, at least when using pre-trained models. Or am I missing something? – Tillandsia
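Regarding that last comment: decoding parameters can in fact be passed as keyword arguments to generate(), even for pre-trained models. A minimal sketch of the greedy-decoding check suggested above, reusing the checkpoint from the answer:

    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-en-de"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    batch = tokenizer(["The weather is nice today."], return_tensors="pt")

    # num_beams=1 with do_sample=False forces greedy decoding; length
    # normalization only affects beam search, so it plays no role here.
    # These arguments override the defaults stored in the model's config.
    greedy = model.generate(**batch, num_beams=1, do_sample=False)
    print(tokenizer.batch_decode(greedy, skip_special_tokens=True))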
