I noticed that, when transcribing speech that mixes multiple languages with the OpenAI Whisper speech-to-text library, the model sometimes recognizes inserts in another language accurately and produces the expected output. For example, 八十多个人 is equivalent to 八十几个人 ("eighty-odd people"): 多 and 几 are interchangeable here, and both can mean "several".
Yet the same audio input on a different pass (with the same model, or with a smaller/larger one) would intermittently glitch: an entire sentence would be translated rather than transcribed, i.e. a fragment would come out entirely in either the first or the second language that appears in the audio. With the example input above, either the whole sentence would be in English (with the Chinese bits translated to English), or the whole sentence would be in Chinese (with the English bits translated to Chinese). Important: in both cases no input language was specified and no task type was passed, which implies the default --task transcribe.
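For context, here is a minimal sketch of how those passes look through the Python API (the model size and file name are placeholders, not the exact ones I used):

```python
import whisper

model = whisper.load_model("small")  # placeholder model size

# Pass with nothing specified: Whisper auto-detects the language,
# and the task defaults to "transcribe"
result = model.transcribe("mixed_zh_en.mp3")
print(result["language"], result["text"])

# Same call with the language and task pinned explicitly
result = model.transcribe("mixed_zh_en.mp3", language="zh", task="transcribe")
print(result["text"])
```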
The docs for Whisper mention translation to English as the only available target language (via the option --task translate in the command-line version); there is no mention of translating into other target languages. Yet the behavior described above suggests that the models are capable of translating into other languages as well.
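For completeness, this is the only documented translation mode as I understand the docs, shown via the Python API; the output is English regardless of the source language (file name is again a placeholder):

```python
import whisper

model = whisper.load_model("small")

# Equivalent of --task translate: the target language is always English
result = model.transcribe("mixed_zh_en.mp3", task="translate")
print(result["text"])
```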
So the question is: is there a known way to configure the models to do just text-to-text translation? Or is this behavior merely a glitch that cannot be "exploited" or configured at a lower level in a way that would allow using the models purely for text translation between any of the supported languages?
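To make that last part concrete, the closest I have found to "configuring it at a lower level" is forcing the decoder language on a 30-second chunk via the lower-level decoding path from the README. Setting language to something other than the language(s) actually spoken appears to reproduce the glitch deliberately, but as far as I can tell this is undocumented behavior rather than a supported translation feature, so treat this sketch as an assumption (file name is a placeholder):

```python
import whisper

model = whisper.load_model("small")

# Standard lower-level decoding path from the Whisper README
audio = whisper.load_audio("mixed_zh_en.mp3")
audio = whisper.pad_or_trim(audio)                      # single 30-second window
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Forcing task="transcribe" with a language other than the one spoken
# seems to push the decoder toward producing text in that language
options = whisper.DecodingOptions(task="transcribe", language="zh")
result = whisper.decode(model, mel, options)
print(result.text)
```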