How can I give some hint phrases to OpenAI's Whisper ASR?

I use OpenAI's Whisper Python library for speech recognition. How can I give it some hint phrases, as is possible with some other ASR systems such as Google's?


To transcribe with OpenAI's Whisper (tested on Ubuntu 20.04 x64 LTS with an Nvidia GeForce RTX 3090):

conda create -y --name whisperpy39 python==3.9
conda activate whisperpy39
pip install git+https://github.com/openai/whisper.git 
sudo apt update && sudo apt install ffmpeg
whisper recording.wav
whisper recording.wav --model large

If using an Nvidia GeForce RTX 3090, add the following after conda activate whisperpy39:

pip install -f https://download.pytorch.org/whl/torch_stable.html
conda install pytorch==1.10.1 torchvision torchaudio cudatoolkit=11.0 -c pytorch
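
After installing PyTorch, it may be worth checking that it actually sees the GPU before running the larger models. A minimal sketch, assuming only that PyTorch may or may not be importable in the active environment (the helper name `describe_torch_device` is made up for this example):

```python
def describe_torch_device():
    """Return a short description of the device PyTorch would use."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # Device 0 is the first visible CUDA device.
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports `cpu` on a machine with an RTX 3090, the installed PyTorch build probably does not match the CUDA toolkit or driver.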
Hom answered 24/9, 2022 at 0:4

There are two potential places to add or boost hint phrases:

  1. https://github.com/openai/whisper/blob/15ab54826343c27cfaf44ce31e9c8fb63d0aa775/whisper/decoding.py#L87-L88: add hint phrases in the prompt, not in the prefix (see this discussion on prompt vs. prefix). An --initial_prompt option has been available since commit 2037b65:

    whisper audio.mp3 --initial_prompt "So we were just talking about DALL·E"
    
  2. https://github.com/openai/whisper/blob/15ab54826343c27cfaf44ce31e9c8fb63d0aa775/whisper/decoding.py#L302: change the code to increase the likelihood of the sequences containing the hint phrases, e.g.:

    Currently there's no interface for this other than giving the initial_prompt as above; you could hack something with logit biasing, which effectively boosts the predicted probability of certain tokens. The LogitFilter class is designed to support this.
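
For the first option, the same prompt can also be passed from Python, since `transcribe()` accepts an `initial_prompt` keyword. Here is a minimal sketch. The helper names `build_initial_prompt` and `transcribe_with_hints` are made up for this example, and joining the hints with commas is only an assumption about what makes a reasonable prompt, not something the library requires:

```python
def build_initial_prompt(hint_phrases):
    """Join hint phrases into a single prompt string, skipping empty entries."""
    cleaned = [p.strip() for p in hint_phrases if p and p.strip()]
    return ", ".join(cleaned)


def transcribe_with_hints(audio_path, hint_phrases, model_name="base"):
    """Transcribe audio_path, passing the hint phrases as the initial prompt."""
    import whisper  # imported lazily so build_initial_prompt works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(
        audio_path,
        initial_prompt=build_initial_prompt(hint_phrases),
    )
    return result["text"]


if __name__ == "__main__":
    # Only builds the prompt; call transcribe_with_hints("audio.mp3", [...])
    # once Whisper and ffmpeg are installed.
    print(build_initial_prompt(["OpenAI", "Whisper", "GeForce RTX 3090"]))
```

The prompt is treated as preceding context rather than as a hard constraint, so it nudges spelling and vocabulary but does not guarantee that the hint phrases appear in the output.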

I don't know how efficient it'd be. Also, one potential issue arises when the hint word is not in the dictionary, in which case one would need to add the hint word in the dictionary, which may be difficult.
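
To make the logit-biasing idea concrete, here is a library-independent sketch. `HintBiasFilter` and its `bias` parameter are hypothetical. The class only mimics the shape of an `apply(logits, tokens)` hook and works on plain Python lists, whereas a real version would operate on PyTorch tensors and would have to be wired into the decoding code by hand, since there is no public option for it. Note also that a hint word is usually split into several subword tokens by the tokenizer, so boosting individual token IDs only approximates boosting a whole phrase.

```python
class HintBiasFilter:
    """Hypothetical filter that adds a constant bonus to the logits of hint tokens."""

    def __init__(self, hint_token_ids, bias=2.0):
        self.hint_token_ids = set(hint_token_ids)
        self.bias = bias

    def apply(self, logits, tokens=None):
        # logits: one list of per-token scores per batch/beam row, edited in place.
        # tokens (the sequence decoded so far) is unused in this simple sketch.
        for row in logits:
            for token_id in self.hint_token_ids:
                if 0 <= token_id < len(row):  # ignore IDs outside the vocabulary
                    row[token_id] += self.bias


if __name__ == "__main__":
    logits = [[0.1, 1.5, 0.3, 1.4]]
    HintBiasFilter(hint_token_ids=[3], bias=0.5).apply(logits)
    best = max(range(len(logits[0])), key=logits[0].__getitem__)
    print(best)  # token 3 now outranks token 1
```

A constant bonus like this is context-free: it raises the hint tokens at every decoding step, which can cause spurious insertions if the bias is set too high.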

Hom answered 24/9, 2022 at 17:30
