On YouTube, I can download the CC (closed caption) transcript for a video, but the transcript contains no punctuation. How can I punctuate it automatically?
This problem is studied in Natural Language Processing (NLP), where it is usually called punctuation restoration. Several deep learning solutions exist; they aren't perfect, but they can achieve decent results. You can try https://github.com/ottokart/punctuator2, which is based on this paper (you can try it out here).
As of 2023, there are several ways to do it:
- Use ChatGPT. It works very well, but because of limits on input text length it is cumbersome for long videos (60+ minutes). Besides processing the transcript in batches, you have to check the output quality of each batch, as it is not 100% consistent yet.
- Use Deep Multilingual Punctuation Prediction. It restores punctuation with 77% accuracy for English text, but it won't fix capitalization.
- Use yt-dlp and Whisper. Download the audio as MP3 from YouTube and run Whisper on it. This OpenAI model does very good speech-to-text and produces output with punctuation, but it is quite slow for long video/audio (processing 60 minutes of audio takes approx. 30 minutes). Example implementation
- Use yt-dlp and whisper.cpp. This is faster: processing 60 minutes of audio takes less than 10 minutes. My example implementation
- Use Shoki.app
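For the ChatGPT option, the batching and per-batch quality check can be partly automated. The following is only a minimal sketch: the function names `split_into_batches` and `words_preserved` and the default of 500 words per batch are my own assumptions, not part of any API, and the actual call to the model is left out. It splits the transcript on word boundaries and then checks that a punctuated batch still contains the same words in the same order.

```python
import string
from typing import List


def split_into_batches(transcript: str, max_words: int = 500) -> List[str]:
    """Split an unpunctuated transcript into batches of at most max_words words.

    max_words=500 is an arbitrary illustrative default; pick a value that
    fits the input limit of whatever model you are using.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = transcript.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def _normalize(text: str) -> List[str]:
    """Lowercase each word and strip punctuation from its edges."""
    stripped = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in stripped if w]


def words_preserved(original: str, punctuated: str) -> bool:
    """Return True if the punctuated text has the same words, in the same
    order, as the original (ignoring case and punctuation)."""
    return _normalize(original) == _normalize(punctuated)
```

A batch that fails `words_preserved` was reworded, truncated, or extended by the model and should be retried. The check says nothing about whether the punctuation itself is good, so a quick manual read is still worthwhile.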
There's no way to get punctuation from YouTube; you'll have to generate it yourself. Google offers a service that generates punctuation for arbitrary text, and in my personal experience it's more accurate than some competitors, so I would run the transcript through that.
You can use a DistilBERT token classifier to restore punctuation and capitalization. I use this approach for https://www.appblit.com/scribe and it works reasonably well. For other languages, we would need to fine-tune a multilingual DistilBERT.
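To illustrate the token-classification idea: the classifier predicts a label for each word, and a small post-processing step turns those labels back into text. The sketch below covers only that last step and assumes a hypothetical label format, a `(capitalize, trailing_punctuation)` pair per word. The real label set depends on how the model was trained, and `apply_labels` is an illustrative name, not the code used by the service mentioned above.

```python
from typing import List, Tuple

# Hypothetical label format: (capitalize first letter?, punctuation to append)
Label = Tuple[bool, str]


def apply_labels(words: List[str], labels: List[Label]) -> str:
    """Rebuild punctuated, capitalized text from per-word predictions."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    restored = []
    for word, (capitalize, punct) in zip(words, labels):
        if capitalize:
            # Uppercase only the first character, leave the rest untouched.
            word = word[:1].upper() + word[1:]
        restored.append(word + punct)
    return " ".join(restored)


words = "hello world how are you".split()
labels = [(True, ""), (False, "."), (True, ""), (False, ""), (False, "?")]
print(apply_labels(words, labels))  # Hello world. How are you?
```

In practice the model works on subword tokens, so its predictions have to be aligned back to whole words before a step like this can run.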