Robust Speech Recognition via Large-Scale Weak Supervision
...These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
Video automatic transcribe and translated subtitle generator
It generates srt format subtitle from videofile which can be any source language that whisper support , and then make translated subtitle file of your target language which deepl support.
This is the subtitle generator(VATSG) which use [moviepy](https://github.com/Zulko/moviepy) to generate mp3 and then use [faster-whisper](https://github.com/guillaumekln/faster-whisper) to get text recognition and then use deepl-api to generate your target language subtitle file(srt format)
If you are a general user who want to view any video file and mp3 file to your language, It will provide way.
...