OpenAI Whisper Speech Recognition Guide
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. GitHub Repository Installation pip install git+https://github.com/openai/whisper.git Fix CUDA not detecting GPU Whisper will default to the CPU if a GPU is not detected, which is considerably slower. pip uninstall torch pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 Example usage # Transcribe whisper input.mp3 --model medium.en --language en --task transcribe # Translate whisper japanese.wav --model large --language Japanese --task translate Available models and languages There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
