Skip to content
Posts Feb 24, 2023 1 min read

OpenAI Whisper Speech Recognition Guide

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. GitHub Repository Installation pip install git+https://github.com/openai/whisper.git Fix CUDA not detecting GPU Whisper will default to the CPU if a GPU is not detected, which is considerably slower. pip uninstall torch pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 Example usage # Transcribe whisper input.mp3 --model medium.en --language en --task transcribe # Translate whisper japanese.wav --model large --language Japanese --task translate Available models and languages There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

OpenAI Whisper Speech Recognition Guide

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

GitHub Repository

Installation

pip install git+https://github.com/openai/whisper.git 

Fix CUDA not detecting GPU

Whisper will default to the CPU if a GPU is not detected, which is considerably slower.
pip uninstall torch
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Example usage

# Transcribe
whisper input.mp3 --model medium.en --language en --task transcribe
# Translate
whisper japanese.wav --model large --language Japanese --task translate

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

SizeParametersEnglish-only modelMultilingual modelRequired VRAMRelative speed
tiny39 Mtiny.entiny~1 GB~32x
base74 Mbase.enbase~1 GB~16x
small244 Msmall.ensmall~2 GB~6x
medium769 Mmedium.enmedium~5 GB~2x
large1550 MN/Alarge~10 GB1x

For English-only applications, the .en models tend to perform better, especially for the tiny.en and base.en models.

Connected Reading

Related entries

Chosen from shared tags, categories, and nearby section context.

Discovery Layer

Connected Memory

A focused relationship view around this entry, using shared categories and tags.

Categories 0
Tags 0
Posts 0