Posts tagged with "Audio"

OpenAI Whisper Speech Recognition Guide

Whisper is a general-purpose speech recognition model trained on a large dataset of diverse audio. It is also a multi-task model that can perform multilingual speech recognition, speech translation, and language identification.

GitHub Repository: https://github.com/openai/whisper

Installation

pip install git+https://github.com/openai/whisper.git

Fix CUDA not detecting GPU

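If no GPU is detected, Whisper falls back to the CPU, which is considerably slower. Reinstalling PyTorch from a CUDA-enabled build (CUDA 11.6 in the commands below) usually restores GPU detection.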

pip uninstall torch
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
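To confirm the reinstall worked, it can help to ask PyTorch directly whether it sees a GPU. The snippet below is a minimal sketch: `pick_device` is a hypothetical helper name, not part of Whisper, and it assumes that `torch.cuda.is_available()` returning `True` is a good enough signal that Whisper will use the GPU.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and can see a GPU, otherwise "cpu"."""
    # Guard against PyTorch being absent (for example, right after `pip uninstall torch`).
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # torch.cuda.is_available() is False for CPU-only builds or when no driver/GPU is found.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Whisper would run on: {pick_device()}")
```

If this still prints `cpu` after reinstalling, the installed wheel is probably a CPU-only build or the CUDA version does not match the driver.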

Example usage

# Transcribe
whisper input.mp3 --model medium.en --language en --task transcribe
# Translate
whisper japanese.wav --model large --language Japanese --task translate
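For more than a couple of files, the same commands can be driven from a small script. This is only a sketch under stated assumptions: `build_whisper_command` and `run_on_folder` are hypothetical helpers, the `whisper` executable is assumed to be on `PATH`, and only the flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_whisper_command(audio_path, model="medium.en", language="en", task="transcribe"):
    """Build the argument list for one `whisper` CLI call (does not run it)."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--language", language,
        "--task", task,
    ]


def run_on_folder(folder, pattern="*.mp3", **options):
    """Run the CLI once per matching file, stopping at the first failure."""
    for audio_path in sorted(Path(folder).glob(pattern)):
        # check=True raises CalledProcessError if whisper exits with a non-zero status.
        subprocess.run(build_whisper_command(audio_path, **options), check=True)
```

For example, `run_on_folder("recordings", pattern="*.wav", model="large", language="Japanese", task="translate")` would translate every WAV file in a `recordings` folder, one file at a time.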

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

Published: February 24, 2023 | Last Modified: May 13, 2025

NVIDIA Maxine Windows Audio Effects SDK

A step-by-step workflow for cleaning poor audio recordings using NVIDIA Maxine Windows Audio Effects SDK with different sample rate configurations.

Published: April 20, 2022 | Last Modified: May 13, 2025

FFmpeg Command Reference

A comprehensive reference guide to FFmpeg commands and filters, covering audio/video processing, screen recording, format conversion, and advanced filtering techniques with detailed parameter explanations and practical examples.

Published: January 27, 2022 | Last Modified: May 13, 2025