Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
This is my tentative workflow for cleaning up poor audio using the NVIDIA Maxine Windows Audio Effects SDK.
ffmpeg is a complete, cross-platform solution to record, convert and stream audio and video.
ffmpeg Download
ffmpeg Documentation
Audio Processing
Convert to 8kHz, single-channel PCM
ffmpeg -i "input.mp3" -ar 8000 -ac 1 output.wav
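If the same conversion is needed at more than one sample rate, the command above could be wrapped in a small shell function. This is only a sketch: `to_pcm_wav` is a hypothetical helper name, not part of ffmpeg, and the explicit `-c:a pcm_s16le` is an addition that spells out the 16-bit PCM encoding ffmpeg already uses by default for `.wav` output.

```shell
# Hypothetical helper (not part of ffmpeg): convert one file to
# single-channel 16-bit PCM WAV at a given sample rate.
#   $1 = sample rate in Hz (e.g. 8000)
#   $2 = input file
#   $3 = output .wav file
to_pcm_wav() {
    # -ar sets the output sample rate, -ac 1 downmixes to one channel,
    # -c:a pcm_s16le makes the default WAV encoding explicit.
    ffmpeg -i "$2" -ar "$1" -ac 1 -c:a pcm_s16le "$3"
}
```

With this defined, `to_pcm_wav 8000 "input.mp3" "output.wav"` runs the same conversion as the command above.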
Convert to 16kHz, single-channel PCM
ffmpeg -i "input.mp3" -ar 16000 …