Posts tagged with "AI"
3D Photo Inpainting with Python and PyTorch
A comprehensive guide to setting up and using 3D Photo Inpainting on Windows, including Miniconda environment setup, dependency installation, and usage instructions.
Published: May 12, 2023 | Last Modified: May 13, 2025
OpenAI Whisper Speech Recognition Guide
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
GitHub Repository
Installation
pip install git+https://github.com/openai/whisper.git
Fix CUDA not detecting GPU
Whisper will default to the CPU if a GPU is not detected, which is considerably slower.
pip uninstall torch
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
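To confirm that the reinstall worked, a quick check from Python can help. This is a minimal sketch, not part of the original post: the helper name cuda_status is made up for illustration, and it only uses standard PyTorch attributes (torch.cuda.is_available, torch.version.cuda, torch.cuda.get_device_name).

```python
import importlib.util


def cuda_status():
    """Return a one-line description of whether PyTorch can see a CUDA GPU.

    cuda_status is a hypothetical helper name, not part of Whisper or PyTorch.
    """
    # Avoid an ImportError if torch was uninstalled and not yet reinstalled.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available, Whisper would run on the CPU"

    # torch.version.cuda is the CUDA version this torch build was compiled against.
    return (
        f"torch {torch.__version__}: CUDA {torch.version.cuda} "
        f"on {torch.cuda.get_device_name(0)}"
    )


print(cuda_status())
```

If the output still reports that CUDA is not available after installing the cu116 build, the installed wheel is probably still a CPU-only one.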
Example usage
# Transcribe
whisper input.mp3 --model medium.en --language en --task transcribe
# Translate
whisper japanese.wav --model large --language Japanese --task translate
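The same two tasks can also be run from Python instead of the command line. The sketch below assumes the whisper package from the installation step is available; run_whisper is a hypothetical wrapper name, while whisper.load_model and model.transcribe are the package's own calls. The language code "ja" is used here in place of the name "Japanese".

```python
def run_whisper(audio_path, model_name="medium.en", task="transcribe", language="en"):
    """Transcribe or translate one audio file and return the recognized text.

    run_whisper is an illustrative wrapper, not part of the whisper package.
    """
    # Imported lazily so this module can be loaded without whisper installed.
    import whisper

    # Downloads the model weights on first use, then loads them.
    model = whisper.load_model(model_name)

    # task is "transcribe" (same language) or "translate" (output in English).
    result = model.transcribe(audio_path, task=task, language=language)
    return result["text"]
```

For example, run_whisper("input.mp3") mirrors the transcribe command above, and run_whisper("japanese.wav", model_name="large", task="translate", language="ja") mirrors the translate command.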
Available models and languages
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
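As a small illustration of the naming scheme, the sketch below builds the value passed to --model from a size and an English-only flag. It assumes the five sizes are tiny, base, small, medium, and large, with English-only variants carrying a ".en" suffix for every size except large. model_name is a hypothetical helper, and newer Whisper releases may offer additional model names not covered here.

```python
# Assumed size names; "large" has no English-only variant.
SIZES = ("tiny", "base", "small", "medium", "large")
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def model_name(size, english_only=False):
    """Return the Whisper model name for a size, e.g. 'medium.en' or 'large'."""
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"{size!r} has no English-only version")
        return f"{size}.en"
    return size


print(model_name("medium", english_only=True))
print(model_name("large"))
```

Smaller models are faster and need less memory, while larger ones are more accurate, so a helper like this makes it easy to switch sizes when experimenting.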
Published: February 24, 2023 | Last Modified: May 13, 2025
MiDaS Depth Estimation Guide
GitHub Repository
During installation, I ran into an issue where the CUDA package wasn’t found. I had to modify environment.yaml to:
name: midas-py310
channels:
  - pytorch
  - defaults
dependencies:
  - nvidia::cuda-toolkit=11.7.0
  - python=3.10.8
  - pytorch::pytorch=1.13.0
  - torchvision=0.14.0
  - pip=22.3.1
  - numpy=1.23.4
  - pip:
    - opencv-python==4.6.0.66
    - imutils==0.5.4
    - timm==0.6.12
    - einops==0.6.0
Commands that were helpful for troubleshooting CUDA:
Published: February 6, 2023 | Last Modified: May 13, 2025