Posts tagged with "Machine Learning"

Window Monitoring with Machine Vision

A guide to setting up a window monitoring system using the Moondream2 machine vision model for real-time window content analysis.

Published: March 25, 2024 | Last Modified: May 13, 2025

OpenAI Whisper Speech Recognition Guide

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

GitHub Repository: https://github.com/openai/whisper

Installation

pip install git+https://github.com/openai/whisper.git 

Fix CUDA not detecting GPU

Whisper falls back to the CPU when PyTorch does not detect a GPU, and CPU inference is considerably slower. Reinstalling PyTorch with a CUDA-enabled build usually resolves this:
pip uninstall torch
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
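
After reinstalling, it can help to confirm that PyTorch actually sees the GPU before running Whisper. The following is a minimal sketch; the helper name pick_device is hypothetical and not part of Whisper or PyTorch. It relies only on torch.cuda.is_available(), and reports "cpu" when PyTorch is not installed at all.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and detects a GPU, else "cpu".

    pick_device is a hypothetical helper used for illustration only.
    """
    # If PyTorch is missing entirely, there is nothing to query.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch  # imported lazily so the check above can run without it

    # torch.cuda.is_available() is False for CPU-only builds or missing drivers.
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints "cpu" after the reinstall, the installed wheel is probably still a CPU-only build, or the CUDA version in the index URL does not match the installed driver.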

Example usage

# Transcribe
whisper input.mp3 --model medium.en --language en --task transcribe
# Translate
whisper japanese.wav --model large --language Japanese --task translate
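
The same two tasks can also be run from Python. This is a rough sketch rather than the post's own method: the wrapper name run_whisper is hypothetical, the file names are the ones from the commands above, and it assumes the whisper package and ffmpeg are installed. It uses whisper.load_model and model.transcribe, with language codes ("en", "ja") in place of full language names.

```python
def run_whisper(path, model_name="medium.en", language="en", task="transcribe"):
    """Transcribe or translate one audio file and return the resulting text.

    run_whisper is a hypothetical wrapper; it assumes the openai-whisper
    package is installed and imports it lazily on first use.
    """
    import whisper

    # Downloads the model weights on first use, then loads them.
    model = whisper.load_model(model_name)

    # task is "transcribe" (same language) or "translate" (into English).
    result = model.transcribe(path, language=language, task=task)
    return result["text"]


# Example calls mirroring the CLI commands above (uncomment to run):
# print(run_whisper("input.mp3"))
# print(run_whisper("japanese.wav", model_name="large", language="ja", task="translate"))
```

Loading the model once and reusing it across files avoids paying the load time per file, which is the main practical advantage over invoking the CLI repeatedly.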

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
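
As a small illustration of the naming scheme, the sketch below builds a model name from a size and an English-only flag. The helper model_name is hypothetical, and the size list assumes the five original sizes (tiny, base, small, medium, large), with large being the one that has no English-only variant; newer Whisper releases may add further names.

```python
# Assumed size names; "large" has no English-only variant.
SIZES = ("tiny", "base", "small", "medium", "large")
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def model_name(size, english_only=False):
    """Return the name to pass to --model for a given size.

    model_name is a hypothetical helper used for illustration only.
    """
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only version of {size!r}")
        # English-only models carry a ".en" suffix.
        return f"{size}.en"
    return size


print(model_name("medium", english_only=True))
print(model_name("large"))
```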

Published: February 24, 2023 | Last Modified: May 13, 2025

Frame Interpolation Large Motion (FILM)

A comprehensive guide to setting up and using Google’s Frame Interpolation Large Motion (FILM) TensorFlow implementation, enabling the creation of smooth animations by generating intermediate frames between existing images using deep learning techniques.

Published: February 5, 2023 | Last Modified: May 13, 2025