Sections

  • Home
  • Archive
  • LLM Prompts
  • Posts

BTC

Bitcoin QR Code

Recently Modified

  • SMTP Test With PowerShell on 2025-07-02
  • Videos as Teams Backgrounds on 2025-07-02
  • UDM Parameters for Google Search on 2025-06-18
  • Troubleshoot Crashing Apps with ProcDump & WinDbg on 2025-05-01
  • Stub Title on 2025-03-07
  • Automated IIS Application Pool Restart with PowerShell on 2024-10-16
  • Managing Microsoft Office Versions with OfficeC2RClient on 2024-09-10
  • Automated Batch Image Compression with Python on 2024-07-30
  • How to Find Your Public IP Address on 2024-06-24
  • Complete DNS Records Reference Guide on 2024-06-06

Window Monitoring with Machine Vision

Published: March 25, 2024 | Last Modified: May 13, 2025

Tags: python machine-learning computer-vision monitoring moondream

Categories: Python Machine Learning



  • Server-Side
    • Setup Environment
    • Server-Side Script
  • Client-Side
    • Client-Side Script

Github Repository: Window Watcher

Server-Side

Setup Environment

I am using the moondream2 model for fast inference.

Model Repository: Moondream on GitHub

Take the server.py script from this repo: Github Repository: Window Watcher

git clone https://github.com/vikhyat/moondream.git
cd moondream
python -m venv venv
venv\Scripts\activate
cd ../
pip install -r requirements.txt
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip install flask

Confirm CUDA is working, or else the script will fall back onto the GPU.

nvcc --version

Server-Side Script

from flask import Flask, request, jsonify
from PIL import Image
from io import BytesIO

import argparse
import torch
import re
import time
import gradio as gr
from moondream import detect_device, LATEST_REVISION
from threading import Thread
from transformers import TextIteratorStreamer, AutoTokenizer, AutoModelForCausalLM

app = Flask(__name__)

parser = argparse.ArgumentParser()
parser.add_argument("--cpu", action="store_true")
args = parser.parse_args()

if args.cpu:
    device = torch.device("cpu")
    dtype = torch.float32
else:
    device, dtype = detect_device()
    if device != torch.device("cpu"):
        print("Using device:", device)
        print("If you run into issues, pass the `--cpu` flag to this script.")
        print()
        
# Initialize the model
model_id = "vikhyatk/moondream2"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=LATEST_REVISION)
moondream = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=LATEST_REVISION
).to(device=device, dtype=dtype)
moondream.eval()

@app.route('/itt', methods=['POST'])
def get_answer():
    if 'image' not in request.files or 'prompt' not in request.form:
        return jsonify({"error": "Missing image file or prompt"}), 400

    image_file = request.files['image']
    prompt = request.form['prompt']

    image = Image.open(BytesIO(image_file.read()))

    # Ensure image size is optimal for the model
    # image = image.resize((optimal_width, optimal_height))

    image_embeds = moondream.encode_image(image)

    answer = moondream.answer_question(image_embeds, prompt, tokenizer)
    
    return jsonify({"text": answer})

if __name__ == "__main__":
    # Disable debug for production
    app.run(debug=True)

Client-Side

POST http://127.0.0.1:5000/itt

Client-Side Script

Categories

  • Active Directory (4)
  • AI (3)
  • Azure AD (1)
  • C# (2)
  • C++ (1)
  • Computer Vision (1)
  • DNS (1)
  • Exchange (2)
  • Google (1)
  • Image Processing (2)
  • Java (32)
  • JavaScript (17)
  • Machine Learning (3)
  • MASM (3)
  • Media Processing (1)
  • Microsoft 365 (2)
  • Microsoft Office (1)
  • Microsoft Teams (1)
  • Networking (4)
  • Nodejs (1)
  • Office 365 (1)
  • P5.js (9)
  • PowerShell (25)
  • Processing (14)
  • Programming (1)
  • Python (19)
  • Reference (1)
  • Security (8)
  • Shell (16)
  • Stub (1)
  • System Administration (4)
  • Teams (1)
  • Visualization (1)
  • Web Administration (1)
  • Web Development (2)
  • Windows (9)

Tags

  • 10PRINT (1)
  • 3d-Modeling (1)
  • 3n+1 (1)
  • Account Management (1)
  • Acl (1)
  • Active-Directory (10)
  • Ad Sync (1)
  • Ai (9)
  • Android (1)
  • Animation (10)
  • Api (2)
  • Arrays (1)
  • Assembly (3)
  • Audio (3)
  • Audio Conversion (1)
  • Automation (13)
  • Azure (4)
  • Azure Ad Connect (1)
  • Base64 (1)
  • Bat (2)
  • Batch-Processing (1)
  • Bipartite Graph (1)
  • Bitset (1)
  • Buddhabrot (1)
  • Calendars (1)
  • Channel Management (1)
  • Client-Side (1)
  • Cmd (1)
  • Coding Challenge (15)
  • Collaboration (1)
  • Collatz Conjecture (1)
  • Command-Line (6)
  • Compliance (1)
  • Computer-Vision (3)
  • Coqui-Tts (1)
  • Counting Sort (1)
  • Creative-Coding (1)
  • Cuda (2)
  • Curl (1)
  • Cybersecurity (1)
  • Dag (1)
  • Data-Visualization (5)
  • Debugging (1)
  • Decoding (1)
  • Depth Estimation (1)
  • Device Management (1)
  • Directed Acyclic Graph (1)
  • Directory-Services (1)
  • Disjoint Set (1)
  • Distance (1)
  • Dkim (1)
  • Dmarc (1)
  • Dns (2)
  • Domain (1)
  • Domain Controller (1)
  • Domain Security (1)
  • Domain-Management (1)
  • Download (2)
  • Drivers (1)
  • Drives (1)
  • Education (1)
  • Email (1)
  • Email Management (1)
  • Email Security (2)
  • Email-Archiving (1)
  • Events (1)
  • Exchange (1)
  • Exchange-Management (1)
  • Exchange-Online (2)
  • ExchangeOnlineManagement (3)
  • Ffmpeg (4)
  • Fibonacci (1)
  • File-Permissions (1)
  • File-System (1)
  • Film (1)
  • Filtering (1)
  • Finance (1)
  • Firewall (1)
  • Flask (1)
  • Fractal (3)
  • Frame Interpolation (1)
  • Gal (1)
  • Gmail (1)
  • Google Forms (1)
  • Google-Apps-Script (1)
  • Google-Drive (1)
  • Gpu (1)
  • Graphs (1)
  • Group (1)
  • Group Management (1)
  • Group-Policy (1)
  • Gsuite (1)
  • Hacked Accounts (1)
  • Hardware (1)
  • Hex Encoding (1)
  • Iis (1)
  • Image-Processing (4)
  • Images (3)
  • Incident Response (1)
  • Insertion Sort (1)
  • Installation (1)
  • Interactive (9)
  • Ip-Address (1)
  • Ip-Addressing (1)
  • Java (1)
  • Javascript (5)
  • Juno (1)
  • Jupiter (1)
  • K-Means (1)
  • Kattis (6)
  • Keyboard (1)
  • Knowledge-Graphs (1)
  • Kruskal's Algorithm (1)
  • Lan (1)
  • Llm (2)
  • Local Administrator (1)
  • Local-Ai (2)
  • Logging (1)
  • Lorenz System (1)
  • M365 (3)
  • Machine-Learning (6)
  • Maximum Flow (1)
  • Media Processing (1)
  • Merge Sort (1)
  • Microsoft Teams (1)
  • Microsoft-Office (1)
  • Midas (1)
  • Minimum Spanning Tree (2)
  • Mistral-7b (1)
  • Monitoring (1)
  • Moondream (2)
  • Multilingual (1)
  • Mx (1)
  • N-Central (1)
  • Natural Language Processing (1)
  • Net (2)
  • Netsh (2)
  • Network (1)
  • Network Drives (1)
  • Network-Analysis (1)
  • Network-Security (2)
  • Networking (5)
  • Networkx (1)
  • Nlp (1)
  • Nslookup (1)
  • Obfuscation (1)
  • Office-365 (2)
  • Office365 (1)
  • Officec2rclient (1)
  • Open Simplex Noise (3)
  • Openai (1)
  • Optimization (1)
  • P5.js (2)
  • P5js (1)
  • Password Management (2)
  • Password-Generator (1)
  • Passwords (2)
  • Perlin Noise (1)
  • Permissions (2)
  • Phishing (1)
  • Photo-Editing (1)
  • Pil (1)
  • Pillow (1)
  • Port-Management (1)
  • Powershell (22)
  • Prim's Algorithm (1)
  • Prime Numbers (3)
  • Printers (1)
  • Procdump (1)
  • Processing (2)
  • Programming (2)
  • Python (15)
  • Python-Script (1)
  • Pyvis (2)
  • Qr-Code (1)
  • Rdp (1)
  • Reference (1)
  • Registry Modification (1)
  • Remote-Access (1)
  • Reporting (1)
  • Reports (3)
  • Robocopy (1)
  • Screen Recording (1)
  • Scripting (1)
  • SDK (1)
  • Security (9)
  • Security Analysis (1)
  • Security Management (1)
  • Settings (1)
  • Shell (4)
  • SID (1)
  • SMTP (2)
  • Sorting (3)
  • Sound (1)
  • Space (1)
  • Speech Recognition (1)
  • Spf (1)
  • Spiral (1)
  • Stable-Diffusion (2)
  • Stocks (1)
  • String (1)
  • Stub (1)
  • Subnets (1)
  • Synchronization (1)
  • Sysinternals (1)
  • System-Administration (13)
  • Systeminfo (1)
  • Team Management (1)
  • Team Ownership (1)
  • Tensorflow (1)
  • Therafit (1)
  • Time (1)
  • Topological Sort (1)
  • Troubleshooting (4)
  • Tzutil (1)
  • UDM (1)
  • Uri (1)
  • Uri Encoding (1)
  • User-Management (4)
  • Uva (9)
  • VBScript (1)
  • Version-Management (1)
  • Video (5)
  • Video Conversion (1)
  • Visualization (3)
  • Web-Administration (1)
  • Web-Development (1)
  • Wifi (1)
  • Win32_OperatingSystem (1)
  • Windbg (1)
  • Windows (17)
  • Windows 10 (1)
  • Windows 11 (1)
  • Windows-Defender (1)
  • Windows-Server (1)
  • Windows-Update (1)
  • Wmic (1)
  • Youtube-Dl (1)
  • Yt-Dlp (2)

© 2025 Ghostfeed theme by Tristan Madden. All rights reserved.