# windows-ocr **Repository Path**: geminga/windows-ocr ## Basic Information - **Project Name**: windows-ocr - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-06-07 - **Last Updated**: 2026-06-08 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Window OCR - Real-time window text recognition A Windows real-time window OCR tool based on **PaddleOCR PP-OCRv5**, capable of capturing the **left region** of a specified window and performing text recognition. --- ## Features - Real-time capture of specified window (supports background/minimized windows) - Configurable relative region (default: left 40% of window) - Uses PaddleOCR PP-OCRv5 model, supports Chinese/English/Japanese/Traditional/Pinyin - Hotkey pause/resume (F9) and exit (F10) - Results printed to console and saved to log file --- ## Requirements | Item | Requirement | |------|-------------| | OS | Windows 10 / 11 | | Python | 3.10 ~ 3.12 | | RAM | 8GB+ recommended | | GPU | CPU works, NVIDIA GPU accelerates | --- ## Quick Start ### 1. Install uv (if not installed) ```powershell # PowerShell powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # Or update uv uv self update ``` ### 2. Enter project directory ```powershell cd window_ocr ``` ### 3. Create virtual environment and install dependencies ```powershell # Use project's specified Python version uv sync # Or force reinstall (if dependencies have issues) uv sync --reinstall ``` > First-time installation of `paddlepaddle` will auto-download ~100MB wheel. PaddleOCR models auto-download on first run (~30MB). ### 4. Configure target window Edit `src/window_ocr/config.py`, modify `TARGET_WINDOW`: ```python TARGET_WINDOW = { "title": "Your Window Title", # Window title keyword (fuzzy match) "class_name": None, # Window class name (optional, exact match) } ``` **How to find window title:** ```powershell # After activating venv, run helper script uv run python -c "import src.window_ocr.capture as cap; cap.list_windows()" ``` ### 5. Start OCR ```powershell # Method 1: Direct run (recommended) uv run python -m window_ocr # Method 2: Use src path uv run python src/window_ocr/main.py ``` After startup, the program will: 1. Find the target window 2. Calculate left region (default 40% window width) 3. Screenshot every 500ms and recognize text 4. Print results to console --- ## Debug Guide ### List all windows ```powershell uv run python -c " from window_ocr.capture import list_windows list_windows() " ``` ### Test if screenshot region is correct ```powershell # Generate debug screenshot to verify captured region uv run python -c " from window_ocr.capture import WindowCapture from window_ocr.config import TARGET_WINDOW, REGION_CONFIG import time cap = WindowCapture(TARGET_WINDOW['title']) time.sleep(1) img = cap.capture_region( REGION_CONFIG['left_rel'], REGION_CONFIG['top_rel'], REGION_CONFIG['width_rel'], REGION_CONFIG['height_rel'] ) img.save('debug_capture.png') print('Screenshot saved to debug_capture.png') " ``` ### Test OCR engine standalone ```powershell uv run python -c " from window_ocr.ocr_engine import OCREngine from PIL import Image engine = OCREngine() result = engine.recognize(Image.open('debug_capture.png')) print(result) " ``` ### Adjust recognition region Edit `REGION_CONFIG` in `src/window_ocr/config.py`: ```python REGION_CONFIG = { "left_rel": 0.0, # Relative offset from left (0.0 = leftmost) "top_rel": 0.0, # Relative offset from top (0.0 = topmost) "width_rel": 0.4, # Relative width ratio (0.4 = 40%) "height_rel": 1.0, # Relative height ratio (1.0 = 100%) } ``` All values are **0.0 ~ 1.0** relative ratios, automatically adapting to window size changes. ### Common Errors | Error | Cause | Solution | |-------|-------|----------| | `Window not found: xxx` | Title mismatch | Use `list_windows()` to check actual title | | `OCR result empty` | Wrong region / small text | Adjust `REGION_CONFIG` or zoom window | | `paddle import failed` | Incomplete install | `uv sync --reinstall` | | `Model download slow` | Network issue | Set env `PADDLE_HOME` to local model dir | | `Black screenshot` | Window minimized / DPI scaling | Keep window visible, check system scaling | ### Log level adjustment In `config.py`: ```python LOG_LEVEL = "DEBUG" # DEBUG / INFO / WARNING ``` DEBUG mode outputs screenshot coordinates and timing info. --- ## Project Structure ``` window_ocr/ ├── .python-version # Python version lock ├── pyproject.toml # uv project config & dependencies ├── README.md # This file └── src/ └── window_ocr/ ├── __init__.py # Package entry ├── __main__.py # `python -m window_ocr` entry ├── config.py # Window config, region config, OCR params ├── capture.py # Windows window screenshot wrapper ├── ocr_engine.py # PaddleOCR PP-OCRv5 wrapper └── main.py # Main loop & hotkey control ``` --- ## Hotkeys | Key | Function | |-----|----------| | `F9` | Pause / Resume OCR | | `F10` | Exit program | --- ## License MIT License