# windows-ocr

**Repository Path**: geminga/windows-ocr

## Basic Information

- **Project Name**: windows-ocr
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-06-07
- **Last Updated**: 2026-06-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Window OCR - Real-time window text recognition

A real-time OCR tool for Windows, built on **PaddleOCR PP-OCRv5**, that captures the **left region** of a specified window and recognizes the text in it.

---

## Features

- Real-time capture of a specified window, including background windows (a minimized window may produce a black screenshot; see Common Errors)
- Configurable capture region in relative coordinates (default: left 40% of the window)
- Uses the PaddleOCR PP-OCRv5 model, which recognizes Simplified Chinese, Traditional Chinese, English, Japanese, and Pinyin
- Hotkeys to pause/resume (F9) and exit (F10)
- Results are printed to the console and saved to a log file

---

## Requirements

| Item | Requirement |
|------|-------------|
| OS | Windows 10 / 11 |
| Python | 3.10 to 3.12 |
| RAM | 8 GB or more recommended |
| GPU | Optional: runs on CPU; an NVIDIA GPU speeds up inference |

---

## Quick Start

### 1. Install uv (if not installed)

```powershell
# PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# If uv is already installed, update it instead
uv self update
```

### 2. Enter the project directory

```powershell
cd window_ocr
```

### 3. Create virtual environment and install dependencies

```powershell
# Create the venv with the Python version pinned in .python-version and install dependencies
uv sync

# Or force a reinstall of all packages (if dependencies are broken)
uv sync --reinstall
```

> The first install downloads the `paddlepaddle` wheel (~100 MB). PaddleOCR models are downloaded automatically on first run (~30 MB).

### 4. Configure target window

Edit `TARGET_WINDOW` in `src/window_ocr/config.py`:

```python
TARGET_WINDOW = {
    "title": "Your Window Title",  # Window title keyword (fuzzy match)
    "class_name": None,          # Window class name (optional, exact match)
}
```
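
The comments above describe the matching rule: the title is a keyword and `class_name`, if set, must match exactly. A minimal sketch of that rule follows, assuming "fuzzy match" means a case-insensitive substring test (the project's `capture.py` may implement it differently); `window_matches` is a hypothetical helper, not part of the project.

```python
from typing import Optional


def window_matches(title: str, class_name: str, target: dict) -> bool:
    """Return True if a window's (title, class_name) satisfies a TARGET_WINDOW-style dict.

    Assumption: "fuzzy match" is modeled as a case-insensitive substring test.
    """
    keyword: Optional[str] = target.get("title")
    wanted_class: Optional[str] = target.get("class_name")

    # Title: the keyword must appear somewhere in the window title, ignoring case.
    if keyword and keyword.lower() not in title.lower():
        return False
    # Class name: optional; when given, it must match exactly.
    if wanted_class is not None and class_name != wanted_class:
        return False
    return True


TARGET_WINDOW = {"title": "Notepad", "class_name": None}
print(window_matches("Untitled - Notepad", "Notepad", TARGET_WINDOW))     # True
print(window_matches("Calculator", "ApplicationFrameWindow", TARGET_WINDOW))  # False
```

Keeping `class_name` optional lets the title keyword do most of the work, while an exact class name can disambiguate several windows with similar titles.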

**How to find the window title:**

```powershell
# uv run uses the project virtual environment automatically (no activation needed)
uv run python -c "from window_ocr.capture import list_windows; list_windows()"
```

### 5. Start OCR

```powershell
# Method 1: run as a module (recommended)
uv run python -m window_ocr

# Method 2: run the main script by file path
uv run python src/window_ocr/main.py
```

After startup, the program will:
1. Find the target window
2. Compute the capture region (by default, the left 40% of the window width)
3. Take a screenshot every 500 ms and run text recognition on it
4. Print the results to the console
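
The four steps can be sketched as a fixed-interval loop. This is a hedged illustration, not the code in `main.py`: `run_ocr_loop`, `capture_fn`, and `ocr_fn` are hypothetical names, and the two `threading.Event` flags stand in for the F9/F10 hotkeys. The loop waits only for whatever remains of each 500 ms period, so slow OCR does not stretch the interval.

```python
import threading
import time
from typing import Any, Callable, List


def run_ocr_loop(
    capture_fn: Callable[[], Any],      # hypothetical: finds the window and grabs the region
    ocr_fn: Callable[[Any], str],       # hypothetical: runs OCR on the captured image
    stop: threading.Event,              # set => exit (F10 in the real tool)
    paused: threading.Event,            # set => skip work (F9 in the real tool)
    interval_s: float = 0.5,
    max_iterations: int = -1,           # -1 => run until `stop` is set
) -> List[str]:
    """Capture and recognize every `interval_s` seconds until `stop` is set."""
    results: List[str] = []
    iterations = 0
    while not stop.is_set() and iterations != max_iterations:
        started = time.monotonic()
        if not paused.is_set():
            image = capture_fn()        # steps 1-2: locate window, capture region
            text = ocr_fn(image)        # step 3: recognize text
            print(text)                 # step 4: print to console
            results.append(text)
        iterations += 1
        # Wait only for the rest of the interval; returns early if `stop` is set.
        elapsed = time.monotonic() - started
        stop.wait(max(0.0, interval_s - elapsed))
    return results


# Example with fake capture/OCR functions (no real window needed):
stop_flag, pause_flag = threading.Event(), threading.Event()
frames = iter(["frame-1", "frame-2"])
run_ocr_loop(lambda: next(frames), str.upper, stop_flag, pause_flag,
             interval_s=0.01, max_iterations=2)  # prints FRAME-1 then FRAME-2
```

Waiting on the `stop` event instead of calling `time.sleep` lets an exit request take effect without sitting out the rest of the interval.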

---

## Debug Guide

### List all windows

```powershell
uv run python -c "
from window_ocr.capture import list_windows
list_windows()
"
```

### Check that the screenshot region is correct

```powershell
# Save a debug screenshot to verify the captured region
uv run python -c "
from window_ocr.capture import WindowCapture
from window_ocr.config import TARGET_WINDOW, REGION_CONFIG
import time

cap = WindowCapture(TARGET_WINDOW['title'])
time.sleep(1)
img = cap.capture_region(
    REGION_CONFIG['left_rel'],
    REGION_CONFIG['top_rel'],
    REGION_CONFIG['width_rel'],
    REGION_CONFIG['height_rel']
)
img.save('debug_capture.png')
print('Screenshot saved to debug_capture.png')
"
```

### Test the OCR engine on its own

This reads `debug_capture.png`, so run the previous step first.

```powershell
uv run python -c "
from window_ocr.ocr_engine import OCREngine
from PIL import Image

engine = OCREngine()
result = engine.recognize(Image.open('debug_capture.png'))
print(result)
"
```

### Adjust recognition region

Edit `REGION_CONFIG` in `src/window_ocr/config.py`:

```python
REGION_CONFIG = {
    "left_rel": 0.0,      # Relative offset from left (0.0 = leftmost)
    "top_rel": 0.0,       # Relative offset from top (0.0 = topmost)
    "width_rel": 0.4,     # Relative width ratio (0.4 = 40%)
    "height_rel": 1.0,    # Relative height ratio (1.0 = 100%)
}
```

All values are relative ratios in the range **0.0 to 1.0**, so the region adapts automatically when the window is resized.
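
To make the relative-ratio idea concrete, here is a minimal sketch that turns `REGION_CONFIG` into pixel coordinates for a given window size. The helper `region_to_pixels` is hypothetical, and clamping the region to the window edges is an assumption about how oversized regions should be handled; the project's `capture.py` may behave differently.

```python
from typing import Tuple


def region_to_pixels(window_width: int, window_height: int,
                     region: dict) -> Tuple[int, int, int, int]:
    """Convert a REGION_CONFIG-style dict to (x, y, width, height) in pixels,
    measured from the window's top-left corner."""
    for key in ("left_rel", "top_rel", "width_rel", "height_rel"):
        if not 0.0 <= region[key] <= 1.0:
            raise ValueError(f"{key} must be between 0.0 and 1.0, got {region[key]}")
    x = round(region["left_rel"] * window_width)
    y = round(region["top_rel"] * window_height)
    # Assumption: clamp so the region never extends past the right/bottom edge.
    w = min(round(region["width_rel"] * window_width), window_width - x)
    h = min(round(region["height_rel"] * window_height), window_height - y)
    return x, y, w, h


REGION_CONFIG = {"left_rel": 0.0, "top_rel": 0.0, "width_rel": 0.4, "height_rel": 1.0}
print(region_to_pixels(1920, 1080, REGION_CONFIG))  # (0, 0, 768, 1080)
```

Because the ratios are applied to the window size at capture time, the same config yields a proportionally larger or smaller pixel region after the window is resized.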

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Window not found: xxx` | Title does not match | Run `list_windows()` to check the actual title |
| `OCR result empty` | Wrong region or text too small | Adjust `REGION_CONFIG` or enlarge the window |
| `paddle import failed` | Incomplete installation | Run `uv sync --reinstall` |
| `Model download slow` | Network issue | Set the `PADDLE_HOME` environment variable to a local model directory |
| `Black screenshot` | Window minimized, or DPI scaling | Keep the window restored (not minimized) and check the system display scaling |

### Log level adjustment

In `src/window_ocr/config.py`:

```python
LOG_LEVEL = "DEBUG"   # DEBUG / INFO / WARNING
```

DEBUG mode also logs the screenshot coordinates and timing information.
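
As a sketch of how a `LOG_LEVEL` string like this can drive both console output and a log file with Python's standard `logging` module: `setup_logging`, the logger name `window_ocr`, and the default file name `window_ocr.log` are assumptions for illustration, since the README does not name the log file or show how `main.py` configures logging.

```python
import logging
import sys


def setup_logging(level_name: str = "INFO", log_file: str = "window_ocr.log") -> logging.Logger:
    """Configure a logger that writes to both the console and a log file.

    Assumption: logger name and default file name are illustrative only.
    """
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown LOG_LEVEL: {level_name!r}")

    logger = logging.getLogger("window_ocr")
    logger.setLevel(level)
    # Remove and close handlers from a previous call so lines are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    fmt = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
    handlers = (logging.StreamHandler(sys.stdout),
                logging.FileHandler(log_file, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Calling `setup_logging(LOG_LEVEL)` once at startup would make `logger.debug(...)` lines (such as capture coordinates) appear only when the level is `DEBUG`, while `INFO` results still reach both the console and the file.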

---

## Project Structure

```
window_ocr/
├── .python-version          # Python version lock
├── pyproject.toml           # uv project config & dependencies
├── README.md                # This file
└── src/
    └── window_ocr/
        ├── __init__.py      # Package entry
        ├── __main__.py      # `python -m window_ocr` entry
        ├── config.py        # Window config, region config, OCR params
        ├── capture.py       # Windows window screenshot wrapper
        ├── ocr_engine.py    # PaddleOCR PP-OCRv5 wrapper
        └── main.py          # Main loop & hotkey control
```

---

## Hotkeys

| Key | Function |
|-----|----------|
| `F9` | Pause / Resume OCR |
| `F10` | Exit program |
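
One way to connect these keys to an OCR loop is a pair of `threading.Event` flags that hotkey callbacks flip from another thread. The sketch below covers only that state handling; how `main.py` actually registers F9/F10 (and which hotkey library it uses) is not documented here, so `HotkeyState`, `on_f9`, and `on_f10` are hypothetical names.

```python
import threading


class HotkeyState:
    """Thread-safe pause/exit flags meant to be toggled by hotkey callbacks."""

    def __init__(self) -> None:
        self.paused = threading.Event()  # set => OCR is paused
        self.stop = threading.Event()    # set => program should exit
        self._lock = threading.Lock()

    def on_f9(self) -> bool:
        """Toggle pause/resume; returns True if OCR is now paused."""
        with self._lock:  # make the check-then-toggle atomic
            if self.paused.is_set():
                self.paused.clear()
            else:
                self.paused.set()
            return self.paused.is_set()

    def on_f10(self) -> None:
        """Request a clean shutdown; the main loop checks `stop` each cycle."""
        self.stop.set()
```

Events are a natural fit here because the hotkey listener usually runs on its own thread, and the main loop only needs to read the flags once per capture cycle.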

---

## License

MIT License