# HunyuanOCR-WebUI

**Repository Path**: lidll/HunyuanOCR-WebUI

## Basic Information

- **Project Name**: HunyuanOCR-WebUI
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-10
- **Last Updated**: 2025-12-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# HunyuanOCR WebUI

A web interface for testing the [Tencent HunyuanOCR](https://huggingface.co/tencent/HunyuanOCR) model.

![License](https://img.shields.io/badge/license-MIT%20with%20restrictions-blue.svg)

## Features

- **Drag-and-drop** image upload with preview
- **Real-time streaming** output with token-by-token display
- **Pre-configured prompts** for common tasks
- **Markdown rendering** for formatted output
- **Dark/Light theme** toggle
- **Copy & Download** output buttons
- **Stop generation** mid-stream
- **History** of recent queries (localStorage)
- **Image preview modal** - click to enlarge
- **Model status indicator** - shows when model is ready
- **Health check endpoint** for monitoring
- **Configurable** via optional YAML file
- **Automatic image resizing** based on available GPU memory

## Examples

### Text Spotting

![Text Spotting](assets/text-spotting.png)

### Document Parsing

![Document Parsing](assets/document.png)

### Formula Recognition

![Formula Recognition](assets/formula.png)

![LaTeX Formula Render](assets/latex-formula.png)

### Table Parsing

![Table Parsing](assets/table.png)

### Chart Parsing

![Chart Parsing](assets/chart.png)

### Flowchart

![Flowchart](assets/chart-flowchart.png)

### Info Extraction

![Info Extraction](assets/info-extraction.png)

### Translation

![Translation](assets/translation.png)

### Handwritten Text

![Handwritten Text](assets/extract-handwritten-text.png)

### Text from Notes

![Text from Notes](assets/extract-text-from-notes.png)

### Corrupted Text

![Corrupted Text](assets/Extract-corrupted-text.png)

## Installation

```bash
pip install -r requirements.txt
```

## Usage

```bash
python server.py
```

Then open `http://localhost:8000` in your browser.

## Configuration

Optionally create a `config.yaml` to override defaults:

```yaml
server:
  host: "0.0.0.0"
  port: 8000

model:
  name: "tencent/HunyuanOCR"
  max_new_tokens: 16384
  dtype: "bfloat16"
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Main web interface |
| `/health` | GET | Health check (model status, GPU memory) |
| `/config` | GET | Current configuration |
| `/ocr` | POST | Process image (streaming SSE) |

## Project Structure

```
HunyuanOCR-WebUI/
├── server.py              # FastAPI server
├── requirements.txt       # Python dependencies
├── LICENSE                # MIT with restrictions
├── static/
│   ├── css/
│   │   └── style.css      # Stylesheets
│   └── js/
│       └── app.js         # Frontend JavaScript
├── templates/
│   └── index.html         # HTML template
└── assets/                # Example screenshots
```

## Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+Enter` | Run OCR |
| `Escape` | Stop generation / Close modal |

## Requirements

- Python 3.8+
- CUDA-compatible GPU (recommended)
- 8GB+ VRAM for the model

Check your GPU VRAM:

```bash
nvidia-smi
```

> **Note:** The default `MAX_PIXELS_CAP` in `server.py` is 1M pixels, suitable for 8GB+ VRAM. If you have more VRAM, you can increase this value for higher resolution images (untested above 1M, results may vary).

## License

MIT License with Additional Restrictions - see [LICENSE](LICENSE) for details.

**Note:** The underlying HunyuanOCR model has its own [license](https://huggingface.co/tencent/HunyuanOCR/blob/main/LICENSE) with territorial and usage restrictions.
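
## Appendix: Pixel Cap Sketch

The note in the Requirements section mentions a `MAX_PIXELS_CAP` of roughly 1M pixels. This README does not show how `server.py` applies that cap, so the following is only a minimal sketch of one plausible approach: scale an image down, preserving its aspect ratio, until its pixel count fits under the cap. The function name `fit_within_pixel_cap` and the exact value `1_000_000` are assumptions made for illustration and may not match the real code.

```python
import math
from typing import Tuple

# Assumed value: the README describes the default cap only as "1M pixels".
MAX_PIXELS_CAP = 1_000_000


def fit_within_pixel_cap(
    width: int, height: int, max_pixels: int = MAX_PIXELS_CAP
) -> Tuple[int, int]:
    """Return (width, height) scaled down so that width * height <= max_pixels.

    Hypothetical helper, not taken from server.py. Images already under
    the cap are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height

    # Scaling both sides by sqrt(cap / area) shrinks the area to the cap
    # while keeping the aspect ratio; flooring keeps the result under it.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


if __name__ == "__main__":
    print(fit_within_pixel_cap(800, 600))    # (800, 600), already under the cap
    print(fit_within_pixel_cap(4000, 3000))  # (1154, 866)
```

The resulting size could then be passed to an image library's resize call (for example Pillow's `Image.resize`) before the image is handed to the model. Raising `max_pixels` on a GPU with more VRAM would allow larger inputs, subject to the "untested above 1M" caveat in the Requirements note.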