# HunyuanOCR-WebUI

**Repository Path**: lidll/HunyuanOCR-WebUI

## Basic Information

- **Project Name**: HunyuanOCR-WebUI
- **Default Branch**: main

## Statistics

- **Created**: 2025-12-10
- **Last Updated**: 2025-12-10

## README

# HunyuanOCR WebUI

A web interface for testing the [Tencent HunyuanOCR](https://huggingface.co/tencent/HunyuanOCR) model.

![License](https://img.shields.io/badge/license-MIT%20with%20restrictions-blue.svg)

## Features

- **Drag-and-drop** image upload with preview
- **Real-time streaming** output with token-by-token display
- **Pre-configured prompts** for common tasks
- **Markdown rendering** for formatted output
- **Dark/Light theme** toggle
- **Copy & Download** output buttons
- **Stop generation** mid-stream
- **History** of recent queries (localStorage)
- **Image preview modal** - click to enlarge
- **Model status indicator** - shows when model is ready
- **Health check endpoint** for monitoring
- **Configurable** via optional YAML file
- **Automatic image resizing** based on available GPU memory

## Examples

### Text Spotting
![Text Spotting](assets/text-spotting.png)

### Document Parsing
![Document Parsing](assets/document.png)

### Formula Recognition
![Formula Recognition](assets/formula.png)

![LaTeX Formula Render](assets/latex-formula.png)

### Table Parsing
![Table Parsing](assets/table.png)

### Chart Parsing
![Chart Parsing](assets/chart.png)

### Flowchart
![Flowchart](assets/chart-flowchart.png)

### Info Extraction
![Info Extraction](assets/info-extraction.png)

### Translation
![Translation](assets/translation.png)

### Handwritten Text
![Handwritten Text](assets/extract-handwritten-text.png)

### Text from Notes
![Text from Notes](assets/extract-text-from-notes.png)

### Corrupted Text
![Corrupted Text](assets/Extract-corrupted-text.png)

## Installation

```bash
pip install -r requirements.txt
```

## Usage

```bash
python server.py
```

Then open `http://localhost:8000` in your browser.

## Configuration

Optionally create a `config.yaml` to override defaults:

```yaml
server:
  host: "0.0.0.0"
  port: 8000

model:
  name: "tencent/HunyuanOCR"
  max_new_tokens: 16384
  dtype: "bfloat16"
```
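The snippet below is a minimal sketch of how an optional `config.yaml` might be layered over built-in defaults, so that a file containing only some keys overrides just those keys. It is not taken from `server.py`: `DEFAULTS`, `deep_merge` and `load_config` are hypothetical names, and the default values are assumed to match the example above. Reading the file assumes PyYAML is installed.

```python
import copy
from pathlib import Path
from typing import Any, Dict

# Assumed defaults, copied from the example config above; server.py may differ.
DEFAULTS: Dict[str, Any] = {
    "server": {"host": "0.0.0.0", "port": 8000},
    "model": {
        "name": "tencent/HunyuanOCR",
        "max_new_tokens": 16384,
        "dtype": "bfloat16",
    },
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where keys in `override` replace those in `base`,
    recursing into nested dicts so partial sections are allowed."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path: str = "config.yaml") -> Dict[str, Any]:
    """Load `path` if it exists and merge it over DEFAULTS (hypothetical helper)."""
    config_file = Path(path)
    if not config_file.is_file():
        # No config file: fall back to a copy of the defaults.
        return copy.deepcopy(DEFAULTS)
    import yaml  # PyYAML, imported lazily so deep_merge has no hard dependency

    with config_file.open("r", encoding="utf-8") as fh:
        user_config = yaml.safe_load(fh) or {}
    return deep_merge(DEFAULTS, user_config)
```

With this merge strategy, a `config.yaml` containing only `server:` with `port: 9000` would change the port while keeping the host and model settings. Whether `server.py` merges nested sections this way is an assumption worth checking.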

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Main web interface |
| `/health` | GET | Health check (model status, GPU memory) |
| `/config` | GET | Current configuration |
| `/ocr` | POST | Process image (streaming SSE) |
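A minimal client sketch using only the Python standard library, assuming `/health` and `/config` return JSON and `/ocr` streams standard Server-Sent Events (`data:` lines terminated by a blank line). The helper names `get_json` and `iter_sse_data` are hypothetical, and neither the JSON fields nor the per-event payload format is documented in this README.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def get_json(url: str, timeout: float = 10.0) -> dict:
    """GET a JSON endpoint such as /health or /config (assumed to return JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event from raw response lines.

    An event is one or more `data:` lines ended by a blank line; multiple
    data lines are joined with newlines. Other fields (event:, id:, retry:)
    and comment lines starting with ':' are ignored. An incomplete final
    event (no trailing blank line) is discarded, as the SSE format specifies.
    """
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # strip a single optional leading space
            buffer.append(value)


# Note: the /ocr request body (form field names, image encoding) is not
# documented in this README; check server.py / static/js/app.js before
# building the POST request whose response you pass to iter_sse_data.
```

For example, `get_json("http://localhost:8000/health")` returns the parsed health report. To consume `/ocr`, iterate over the response of the POST request, e.g. `for chunk in iter_sse_data(resp): ...`, since iterating an `http.client.HTTPResponse` yields raw lines.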

## Project Structure

```
HunyuanOCR-WebUI/
├── server.py              # FastAPI server
├── requirements.txt       # Python dependencies
├── LICENSE                # MIT with restrictions
├── static/
│   ├── css/
│   │   └── style.css      # Stylesheets
│   └── js/
│       └── app.js         # Frontend JavaScript
├── templates/
│   └── index.html         # HTML template
└── assets/                # Example screenshots
```

## Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+Enter` | Run OCR |
| `Escape` | Stop generation / Close modal |

## Requirements

- Python 3.8+
- CUDA-compatible GPU (recommended)
- 8GB+ VRAM for the model

Check your GPU VRAM:
```bash
nvidia-smi
```

> **Note:** The default `MAX_PIXELS_CAP` in `server.py` is 1M pixels, which suits GPUs with 8GB+ VRAM. With more VRAM you can raise this value to process higher-resolution images (untested above 1M pixels; results may vary).
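One way to enforce a pixel cap like `MAX_PIXELS_CAP` is to scale both sides by the same factor, `sqrt(cap / (width * height))`, which preserves the aspect ratio. This is a sketch under that assumption only: `fit_to_pixel_cap` is a hypothetical helper, and `server.py` may round or resize differently.

```python
import math
from typing import Tuple

MAX_PIXELS_CAP = 1_000_000  # the 1M-pixel default described above


def fit_to_pixel_cap(
    width: int, height: int, max_pixels: int = MAX_PIXELS_CAP
) -> Tuple[int, int]:
    """Scale (width, height) down, keeping the aspect ratio, so that
    width * height <= max_pixels. Images already under the cap are returned
    unchanged (no upscaling). Assumes neither side would shrink below 1 px."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Flooring both sides keeps the product at or below the cap.
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))
    return new_w, new_h
```

With Pillow, the result can be passed directly to `Image.resize`, e.g. `img.resize(fit_to_pixel_cap(*img.size))`. Raising the cap increases both the resolution the model sees and its GPU memory use.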

## License

MIT License with Additional Restrictions - see [LICENSE](LICENSE) for details.

**Note:** The underlying HunyuanOCR model has its own [license](https://huggingface.co/tencent/HunyuanOCR/blob/main/LICENSE) with territorial and usage restrictions.