# ocap

**Repository Path**: monkeycc/ocap

## Basic Information

- **Project Name**: ocap
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-03
- **Last Updated**: 2025-12-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ocap

[![ocap](https://img.shields.io/pypi/v/ocap?label=ocap)](https://pypi.org/project/ocap/) [![gstreamer-bundle](https://img.shields.io/conda/vn/open-world-agents/gstreamer-bundle?label=gstreamer-bundle)](https://anaconda.org/open-world-agents/gstreamer-bundle)

High-performance desktop recorder for Windows. Captures screen, audio, keyboard, mouse, and window events.

This project was first introduced and developed for the D2E project. For more details, see [D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI](https://worv-ai.github.io/d2e/). If you find this work useful, please cite our paper.

## What is ocap?

**ocap** (Omnimodal CAPture) captures all essential desktop signals in a synchronized format. It records screen video, audio, keyboard/mouse input, and window events. It was built for the _open-world-agents_ project but works for any desktop recording need.

> **TL;DR**: Complete, high-performance desktop recording tool for Windows. Captures everything in one command.

https://github.com/user-attachments/assets/4e94782c-02ae-4f64-bb52-b08be69d33da

📊 **Working with recorded data?** See the [OWAMcap Format Guide](https://open-world-agents.github.io/open-world-agents/data/technical-reference/format-guide) for analysis, processing, and ML integration.

## Key Features

| **Feature** | **ocap** | [OBS](https://obsproject.com/) | [wcap](https://github.com/mmozeiko/wcap) | [pillow](https://github.com/python-pillow/Pillow)/[mss](https://github.com/BoboTiG/python-mss) |
|------------------------------------------|--------|------------|--------|--------------------------|
| Advanced data formats (OWAMcap) | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Timestamp-aligned logging | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Customizable event definition & listener | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Single Python file | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Audio + Window + Keyboard + Mouse | ✅ Yes | ⚠️ Partial | ❌ No | ❌ No |
| Hardware-accelerated encoder | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| Supports latest Windows APIs | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No (legacy APIs only) |
| Optional mouse cursor capture | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |

- **Complete desktop recording**: Video, audio, keyboard/mouse events, window events
- **High performance**: Hardware-accelerated with Windows APIs and [GStreamer](https://gstreamer.freedesktop.org/)
- **Efficient encoding**: [H265/HEVC](https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding) for high quality and small file size
- **Simple operation**: `ocap FILE_LOCATION` (stop with Ctrl+C)
- **Clean architecture**: Core logic in a single 400-line [recorder.py](https://github.com/open-world-agents/ocap/blob/main/owa/ocap/recorder.py)
- **Modern formats**: MKV with embedded timestamps, [**OWAMcap format**](https://open-world-agents.github.io/open-world-agents/data/technical-reference/format-guide) for events (built on [MCAP](https://mcap.dev/))

## System Requirements

Based on OBS Studio recommended specs + NVIDIA GPU requirements:

| Component | Specification |
|-----------|---------------|
| **OS** | Windows 11 (64-bit) |
| **Processor** | Intel i7 8700K / AMD Ryzen 1600X |
| **Memory** | 8 GB RAM |
| **Graphics** | **NVIDIA GeForce 10 Series or newer** ⚠️ |
| **DirectX** | Version 11 |
| **Storage** | 600 MB + ~100 MB per minute of recording |

> ⚠️ **NVIDIA GPU Required**: Hardware acceleration currently supports NVIDIA GPUs only. AMD/Intel GPU support is possible through the GStreamer framework - **contributions welcome**!

> 🖥️ **OS Support**: Only Windows is currently supported. Because capture is built on GStreamer, support for other operating systems (Linux, macOS) should be relatively easy to add by using different GStreamer pipelines - **contributions welcome**!

## Installation & Usage

### Option 1: Download Release

1. Download `ocap.zip` from [releases](https://github.com/open-world-agents/ocap/releases)
2. Unzip and run:
   - Double-click `run.bat` (opens terminal with virtual environment)
   - Or in CLI: `run.bat --help`

### Option 2: Package Install

All OWA packages are available on PyPI:

```sh
# Install GStreamer dependencies first (for video recording)
$ conda install open-world-agents::gstreamer-bundle

# Install ocap
$ pip install ocap
```

### Basic Usage

```sh
# Start recording (stop with Ctrl+C)
$ ocap my-recording

# Show all options
$ ocap --help

# Advanced options
$ ocap FILENAME --window-name "App"   # Record specific window
$ ocap FILENAME --monitor-idx 1       # Record specific monitor
$ ocap FILENAME --fps 60              # Set framerate
$ ocap FILENAME --no-record-audio     # Disable audio
```

### Output Files

- `.mcap`: Event log (keyboard, mouse, windows) in OWAMcap format
- `.mkv`: Video/audio with embedded timestamps

Your recording files will be ready immediately!
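
When recordings are launched from a script rather than typed by hand, it can help to assemble the command line programmatically. The sketch below is one possible way to do that, using only the flags listed under Basic Usage (`--window-name`, `--monitor-idx`, `--fps`, `--no-record-audio`). The helper `build_ocap_command` is hypothetical and not part of ocap; it only builds the argument list and does not start a recording.

```python
import shlex
from typing import List, Optional


def build_ocap_command(
    filename: str,
    window_name: Optional[str] = None,
    monitor_idx: Optional[int] = None,
    fps: Optional[int] = None,
    record_audio: bool = True,
) -> List[str]:
    """Build an argv list for the `ocap` CLI (hypothetical helper, not part of ocap)."""
    cmd = ["ocap", filename]
    if window_name is not None:
        cmd += ["--window-name", window_name]      # record a specific window
    if monitor_idx is not None:
        cmd += ["--monitor-idx", str(monitor_idx)]  # record a specific monitor
    if fps is not None:
        cmd += ["--fps", str(fps)]                  # set framerate
    if not record_audio:
        cmd.append("--no-record-audio")             # disable audio
    return cmd


cmd = build_ocap_command("my-recording", window_name="App", fps=60, record_audio=False)
print(shlex.join(cmd))
# prints: ocap my-recording --window-name App --fps 60 --no-record-audio
```

On a Windows machine with ocap installed, the resulting list could be passed to `subprocess.run(cmd)`; the recording would still be stopped with Ctrl+C as described above.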

## Technical Architecture

Built on GStreamer with a clean, maintainable design:

```mermaid
flowchart TD
    %% Input Sources
    A[owa.env.desktop] --> B[Keyboard Events]
    A --> C[Mouse Events]
    A --> D[Window Events]
    E[owa.env.gst] --> F[Screen Capture]
    E --> G[Audio Capture]

    %% Core Processing
    B --> H[Event Queue]
    C --> H
    D --> H
    F --> H
    F --> I[Video/Audio Pipeline]
    G --> I

    %% Outputs
    H --> J[MCAP Writer]
    I --> K[MKV Pipeline]

    %% Files
    J --> L[📄 events.mcap]
    K --> M[🎥 video.mkv]

    style A fill:#e1f5fe
    style E fill:#e1f5fe
    style H fill:#fff3e0
    style L fill:#e8f5e8
    style M fill:#e8f5e8
```

- **Easy to verify**: [OWA's Env](../env/index.md) design keeps [`recorder.py`](https://github.com/open-world-agents/ocap/blob/main/owa/ocap/recorder.py) compact and customizable
- **Native performance**: Direct Windows API integration ([DXGI](https://learn.microsoft.com/en-us/windows/win32/direct3ddxgi/d3d10-graphics-programming-guide-dxgi)/[WGC](https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture?view=winrt-26100), [WASAPI](https://learn.microsoft.com/en-us/windows/win32/coreaudio/wasapi))

## Troubleshooting

- **Recording terminates right after start?** Re-run the same command a few times. This is caused by an intermittent GStreamer crash whose cause is unknown.
- **GStreamer error message box appears on first run?** This is a known issue: GStreamer may show error dialogs the first time you run `ocap`. These messages do not affect recording; simply close the dialogs and continue. `ocap` will function normally.
- **Audio not recording?** By default, only audio from the target process is recorded. To change this, manually edit the [GStreamer pipeline](https://github.com/open-world-agents/open-world-agents/blob/main/projects/owa-env-gst/owa/env/gst/pipeline_builder/factory.py#L80).
- **Large file sizes?** Reduce file size by adjusting the [`gop-size`](https://gstreamer.freedesktop.org/documentation/nvcodec/nvd3d11h265enc.html?gi-language=c#nvd3d11h265enc:gop-size) parameter in the `nvd3d11h265enc` element. See [pipeline.py](https://github.com/open-world-agents/open-world-agents/blob/main/projects/owa-env-gst/owa/env/gst/pipeline_builder/pipeline.py).
- **Performance tips:** Close unnecessary applications before recording, use SSD storage for better write performance, and record to a different drive than your OS drive.

## FAQ

- **How much disk space do recordings use?** ~100 MB per minute for 1080p H265 recording.
- **Will ocap slow down my computer?** The impact is minimal with hardware acceleration. ocap is designed for low overhead.
- **What is OWAMcap format?** A specialized format that stores screen video (.mkv) + synchronized events (.mcap) for AI training. It contains keyboard, mouse, and window events with nanosecond precision. [Learn more →](https://open-world-agents.github.io/open-world-agents/data/technical-reference/format-guide)
- **Can I save recordings in other formats?** Yes. The only source file you need to edit is [recorder.py](https://github.com/open-world-agents/ocap/blob/main/owa/ocap/recorder.py). You can implement JSONL, Parquet, CSV, or any other format you want.
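
The ~100 MB per minute figure above can be turned into a quick capacity check before a long session. The following is a minimal sketch, assuming that rate holds roughly constant; actual sizes depend on resolution, content, and encoder settings such as `gop-size`. The function names are illustrative and not part of ocap.

```python
# Approximate rate quoted in the FAQ for 1080p H265 recording.
MB_PER_MINUTE = 100


def estimate_recording_mb(minutes: float, mb_per_minute: float = MB_PER_MINUTE) -> float:
    """Approximate size in MB of a recording lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * mb_per_minute


def max_recording_minutes(free_mb: float, mb_per_minute: float = MB_PER_MINUTE) -> float:
    """Approximate recording length in minutes that fits in `free_mb` of free space."""
    if free_mb < 0:
        raise ValueError("free_mb must be non-negative")
    return free_mb / mb_per_minute


print(estimate_recording_mb(30))      # estimated size of a 30-minute session, in MB
print(max_recording_minutes(50_000))  # minutes that fit in roughly 50 GB of free space
```

Treat the results as rough planning numbers rather than guarantees, and leave headroom on the target drive.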

## When to Use ocap

- **AI Agent Training**: Capture desktop interactions for training multimodal models
- **Workflow Documentation**: Record exact steps with precise timing
- **Performance Testing**: Low-overhead recording during intensive tasks
- **Research & Datasets**: Generate standardized OWAMcap data for the community ([HuggingFace Hub](https://huggingface.co/datasets?other=OWA))

## Citation

If you find this work useful, please cite our paper:

```
@article{choi2025d2e,
  title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
  author={Choi, Suwhan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
  journal={arXiv preprint arXiv:2510.05684},
  year={2025}
}
```