# kbd-audio
**Repository Path**: hongwing/kbd-audio
## Basic Information
- **Project Name**: kbd-audio
- **Description**: Tools for capturing and analysing keyboard input paired with microphone capture ⌨️
- **Primary Language**: C++
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 0
- **Created**: 2019-08-03
- **Last Updated**: 2024-07-24
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# kbd-audio [](https://travis-ci.org/ggerganov/kbd-audio?branch=master)
## Description
This is a collection of command-line and GUI tools for capturing and analyzing audio data.
### Keytap
The most interesting tool is called **keytap** - it can guess pressed keyboard keys only by analyzing the audio captured from the computer's microphone.
Check this blog post for more details:
[Keytap: description and some random thoughts](https://ggerganov.github.io/jekyll/update/2018/11/30/keytap-description-and-thoughts.html)
[Video: short demo of Keytap in action](https://www.youtube.com/watch?v=2OjzI9m7W10)
### Keytap2
The **keytap2** tool is another interesting tool for recovering text from audio. It does not require training data - instead it uses statistical information about the frequencies of the letters and n-grams in the English language. The tool is still in development, but you can see a short demonstration here:
[Video: Keytap2 - recovering text from typing sound (7:50)](https://www.youtube.com/watch?v=Y8nWkdWl7Pg)
[CTF: can you guess the text being typed?](https://ggerganov.github.io/keytap-challenge/)
## Build instructions
Dependencies:
- **SDL2** - used to capture audio and to open GUI windows [libsdl](https://www.libsdl.org)
[Ubuntu]
$ sudo apt install libsdl2-dev
[Mac OS with brew]
$ brew install sdl2
- **FFTW3** *(optional)* - some of the helper tools perform Fourier transformations [fftw](http://www.fftw.org)
**Linux and Mac OS**
git clone https://github.com/ggerganov/kbd-audio
cd kbd-audio
git submodule update --init
mkdir build && cd build
cmake ..
make
**Windows**
(todo, PRs welcome)
## Tools
Short summary of the available tools. If the status of the tool is not **stable**, expect problems and non-optimal results.
| Name | Type | Status |
| --- | --- | --- |
| **record** | text | **stable** |
| **record-full** | text | **stable** |
| **play** | text | **stable** |
| **play-full** | text | **stable** |
| **view-gui** | gui | **stable** |
| **view-full-gui** | gui | **stable** |
| **keytap** | text | **stable** |
| **keytap-gui** | gui | **stable** |
| **keytap2** | text | development |
| **keytap2-gui** | gui | development |
| - | *extra* | - |
| **guess_qp** | text | experiment |
| **guess_qp2** | text | experiment |
| **key_detector** | text | experiment |
| **scale** | text | experiment |
| **subreak** | text | experiment |
| **key_average_gui** | gui | experiment |
## Tool details
* **record-full**
Record audio to a raw binary file on disk
./record-full output.kbd [-cN]
---
* **play-full**
Playback a recording captured via the **record-full** tool
./play-full input.kbd [-pN]
---
* **record**
Record audio only while typing. Useful for collecting training data for **keytap**
./record output.kbd [-cN]
---
* **play**
Playback a recording created via the **record** tool
./play input.kbd [-pN]
---
* **keytap**
Detect pressed keys via microphone audio capture in real-time. Uses training data captured via the **record** tool.
./keytap input0.kbd [input1.kbd] [input2.kbd] ... [-cN] [-pF] [-tF]
---
* **keytap-gui**
Detect pressed keys via microphone audio capture in real-time. Uses training data captured via the **record** tool. GUI version.
./keytap-gui input0.kbd [input1.kbd] [input2.kbd] ... [-cN]
[**Live demo *(WebAssembly threads required)* **](https://ggerganov.github.io/jekyll/update/2018/11/24/keytap.html)

---
* **keytap2-gui** *(work in progress)*
Detect pressed keys via microphone audio capture. Uses statistical information (n-gram frequencies) about the language. **No training data is required**. The *'recording.kbd'* input file has to be generated via the **record-full** tool and contains the audio data that will be analyzed. The *'n-gram.txt'* file has to contain n-gram probabilities for the corresponding language.
./keytap2-gui recording.kbd n-gram.txt

---
* **view-full-gui**
Visualize waveforms recorded with the **record-full** tool. Can also playback the audio data.
./view-full-gui input.kbd

---
* **view-gui**
Visualize training data recorded with the **record** tool. Can also playback the audio data.
./view-gui input.kbd

---
## Feedback
Any feedback about the performance of the tools is highly appreciated. Please drop a comment [here](https://github.com/ggerganov/kbd-audio/issues/3).