# air-drawing

**Repository Path**: aaron-gao/air-drawing

## Basic Information

- **Project Name**: air-drawing
- **Description**: Client-side air drawing tool
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-11-26
- **Last Updated**: 2021-11-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# air-drawing 👆

This tool uses Deep Learning to let you draw and write in the air with your hand and a webcam. A Deep Learning model tries to predict the user's intent: whether you want to draw a stroke ('pencil down') or just move your hand ('pencil up'). Watch the GIF until the end to see how it works.

**Try it online: [loicmagne.github.io/air-drawing](https://loicmagne.github.io/air-drawing/)**

![](assets/gif.gif)

## Technical Details

- This pipeline is made up of two steps: detecting the hand, and predicting the drawing. Both steps are done using Deep Learning.
- Hand pose detection is performed with the [MediaPipe toolbox](https://google.github.io/mediapipe/solutions/hands.html).
- The drawing prediction part uses only the finger position, not the image. The input is a sequence of 2D points (in practice I use the speed and acceleration of the finger instead of its position, to make the prediction translation-invariant), and the output is a binary classification: 'pencil up' or 'pencil down'. I used a simple bidirectional LSTM architecture. I made a small dataset myself (~50 samples), which I annotated with the tools provided in `python-stuff/data-wrangling/`. At first I wanted to make the 'pencil up'/'pencil down' prediction in real time, i.e. while the user is drawing. However, this task was too difficult and gave poor results, which is why I'm now using a bidirectional LSTM, which reads the sequence in both directions and so can use the finger motion both before and after each point. You can find details of the deep learning pipeline in the Jupyter notebook in `python-stuff/deep-learning/`.
- The application is entirely client-side. I deployed the deep learning model by converting the PyTorch model to the ONNX format (`.onnx`) and then running it with [ONNX Runtime](https://github.com/microsoft/onnxruntime), which is very convenient and supports a lot of layers.
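
To illustrate why speed and acceleration make the input translation-invariant, here is a minimal sketch in plain Python. It is not the code from the notebook: the function names (`finite_difference`, `motion_features`), the use of simple per-frame finite differences, and the sample coordinates are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def finite_difference(seq: List[Point]) -> List[Point]:
    """Return the differences between consecutive 2D points."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(seq, seq[1:])]


def motion_features(positions: List[Point]) -> List[Tuple[float, float, float, float]]:
    """Build per-frame (vx, vy, ax, ay) features from fingertip positions.

    Assumes a constant frame interval, so speed and acceleration are
    expressed per frame rather than per second (hypothetical choice).
    """
    velocity = finite_difference(positions)      # n - 1 entries
    acceleration = finite_difference(velocity)   # n - 2 entries
    # Skip the first velocity so both lists refer to the same frames.
    return [(vx, vy, ax, ay)
            for (vx, vy), (ax, ay) in zip(velocity[1:], acceleration)]


# Made-up fingertip path in normalized image coordinates.
path = [(0.10, 0.20), (0.12, 0.25), (0.16, 0.33), (0.22, 0.44)]
# The same gesture drawn elsewhere in the frame.
shifted = [(x + 0.5, y - 0.1) for x, y in path]

features = motion_features(path)
shifted_features = motion_features(shifted)
```

Up to floating-point rounding, `features` and `shifted_features` are identical, because a constant offset cancels out in the first difference. A model fed these features therefore sees the same input wherever the hand is in the frame.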

## Going Forward

Overall, the pipeline still struggles and needs improvement. Ideas include:
- Building a bigger dataset, with more diverse user data.
- Processing and smoothing the finger signal, to be less dependent on camera quality and to improve model generalization.
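
As one possible way to smooth the finger signal, the sketch below applies exponential smoothing to a sequence of 2D points. This is only an illustration: the project does not say which filter it would use, and the function name `exponential_smoothing` and the `alpha` parameter are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def exponential_smoothing(points: List[Point], alpha: float = 0.5) -> List[Point]:
    """Smooth a 2D trajectory with an exponential moving average.

    alpha close to 1 follows the raw signal closely; alpha close to 0
    smooths more strongly but lags behind the finger.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[Point] = []
    for x, y in points:
        if not smoothed:
            # The first point has no history, so it is kept unchanged.
            smoothed.append((x, y))
        else:
            px, py = smoothed[-1]
            smoothed.append((alpha * x + (1 - alpha) * px,
                             alpha * y + (1 - alpha) * py))
    return smoothed


# Made-up jittery signal alternating between two positions.
noisy = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.0), (1.0, 1.0)]
smooth = exponential_smoothing(noisy, alpha=0.5)
```

The smoothed trajectory oscillates less than the raw one, at the cost of some lag. Since the model consumes speed and acceleration, which amplify jitter, a filter like this would presumably be applied to the positions before differencing.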