# ellzaf_ml **Repository Path**: linhaigoo/ellzaf_ml ## Basic Information - **Project Name**: ellzaf_ml - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-07-25 - **Last Updated**: 2024-07-25 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README
PatchSwap applied to different person:
First (Image A) and third (Image B) picture from the left are the original image.
The second picture from the left is when the eyes, nose, and mouth from Image B is applied to Image A.
The fourth picture from the left is when the eyes, nose, and mouth from Image A is applied to Image B.
```python
from ellzaf_ml.augments import PatchSwap
swapper = PatchSwap()
image_a, image_b = swapper.swap_features('path/to/face_imageA.jpg', 'path/to/face_imageB.jpg')
# you can specify facial features that you want to swap, default value is ["right_eye", "left_eye", "nose", "lips"]
image_c, image_d = swapper.swap_features('path/to/face_imageC.jpg',
'path/to/face_imageD.jpg',
features_to_swap=["left_eye", "nose",])
# optional to see the images
if image_a is not None and image_b is not None:
swapper.show_image(image_a, 'Image A with features from B', image_b, 'Image B with features from A')
# go through images in folder
input_dir = 'path/to/real_face_folder'
output_dir = 'path/to/fake_face_folder'
# Call the class method with the input and output directories
swapper.swap_features_in_directory(input_dir, output_dir)
```
Key differences:
1. Instead of using dlib, I use MediaPipe for face landmark
2. I only swap eyes instead of eyes and eye brows
If you want to follow the paper method, use input folder consisting of the same person for `swap_features_in_directory`.
## ⚡ Models
### 🌟GhostFaceNets
PyTorch version of [GhostFaceNetsV1](https://github.com/HamadYA/GhostFaceNets/tree/main).
GhostNet code from [Huawei Noah's Ark Lab](https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master).
```python
import torch
from ellzaf_ml.models import GhostFaceNetsV1
IMAGE_SIZE = 112
#return embedding
model = GhostFaceNetsV1(image_size=IMAGE_SIZE, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
#return classification
model = GhostFaceNetsV1(image_size=IMAGE_SIZE, num_classes=3, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
```
PyTorch version of [GhostFaceNetsV2](https://github.com/HamadYA/GhostFaceNets/tree/main).
GhostNetV2 code from [Huawei Noah's Ark Lab](https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master).
```python
import torch
from ellzaf_ml.models import GhostFaceNetsV2
IMAGE_SIZE = 112
#return embedding
model = GhostFaceNetsV2(image_size=IMAGE_SIZE, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
#return classification
model = GhostFaceNetsV2(image_size=IMAGE_SIZE, num_classes=3, width=1, dropout=0.)
img = torch.randn(3, 3, IMAGE_SIZE, IMAGE_SIZE)
model(img)
```
In order to not use GAP like mentioned in the paper, you need to specify the image size.
You also need to have image_size>=33.
#### TODO
- [x] Replicate model.
- [ ] Create training code.
### 🌟SpectFormer
Implementation of [SpectFormer](https://arxiv.org/abs/2304.06446) vanilla architecture.
Code is modified version of ViT from [Vit-PyTorch](https://github.com/lucidrains/vit-pytorch/tree/main).
```python
import torch
from ellzaf_ml.models import SpectFormer
model = SpectFormer(
image_size = 224,
patch_size = 16,
num_classes = 1000,
dim = 512,
depth = 12,
heads = 16,
mlp_dim = 1024,
spect_alpha = 4, # amount of spectral block (depth - spect_alpha = attention block)
)
img = torch.randn(1, 3, 224, 224)
preds = model(img) # prediction -> (1,1000)
```
SpectFormer utilizes both spectral block and attention block. The amount of spectral block can be speciified using `spect_alpha` and the remaining block from `depth` will be attention blocks.
depth - spect_alpha = attention block
12 - 4 = 8
From the code and calculation example above, when `spect_alpha` are 4 with the `depth` of 12. The resulting attention block will be 8. If `spect_alpha` == `depth`, it will be GFNet while if spect_alpa = 0, it will be ViT.
Implementation of [LBP and CNN Feature Fusion for face anti-spoofing](https://link.springer.com/article/10.1007/s10044-023-01132-4)
This model is primarily used for face liveness.
```python
import torch
from ellzaf_ml.models import LBPCNNFeatureFusion
model = LBPCNNFeatureFusion(num_classes=2)
img = torch.rand(1, 3, 224, 224)
preds = model(img) # prediction -> (1,2)
```
Implementation of [A novel Deep CNN based LDnet model with the combination of 2D and 3D CNN for Face Liveness Detection.](https://ieeexplore.ieee.org/document/9914362)
This model primary use is for face liveness.
```python
import torch
from ellzaf_ml.models import LDnet
model = LDnet(image_size=64)
img = torch.rand(1, 3, 64, 64)
preds = model(img) # prediction -> (1,2)
```
### 🌟SimMIM
Modified SimMIM code from the original [repo](https://github.com/microsoft/SimMIM/tree/main) but the archtecture is still the same.
```python
from ellzaf_ml.models import ViTSpectralForSimMIM, SimMIM
encoder = ViTSpectralForSimMIM(
img_size=224,
patch_size=16,
in_chans=3,
num_classes=0,
embed_dim=384,
depth=12,
num_heads=6,
mlp_ratio=4,
qkv_bias=True,
drop_rate=0.,
drop_path_rate=0.1,
norm_layer=partial(nn.LayerNorm, eps=1e-6),
init_values=0.1,
use_abs_pos_emb=False,
use_rel_pos_bias=False,
use_shared_rel_pos_bias=True,
use_mean_pooling=False)
encoder_stride = 16
simmim = SimMIM(encoder=encoder, encoder_stride=encoder_stride)
```
### 🌟MixMobileNet
Implementation of [MixMobileNet](https://github.com/microsoft/SimMIM/tree/main). There are three variants: XXS, XS and S.
```python
import torch
from ellzaf_ml.models import MixMobileNet
img = torch.randn(1, 3, 224, 224)
model = MixMobileNet(variant="S", img_size=224, num_classes=2)
model(img)
```
Due to the paper does not implicitly mention their approach when image height and width is non-power-of-two.
I decided to pad the output from downsampler with 0 at right and bottom so that 7x7 will become 8x8.
## 🛠️ Tools
### 🔨 EarlyStopping
Ease early stopping process during model training. This code is from [here](https://github.com/Bjarten/early-stopping-pytorch) and I modified some of it based on the issues from that GitHub.
```python
import torch
from torch import nn, optim
from ellzaf_ml.tools import EarlyStopping
# Assuming 'model' has been defined and modified for a 2-class output
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# initialize the early_stopping object
early_stopping = EarlyStopping(patience=10, verbose=True)
num_epochs = 100 # Define the number of epochs you want to train for
for epoch in range(num_epochs):
model.train()
running_loss = 0.0
correct_predictions = 0
for inputs, labels in train_loader:
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
_, predictions = torch.max(outputs, 1)
correct_predictions += (predictions == labels).sum().item()
epoch_loss = running_loss / len(train_loader.dataset)
epoch_acc = correct_predictions / len(train_loader.dataset)
model.eval()
val_running_loss = 0.0
val_correct_predictions = 0
with torch.no_grad():
for inputs, labels in val_loader:
inputs, labels = inputs.to(device), labels.to(device)
outputs = model(inputs)
loss = criterion(outputs, labels)
val_running_loss += loss.item()
_, predictions = torch.max(outputs, 1)
val_correct_predictions += (predictions == labels).sum().item()
val_epoch_loss = val_running_loss / len(val_loader.dataset)
val_epoch_acc = val_correct_predictions / len(val_loader.dataset)
# early_stopping needs the validation loss to check if it has decresed,
# and if it has, it will make a checkpoint of the current model
early_stopping(valid_loss, model)
if early_stopping.early_stop:
print("Early stopping")
break
print(f'Epoch {epoch+1}/{num_epochs}')
print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
print(f'Val Loss: {val_epoch_loss:.4f} Acc: {val_epoch_acc:.4f})
```