# ALF
**Repository Path**: mirrors_CESNET/ALF
## Basic Information
- **Project Name**: ALF
- **Description**: Active Learning Framework
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-06-17
- **Last Updated**: 2026-05-16
## README
# ALF - Active Learning Framework
## Outline
* [About project](#about-project)
* [Architecture](#architecture)
* [Use](#use)
* [Quick start](#quick-start)
* [How to create your own application](#how-to-create-your-own-application)
* [GUI Demo](#gui)
* [Further Information](#further-information)
## About project
Recent network traffic classification methods benefit from machine learning (ML) technology. However, using ML brings many challenges, such as the lack of high-quality annotated datasets, data drift and other effects that cause datasets and ML models to age, and high volumes of network traffic. We present a novel Active Learning Framework (ALF) to address these challenges. ALF provides ready-made software components that can be used to deploy an active learning loop and maintain an ALF instance that continuously and automatically evolves a dataset and an ML model. The resulting solution is deployable for **IP flow-based** analysis of high-speed (100 Gb/s) networks, and also **supports research experiments** on different strategies and methods for annotation, evaluation, dataset optimization, etc.
---
## Architecture
ALF implements an active learning (AL) loop, and its design can be visualized as an activity diagram. In short, ALF can be defined as AL core + input interface + preprocessing and postprocessing steps + evaluation.
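To make the decomposition concrete, here is a minimal, generic sketch of such a loop in plain Python. It is not ALF's implementation: the function `active_learning_loop`, the toy one-dimensional "model" (a threshold between two class means), and all helper names are hypothetical and exist only to show how the input, preprocessing, query, annotation, postprocessing, and evaluation steps fit together.

```python
from typing import Callable, Iterable, List, Tuple

Sample = float
Labeled = Tuple[Sample, int]


def active_learning_loop(
    batches: Iterable[List[Sample]],
    dataset: List[Labeled],
    preprocess: Callable[[List[Sample]], List[Sample]],
    fit: Callable[[List[Labeled]], float],
    select: Callable[[float, List[Sample]], List[Sample]],
    annotate: Callable[[Sample], int],
    postprocess: Callable[[List[Labeled]], List[Labeled]],
    evaluate: Callable[[float, List[Labeled]], float],
) -> Tuple[List[Labeled], List[float]]:
    """Run one active learning iteration per incoming batch."""
    history = []
    model = fit(dataset)
    for raw_batch in batches:                      # input interface
        batch = preprocess(raw_batch)              # preprocessing
        queried = select(model, batch)             # AL core: query strategy
        dataset = dataset + [(x, annotate(x)) for x in queried]  # annotation
        dataset = postprocess(dataset)             # postprocessing
        model = fit(dataset)                       # retrain on the evolved dataset
        history.append(evaluate(model, dataset))   # evaluation
    return dataset, history


# --- hypothetical toy components, for illustration only ---
def fit_threshold(data: List[Labeled]) -> float:
    """Toy model: the midpoint between the means of class 0 and class 1."""
    means = []
    for label in (0, 1):
        values = [x for x, y in data if y == label]
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2


def select_near_boundary(threshold: float, batch: List[Sample]) -> List[Sample]:
    """Query samples that lie close to the decision threshold."""
    return [x for x in batch if abs(x - threshold) < 1.0]


def accuracy(threshold: float, data: List[Labeled]) -> float:
    """Accuracy on the current dataset (training data, to keep the sketch short)."""
    hits = sum(int(x > threshold) == y for x, y in data)
    return hits / len(data)


def run_toy_example() -> Tuple[List[Labeled], List[float]]:
    return active_learning_loop(
        batches=[[4.0, 5.5, 0.5], [4.9, 8.0]],
        dataset=[(1.0, 0), (9.0, 1)],
        preprocess=lambda batch: [x for x in batch if x >= 0],
        fit=fit_threshold,
        select=select_near_boundary,
        annotate=lambda x: int(x > 5.0),           # stand-in for an annotator
        postprocess=lambda data: data[-100:],      # cap the dataset size
        evaluate=accuracy,
    )
```

Calling `run_toy_example()` returns the evolved dataset and one evaluation score per processed batch. In ALF the corresponding roles are played by the input manager, preprocessor, query strategy, annotator, postprocessor, and evaluator objects.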
The class diagram below shows how ALF is implemented. Note that the diagram is simplified by omitting inheritance.
## Use
Install all dependencies:
```
make init
```
There are 4 main dependencies:
* Python 3.10
* essential: `requirements.txt`
* developers: `requirements-dev.txt`
* NEMEA: `requirements-nemea.txt`
NEMEA dependencies are necessary for ALF to cooperate with the [NEMEA framework](https://github.com/CESNET/Nemea). For now, the tests and the quick start assume that NEMEA is available. We plan to remove this dependency in the future.
## Quick start
* Tests, linting, documentation:
```
make test # unit tests
make lint # linter
firefox docs/_build/html/index.html # documentation
```
* Online stream demo:
Terminal 1:
```
mkdir workdir
python nemea_module_doh.py --i u:alf_socket --id test_random --workdir ./workdir --model single --query_strategy random --blacklist conf/blacklist.txt --query_nmax 1 --max_db_size 10000 --dpath conf/doh_D0.csv
```
Terminal 2:
```
/usr/bin/nemea/traffic_repeater -i "f:example.trapcap,u:alf_socket"
```
Parameter `i` defines the NEMEA interface. See the [`traffic_repeater` module](https://github.com/CESNET/Nemea-Modules/tree/c087d9a63f8feb0023e9f6400400450d2474f725/traffic_repeater) for more.
Note: While `nemea_module_doh.py` waits for data to arrive on the socket, it does not respond to the standard `SIGINT` (CTRL-C). You need to either kill the process (`SIGKILL`, `kill -9 $PID`), or send `SIGINT` and then send another stream (as in the example); once the loop continues, the first thing the program does is terminate (with a `KeyboardInterrupt` in Python). This is a consequence of how Python handles the infinite waiting loop in the generator. We are aware of a solution, but since this behavior does no harm, we decided not to address it for now.
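One general pattern for keeping such a receive loop responsive is to wait with a timeout and check a stop flag between attempts. The sketch below illustrates the idea with a standard-library `queue.Queue` standing in for the socket. It is an assumption that the underlying NEMEA receive call can be given a timeout, and this is not necessarily the solution the authors have in mind. The names `stream`, `source`, and `stop` are hypothetical.

```python
import queue
import threading
from typing import Iterator


def stream(source: "queue.Queue", stop: threading.Event,
           poll_interval: float = 0.1) -> Iterator[object]:
    """Yield items from `source` until `stop` is set.

    The blocking wait is bounded by `poll_interval`, so the loop regains
    control regularly and can notice a stop request even when no data arrives.
    """
    while not stop.is_set():
        try:
            item = source.get(timeout=poll_interval)
        except queue.Empty:
            continue  # nothing arrived in this interval; re-check the stop flag
        yield item
```

A `SIGINT` handler registered with `signal.signal` could set the `stop` event, letting the generator finish cleanly instead of waiting for the next stream.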
## How to create your own application
For simplicity we do not use parameters and all constants are hardcoded.
```python
# logging
import logging
import sys
# use Random Forest as classifier
from sklearn.ensemble import RandomForestClassifier
# import parts of ALF
import alf.anotator
import alf.context_manager
import alf.d_manager
import alf.engine
import alf.evaluator
import alf.input_manager
import alf.ml_model
import alf.postprocess
import alf.preprocess
import alf.query_strategy
```
The framework makes heavy use of the `logging` module to log messages. Configure it:
```python
logging.basicConfig(
    stream=sys.stdout,
    format='[%(asctime)s]: %(message)s',
    level=logging.DEBUG
)
```
Now let us set up the constants and parameters. Usually these are provided by the user, a configuration file, etc.:
```python
# list of features from flows, type: list[str]
DATASET_COLUMNS = ["f1", "f2"]  # ... followed by the remaining feature names
# interface IFC_SPEC defined by NEMEA
IFC = "u:alf_socket"
# id, workdir; id should be unique
EXP_ID = "showcase"
WORKDIR = "/tmp/alf"
# annotator specific:
BLACKLIST = "conf/blacklist.txt"
# D0 is init train dataset
D0 = "conf/doh_train_db_small.csv"
# maximum size of the D_i database
MAX_SIZE = 5000
# query strategy specific:
N = 10
THRESHOLD = 0.1
```
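If you prefer to take these values from the command line instead of hardcoding them, a small `argparse` parser is enough. The sketch below reuses the flag names from the quick-start command (`--i`, `--id`, `--workdir`, `--blacklist`, `--dpath`, `--max_db_size`, `--query_nmax`). The types, defaults, `dest` names, and the `--threshold` flag are assumptions made for illustration, and this is not the parser used by `nemea_module_doh.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; defaults mirror the constants above."""
    parser = argparse.ArgumentParser(description="ALF showcase (illustrative)")
    parser.add_argument("--i", dest="ifc", default="u:alf_socket",
                        help="NEMEA interface specifier (IFC_SPEC)")
    parser.add_argument("--id", dest="exp_id", required=True,
                        help="unique experiment id")
    parser.add_argument("--workdir", default="/tmp/alf")
    parser.add_argument("--blacklist", default="conf/blacklist.txt")
    parser.add_argument("--dpath", default="conf/doh_train_db_small.csv",
                        help="path to the initial training dataset D0")
    parser.add_argument("--max_db_size", type=int, default=5000)
    parser.add_argument("--query_nmax", type=int, default=10)
    # Hypothetical flag: the quick-start command does not show a threshold option.
    parser.add_argument("--threshold", type=float, default=0.1)
    return parser
```

For example, `build_parser().parse_args(["--id", "showcase"])` yields a namespace whose attributes (`exp_id`, `workdir`, `max_db_size`, and so on) can replace the constants above.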
Now we create contexts:
```python
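# Assumption: ContextProvider is exposed by alf.context_manager and
# DbProvider by alf.d_manager (both modules are imported above).
from alf.context_manager import ContextProvider
from alf.d_manager import DbProvider

ContextProvider.create_context("file")
ContextProvider.get_context().set_features(DATASET_COLUMNS)
ContextProvider.get_context().set_experiment_id(EXP_ID)
ContextProvider.get_context().set_working_dir(WORKDIR)
DbProvider.create_context(context_type="file", d_0_path=D0)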
```
Next, define the ALF parts:
```python
anotator = alf.anotator.AnotatorDoH(blacklist_path=BLACKLIST)
model = alf.ml_model.SupervisedMLModel(RandomForestClassifier())
query_strategy = alf.query_strategy.UncertanityUnrankedBatch(
    anotator_obj=anotator, max_samples=N,
    score_threshold=THRESHOLD, dry_run=True)
input_manager = alf.input_manager.TrapcapSocketInputManager(
    definition=IFC)
postprocessor = alf.postprocess.PostprocessorUndersample(MAX_SIZE)
```
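To give an intuition for what `max_samples` and `score_threshold` might control, here is a generic sketch of threshold-based uncertainty sampling. It is not the implementation of `UncertanityUnrankedBatch`: the least-confidence score, the arrival-order selection, and the function names are assumptions. With a scikit-learn classifier, the per-sample class probabilities would typically come from `predict_proba`.

```python
from typing import List, Sequence


def uncertainty(probabilities: Sequence[float]) -> float:
    """Least-confidence score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probabilities)


def select_uncertain(batch_probabilities: Sequence[Sequence[float]],
                     max_samples: int,
                     score_threshold: float) -> List[int]:
    """Return indices of samples whose uncertainty reaches the threshold.

    Samples are taken in arrival order (no ranking) and selection stops
    once `max_samples` indices have been collected.
    """
    selected: List[int] = []
    if max_samples <= 0:
        return selected
    for index, probs in enumerate(batch_probabilities):
        if uncertainty(probs) >= score_threshold:
            selected.append(index)
            if len(selected) == max_samples:
                break
    return selected
```

The selected samples would then be passed to the annotator, and the newly labeled samples added to the dataset before retraining.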
Then pass the parts to `Engine`:
```python
engine = alf.engine.Engine(
    preprocessor=alf.preprocess.PreprocessorDoH(),
    postprocessor=postprocessor,
    ml_model_obj=model,
    query_strategy_obj=query_strategy,
    evaluator_obj=alf.evaluator.EvaluatorTestAnotatedAndAllPredicted(),
    input_manager_obj=input_manager
)
```
The last step is to run the engine:
```python
engine.run()
```
## GUI
ALF comes with a simple GUI demo built with `streamlit`.
Run it with `streamlit run alf_gui.py`.
## Further Information
* @jaroslavpesek here on GitHub
* pesek (at) cesnet.cz or pesekja8 (at) fit.cvut.cz