This repository contains a Demo application built using Deep Java Library(DJL). The application detects malicious urls based on a trained Character Level CNN model.
It uses a third-party dataset that is a amalgamation of the following malicious URL databases.
Benign
Malicious
The dataset can be found in the Malicious-URL-Detection-Deep-Learning repository.
The model has the following architecture.
(0): Conv1D(None -> 256, kernel_size=(7,), stride=(1,))
(1): MaxPool1D(size=(3,), stride=(3,), padding=(0,), ceil_mode=False)
(2): Conv1D(None -> 256, kernel_size=(7,), stride=(1,))
(3): MaxPool1D(size=(3,), stride=(3,), padding=(0,), ceil_mode=False)
(4): Conv1D(None -> 256, kernel_size=(3,), stride=(1,))
(5): Conv1D(None -> 256, kernel_size=(3,), stride=(1,))
(6): Conv1D(None -> 256, kernel_size=(3,), stride=(1,))
(7): Conv1D(None -> 256, kernel_size=(3,), stride=(1,))
(8): MaxPool1D(size=(3,), stride=(3,), padding=(0,), ceil_mode=False)
(9): Flatten
(10): Dense(None -> 1024, Activation(relu))
(11): Dropout(p = 0.5)
(12): Dense(None -> 1024, Activation(relu))
(13): Dropout(p = 0.5)
(14): Dense(None -> 2, linear)
There are 2 output classification labels, benign or malicious.
The general model structure is based on the Character level CNNs paper's model description.
This demo contains a simple proxy server that can work with browsers to detect malicious URLs. This demo acts as a filter for malicious URLs. It works with both HTTP and HTTPs requests.
Run the proxy server from your terminal.
$ ./gradlew run
This command starts the server, listening at port 8080.
> Task :run
[main] INFO com.example.FilterProxy - Waiting for request(s) on port 8080
In your browser settings (firefox, for example), set the proxy settings to 127.0.0.1:8080.
Now you can try to navigate to a mispelled amazon.com URL in the browser navigator and see the proxy at work.
The proxy server prints the following on the terminal screen.
> Task :run
[main] INFO com.example.FilterProxy - Waiting for client on port 8080..
[Thread-1] INFO com.example.RequestHandler - Malicious URL detected and blocked http://amazom.com/
Typing the correct URL should show the correct website.
To train the model, a GPU instance is recommended. Training with a CPU is very slow compared to training with a GPU.
Train the model.
$./gradle train
> Task :train
[main] INFO com.example.ModelTrainer - Loading Dataset
[main] INFO ai.djl.training.listener.LoggingTrainingListener - Running Malicious URL training on: cpu().
[main] INFO ai.djl.training.listener.LoggingTrainingListener - Load MXNet Engine Version 1.6.0 in 0.153 ms.
[main] INFO com.example.ModelTrainer - Begin Training
Training: 100% |████████████████████████████████████████| accuracy: 0.71 loss: 0.54 speed: 323.74 urls/sec
Validating: 100% |████████████████████████████████████████|
[main] INFO com.example.ModelTrainer - Epoch 0 finished.
[main] INFO com.example.ModelTrainer - train accuracy: 0.7131701, train loss: 0.5386256
[main] INFO com.example.ModelTrainer - validate accuracy: 0.89068836, validate loss: 0.30017522
Training: 100% |████████████████████████████████████████| accuracy: 0.92 loss: 0.21 speed: 336.71 urls/sec
Validating: 100% |████████████████████████████████████████|
[main] INFO com.example.ModelTrainer - Epoch 1 finished.
[main] INFO com.example.ModelTrainer - train accuracy: 0.91614103, train loss: 0.21319388
[main] INFO com.example.ModelTrainer - validate accuracy: 0.81641084, validate loss: 0.43587697
Training: 100% |████████████████████████████████████████| accuracy: 0.95 loss: 0.14 speed: 340.42 urls/sec
Validating: 100% |████████████████████████████████████████|
[main] INFO com.example.ModelTrainer - Epoch 2 finished.
[main] INFO com.example.ModelTrainer - train accuracy: 0.9457474, train loss: 0.14361516
[main] INFO com.example.ModelTrainer - validate accuracy: 0.89394796, validate loss: 0.26676258
Training: 100% |████████████████████████████████████████| accuracy: 0.96 loss: 0.11 speed: 339.22 urls/sec
Validating: 100% |████████████████████████████████████████|
[main] INFO com.example.ModelTrainer - Epoch 3 finished.
[main] INFO com.example.ModelTrainer - train accuracy: 0.9595303, train loss: 0.10922413
[main] INFO com.example.ModelTrainer - validate accuracy: 0.9005698, validate loss: 0.26299196
Training: 100% |████████████████████████████████████████| accuracy: 0.97 loss: 0.09 speed: 338.12 urls/sec
Validating: 100% |████████████████████████████████████████|
[main] INFO com.example.ModelTrainer - Epoch 4 finished.
[main] INFO com.example.ModelTrainer - train accuracy: 0.9665886, train loss: 0.08993893
[main] INFO com.example.ModelTrainer - validate accuracy: 0.85562855, validate loss: 0.38608122
Training: 100% |████████████████████████████████████████| accuracy: 0.97 loss: 0.07 speed: 292.06 urls/sec
Validating: 100% |████████████████████████████████████████|
[main] INFO com.example.ModelTrainer - Epoch 5 finished.
[main] INFO com.example.ModelTrainer - train accuracy: 0.97365975, train loss: 0.07207036
[main] INFO com.example.ModelTrainer - validate accuracy: 0.92241156, validate loss: 0.22621335
Training: 100% |████████████████████████████████████████| accuracy: 0.98 loss: 0.06 speed: 319.64 urls/sec
Validating: 100% |████████████████████████████████████████|
[main] INFO com.example.ModelTrainer - Epoch 6 finished.
[main] INFO com.example.ModelTrainer - train accuracy: 0.9785428, train loss: 0.05851153
[main] INFO com.example.ModelTrainer - validate accuracy: 0.9198193, validate loss: 0.2653344
To run inference again, copy the .params
file to the model
folder.
This repository contains documentation on training the model using DJL, writing Datasets, and using Translators.
This project is licensed under the Apache-2.0 License.
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。