1 Star 0 Fork 0

烟雨江南/OneClassSVM

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
该仓库未声明开源许可证文件(LICENSE),使用请关注具体项目描述及其代码上游依赖。
克隆/下载
oneclass_svm_final.py 2.88 KB
一键复制 编辑 原始数据 按行查看 历史
mridul22 提交于 2017-11-19 19:35 . First commit
# coding: utf-8
import numpy as np
import pandas as pd
from sklearn import utils
import matplotlib
data = pd.read_csv('kddcup_data_10_percent.csv', low_memory=False)
data = data[data['service'] == "http"]
data = data[data["logged_in"] == 1]
relevant_features = [
"duration",
"src_bytes",
"dst_bytes",
"label"
]
data = data[relevant_features ]
data["duration"] = np.log((data["duration"] + 0.1).astype(float))
data["src_bytes"] = np.log((data["src_bytes"] + 0.1).astype(float))
data["dst_bytes"] = np.log((data["dst_bytes"] + 0.1).astype(float))
# we're using a one-class SVM, so we need.. a single class. the dataset 'label'
# column contains multiple different categories of attacks, so to make use of
# this data in a one-class system we need to convert the attacks into
# class 1 (normal) and class -1 (attack)
data.loc[data['label'] == "normal.", "attack"] = 1
data.loc[data['label'] != "normal.", "attack"] = -1
# grab out the attack value as the target for training and testing. since we're
# only selecting a single column from the `data` dataframe, we'll just get a
# series, not a new dataframe
target = data['attack']
# find the proportion of outliers we expect (aka where `attack == -1`). because
# target is a series, we just compare against itself rather than a column.
outliers = target[target == -1]
print("outliers.shape", outliers.shape)
print("outlier fraction", outliers.shape[0]/target.shape[0])
# drop label columns from the dataframe. we're doing this so we can do
# unsupervised training with unlabelled data. we've already copied the label
# out into the target series so we can compare against it later.
data.drop(["label", "attack"], axis=1, inplace=True)
# check the shape for sanity checking.
data.shape
from sklearn.model_selection import train_test_split
train_data, test_data, train_target, test_target = train_test_split(data, target, train_size = 0.8)
train_data.shape
from sklearn import svm
# set nu (which should be the proportion of outliers in our dataset)
nu = outliers.shape[0] / target.shape[0]
print("nu", nu)
model = svm.OneClassSVM(nu=nu, kernel='rbf', gamma=0.00005)
model.fit(train_data)
from sklearn import metrics
preds = model.predict(train_data)
targs = train_target
print("accuracy: ", metrics.accuracy_score(targs, preds))
print("precision: ", metrics.precision_score(targs, preds))
print("recall: ", metrics.recall_score(targs, preds))
print("f1: ", metrics.f1_score(targs, preds))
print("area under curve (auc): ", metrics.roc_auc_score(targs, preds))
preds = model.predict(test_data)
targs = test_target
print("accuracy: ", metrics.accuracy_score(targs, preds))
print("precision: ", metrics.precision_score(targs, preds))
print("recall: ", metrics.recall_score(targs, preds))
print("f1: ", metrics.f1_score(targs, preds))
print("area under curve (auc): ", metrics.roc_auc_score(targs, preds))
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/yanyu_jiangnan/OneClassSVM.git
git@gitee.com:yanyu_jiangnan/OneClassSVM.git
yanyu_jiangnan
OneClassSVM
OneClassSVM
master

搜索帮助

D67c1975 1850385 1daf7b77 1850385