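dcnn-nlp is a tool that uses convolutional neural networks for natural language processing and text classification. It implements and extends the ACL 2014 paper "A Convolutional Neural Network for Modelling Sentences".
It has the following features: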
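# Stanford Sentiment Treebank Experiment
# Run `python prepare.py` in the data/stanford directory first.
# LineSentence and DCNNDeep must also be imported from the dcnn-nlp code (module path not shown here).
import numpy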
total_data_file = 'data/stanford/total.data'
total_sentences = LineSentence(total_data_file, repeat=5)
train_data_file = 'data/stanford/train2.data'
train_label_file = 'data/stanford/train2.label'
train_sentences = LineSentence(train_data_file)
train_labels = numpy.fromfile(train_label_file, sep='\n', dtype=numpy.int32)
dev_data_file = 'data/stanford/dev2.data'
dev_label_file = 'data/stanford/dev2.label'
dev_sentences = LineSentence(dev_data_file)
dev_labels = numpy.fromfile(dev_label_file, sep='\n', dtype=numpy.int32)
test_data_file = 'data/stanford/test2.data'
test_label_file = 'data/stanford/test2.label'
test_sentences = LineSentence(test_data_file)
test_labels = numpy.fromfile(test_label_file, sep='\n', dtype=numpy.int32)
# n_filters=[6,14] in the paper
# n_filters=[4,6] in LeNet
# But you can go deeper
model = DCNNDeep(sentences=train_sentences, output_layer_size=2, wordvec_dim=48,
                 alpha=0.012, entropy_descent_m=0.995, dropout_rate_in_hiddens=0.5,
                 dropout_rate_in_input=0.2, min_count=2, full_con_layer_size=5,
                 filter_width=[7,5,3], k_top=4, n_filters=[6,14,6], alpha_m=0.999995,
                 min_alpha=0.00001, pre_train_word_vec=True, pre_train_sentences=total_sentences)
model.train(train_sentences=train_sentences, train_labels=train_labels, patience=5,
            validate_freq=2000, max_entropy_allowed=0.38, validate_sentences=dev_sentences,
            validate_labels=dev_labels, chunksize=5)
print('test accuracy: %f' % model.accuracy(test_sentences, test_labels))
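
The `k_top=4` argument in the example presumably corresponds to the k-max pooling described in the referenced paper, where each pooling layer keeps the k largest activations of a sequence in their original order, and intermediate layers use a dynamic k of max(k_top, ceil((L - l) / L * s)) for layer l of L and sentence length s. The sketch below illustrates that idea in plain NumPy. The function names `k_max_pooling` and `dynamic_k` are hypothetical and are not part of dcnn-nlp; this README does not confirm that the tool implements pooling in exactly this way.

```python
import numpy as np


def k_max_pooling(values, k):
    """Keep the k largest entries of a 1-D sequence, preserving their order.

    Hypothetical helper for illustration; not dcnn-nlp's API.
    """
    values = np.asarray(values)
    if k >= len(values):
        return values.copy()
    # argpartition returns the indices of the k largest values in no
    # particular order; sorting the indices restores sequence order.
    top_idx = np.argpartition(values, -k)[-k:]
    return values[np.sort(top_idx)]


def dynamic_k(layer, total_layers, sentence_len, k_top):
    """Pooling size for conv layer `layer` (1-based) out of `total_layers`.

    Follows the formula from the paper: max(k_top, ceil((L - l) / L * s)).
    Integer ceiling division avoids floating-point rounding.
    """
    numerator = (total_layers - layer) * sentence_len
    ceil_part = -(-numerator // total_layers)
    return max(k_top, ceil_part)


pooled = k_max_pooling([3, 1, 5, 2, 4], 3)
sizes = [dynamic_k(l, 3, 18, 4) for l in (1, 2, 3)]
```

With three convolutional layers, a sentence of 18 words, and `k_top=4`, the pooling sizes shrink layer by layer until the last layer falls back to `k_top`, so the fully connected layer always receives a fixed-size input regardless of sentence length.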