yasiping / emnlp2017-bilstm-cnn-crf

RunModel.py 1.46 KB
nreimers committed on 2018-04-20 12:41 . Update readme and docker
#!/usr/bin/python
# This script loads a pretrained model and a raw .txt file. It then performs sentence splitting and tokenization and passes
# the input sentences to the model for tagging. It prints the tokens and the tags in a CoNLL format to stdout.
# Usage: python RunModel.py modelPath inputPath
# For pretrained models see docs/Pretrained_Models.md
from __future__ import print_function
import nltk
from util.preprocessing import addCharInformation, createMatrices, addCasingInformation
from neuralnets.BiLSTM import BiLSTM
import sys
if len(sys.argv) < 3:
    print("Usage: python RunModel.py modelPath inputPath")
    exit()
modelPath = sys.argv[1]
inputPath = sys.argv[2]
# :: Read input ::
with open(inputPath, 'r') as f:
    text = f.read()
# :: Load the model ::
lstmModel = BiLSTM.loadModel(modelPath)
# :: Prepare the input ::
sentences = [{'tokens': nltk.word_tokenize(sent)} for sent in nltk.sent_tokenize(text)]
addCharInformation(sentences)
addCasingInformation(sentences)
dataMatrix = createMatrices(sentences, lstmModel.mappings, True)
# :: Tag the input ::
tags = lstmModel.tagSentences(dataMatrix)
# :: Output to stdout ::
for sentenceIdx in range(len(sentences)):
    tokens = sentences[sentenceIdx]['tokens']
    for tokenIdx in range(len(tokens)):
        tokenTags = []
        for modelName in sorted(tags.keys()):
            tokenTags.append(tags[modelName][sentenceIdx][tokenIdx])
        print("%s\t%s" % (tokens[tokenIdx], "\t".join(tokenTags)))
    print("")
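
The output loop above can be tried without a trained model or NLTK. The following is a minimal sketch of the same formatting logic, assuming the tags have the nested shape used in the script (model name, then sentence index, then token index). The helper name `format_conll`, the model names `ner` and `pos`, and the example tokens and tags are hypothetical stand-ins, not output of a real model.

```python
def format_conll(sentences, tags):
    """Return tokens and per-model tags as tab-separated, CoNLL-style text."""
    model_names = sorted(tags.keys())  # fixed column order, as in the script
    lines = []
    for sentence_idx, sentence in enumerate(sentences):
        for token_idx, token in enumerate(sentence['tokens']):
            token_tags = [tags[name][sentence_idx][token_idx] for name in model_names]
            lines.append("%s\t%s" % (token, "\t".join(token_tags)))
        lines.append("")  # a blank line terminates each sentence
    return "\n".join(lines)


# Hypothetical stand-in data; a real run would obtain tags from tagSentences().
example_sentences = [{'tokens': ['Berlin', 'is', 'big', '.']}, {'tokens': ['Hello']}]
example_tags = {
    'ner': [['B-LOC', 'O', 'O', 'O'], ['O']],
    'pos': [['NNP', 'VBZ', 'JJ', '.'], ['UH']],
}

conll_text = format_conll(example_sentences, example_tags)
print(conll_text)
```

Each row holds one token followed by one tag column per model, with the columns ordered by sorted model name. Returning a string instead of printing directly makes the formatting step easy to check in isolation.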