A Chinese word processing toolkit
*** 2020/2/16 *** update: train with a BERT model and export the model for deployment; see the Chinese training documentation
To download and build FoolNLTK, type:
git clone https://github.com/rockyzhengwu/FoolNLTK.git
cd FoolNLTK/train
For detailed instructions
pip install foolnltk
import fool
text = "一个傻子在北京"
print(fool.cut(text))
# ['一个', '傻子', '在', '北京']
For word segmentation from the command line, specify the -b parameter to increase the number of lines segmented in each run.
python -m fool [filename]
The format of the dictionary is as follows: one word and its weight per line. The higher the weight and the longer the word, the more likely the word is to appear in the segmentation result. Word weights should be greater than 1.
难受香菇 10
什么鬼 10
分词工具 10
北京 10
北京天安门 10
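A dictionary file in this format can also be generated from Python. The following is a minimal sketch, not part of FoolNLTK: the helper name `write_userdict` and the file name `userdict.txt` are hypothetical, and the space-separated, one-entry-per-line layout is assumed from the sample above. It rejects weights that are not greater than 1, following the rule stated earlier.

```python
import os
import tempfile


def write_userdict(entries, path):
    """Write (word, weight) pairs, one 'word weight' entry per line.

    Hypothetical helper, not part of FoolNLTK; the layout is assumed
    from the sample dictionary shown in this README.
    """
    lines = []
    for word, weight in entries:
        # A word containing whitespace would break the space-separated format.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"invalid word: {word!r}")
        if weight <= 1:
            raise ValueError(f"weight for {word!r} must be greater than 1")
        lines.append(f"{word} {weight}")
    # Validate everything first, then write the file as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path


entries = [("难受香菇", 10), ("什么鬼", 10), ("分词工具", 10)]
dict_path = write_userdict(
    entries, os.path.join(tempfile.gettempdir(), "userdict.txt")
)
```

The returned path is the value that would be passed to fool.load_userdict.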
To load the dictionary:
import fool
fool.load_userdict(path)
text = ["我在北京天安门看你难受香菇", "我在北京晒太阳你在非洲看雪"]
print(fool.cut(text))
#[['我', '在', '北京', '天安门', '看', '你', '难受', '香菇'],
# ['我', '在', '北京', '晒太阳', '你', '在', '非洲', '看', '雪']]
To delete the dictionary:
fool.delete_userdict()
import fool
text = ["一个傻子在北京"]
print(fool.pos_cut(text))
#[[('一个', 'm'), ('傻子', 'n'), ('在', 'p'), ('北京', 'ns')]]
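Each sentence in the result is a list of (word, tag) pairs, so ordinary list operations apply to it. The sketch below filters words by tag prefix. It runs on the sample output above, hard-coded so that it does not need the model; the helper `words_with_tag` is hypothetical, and the sketch makes no assumption about what each tag means.

```python
def words_with_tag(tagged_sentences, prefix):
    """Return, per sentence, the words whose tag starts with `prefix`."""
    return [
        [word for word, tag in sentence if tag.startswith(prefix)]
        for sentence in tagged_sentences
    ]


# Sample value copied from the fool.pos_cut output shown in this README.
tagged = [[("一个", "m"), ("傻子", "n"), ("在", "p"), ("北京", "ns")]]
print(words_with_tag(tagged, "n"))
# [['傻子', '北京']]
```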
import fool
text = ["一个傻子在北京","你好啊"]
words, ners = fool.analysis(text)
print(ners)
#[[(5, 8, 'location', '北京')]]
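The entity list can be post-processed like any nested Python list. The sketch below groups entity strings by type. It assumes each entry is a 4-tuple of (start, end, type, text), which is inferred from the sample output above rather than from documented behaviour; the offset convention is not stated here, so the offsets are ignored. The helper `group_entities` is hypothetical.

```python
from collections import defaultdict


def group_entities(ners):
    """Map each entity type to the entity strings found across all sentences."""
    grouped = defaultdict(list)
    for sentence_entities in ners:
        # Assumed tuple layout: (start, end, type, text).
        for _start, _end, entity_type, entity_text in sentence_entities:
            grouped[entity_type].append(entity_text)
    return dict(grouped)


# Sample value copied from the fool.analysis output shown in this README.
ners = [[(5, 8, "location", "北京")]]
print(group_entities(ners))
# {'location': ['北京']}
```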
sys.prefix, under /usr/local/