狮子的魂 / frisoC
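Friso is a high-performance Chinese word segmenter written in C and implemented with the popular mmseg algorithm. Its design and implementation are fully modular, so it can easily be embedded in other programs such as MySQL and PHP. It supports segmenting both UTF-8 and GBK encoded text. https://github.com/lionsoul2014/friso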
friso.ini 1.87 KB dongyado committed on 2017-08-19 09:12: update lex_dir path
#friso configuration file.
# do not change the name of the left key.
# @email chenxin619315@gmail.com
# @date 2012-12-20
#
#charset, only UTF8 and GBK are supported.
#set it to 0 for UTF8 or 1 for GBK
friso.charset = 0
#lexicon directory path (absolute or relative).
# the value must end with '/'
#this tells friso where to find the friso.lex.ini configuration file and all the lexicon files.
#
#if the value does not start with '/' on linux, or contains no ':' on winnt,
# friso will search for friso.lex.ini relative to friso.ini
#absolute path search:
#linux: friso.lex_dir = /c/products/friso/dict/UTF-8/
#Winnt: friso.lex_dir = D:/products/friso/dict/UTF-8/
#relative path search (All system)
friso.lex_dir = ./vendors/dict/UTF-8/
#the maximum matching length.
friso.max_len = 5
#1 to enable chinese name recognition,
# and 0 to disable it.
friso.r_name = 1
#the maximum length for the cjk words in a
# chinese and english mixed word.
friso.mix_len = 2
#the maximum length for the chinese last name adorn.
friso.lna_len = 1
#append the synonym words
friso.add_syn = 1
#clear the stopwords or not (1 to open it and 0 to close it)
#@date 2013-06-13
friso.clr_stw = 0
#keep the unrecognized words or not (1 to open it and 0 to close it)
#@date 2013-06-13
friso.keep_urec = 0
#use sphinx output style like 'admire|love|enjoy einsten'
#@date 2013-10-25
friso.spx_out = 0
#start the secondary segmentation for complex english token.
friso.en_sseg = 1
#min length of the secondary segmentation token. (better larger than 1)
friso.st_minl = 2
#punctuations kept in english tokens by default.
friso.kpuncs = @%.#&+
#the threshold value for a char not a part of a chinese name.
friso.nthreshold = 2000000
#default mode for friso.
# 1 : simple mode - simple maximum matching algorithm.
# 2 : complex mode - four rules of the mmseg algorithm.
# 3 : detect mode - only return the words that do exist in the lexicon
friso.mode = 2
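
The comments above describe two rules that are easy to get wrong when reading this file from another tool: values may contain '#' (see friso.kpuncs), and friso.lex_dir is treated as absolute only when it starts with '/' or contains ':'. The following is a minimal Python sketch of those two rules, not friso's own loader (which is written in C and may differ in details). The function names parse_friso_ini and resolve_lex_dir and the path /opt/friso/friso.ini are hypothetical, chosen only for illustration.

```python
import posixpath


def parse_friso_ini(text):
    """Collect 'key = value' pairs from friso.ini-style text.

    Only whole-line '#' comments are skipped, so a value such as
    '@%.#&+' keeps its '#' character.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings


def resolve_lex_dir(lex_dir, ini_path):
    """Apply the rule stated in the comments of friso.ini.

    A value starting with '/' (linux) or containing ':' (winnt) is used
    as-is; anything else is resolved relative to the directory that
    holds friso.ini. Forward slashes are assumed throughout.
    """
    if lex_dir.startswith("/") or ":" in lex_dir:
        return lex_dir
    base = posixpath.dirname(ini_path)
    # normpath drops the trailing '/', which friso requires, so add it back.
    return posixpath.normpath(posixpath.join(base, lex_dir)) + "/"


SAMPLE = """\
#friso configuration file.
friso.charset = 0
friso.lex_dir = ./vendors/dict/UTF-8/
friso.max_len = 5
friso.kpuncs = @%.#&+
friso.mode = 2
"""

conf = parse_friso_ini(SAMPLE)
# Hypothetical install location, used only to show the relative lookup.
print(resolve_lex_dir(conf["friso.lex_dir"], "/opt/friso/friso.ini"))
```

All values are kept as strings here; a caller would convert numeric settings such as friso.max_len or friso.mode with int() as needed.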
