8月18日(周六)成都源创会火热报名中,四位一线行业大牛与你面对面,探讨区块链技术热潮下的冷思考。
Watch Star Fork

狮子的魂 / frisoC

Sign up for free
Explore and code with more than 2 million developers,Free private repositories !:)
Sign up
Friso是使用C语言开发的一款高性能中文分词器,使用流行的mmseg算法实现。完全基于模块化设计和实现,可以很方便的植入到其他程序中,例如:MySQL,PHP等。同时支持对UTF-8/GBK编码的切分。 https://github.com/lionsoul2014/friso
Copy Edit Raw Blame History
friso.ini 1.87 KB dongyado authored 2017-08-19 09:12 . update lex_dir path
#friso configuration file.
# do not change the name of the left key.
# @email chenxin619315@gmail.com
# @date 2012-12-20
#
#charset, only UTF8 and GBK support.
#set it with UTF8(0) or GBK(1)
friso.charset = 0
#lexicon directory absolute path.
# the value must end with '/'
#this will tell friso how to find friso.lex.ini configuration file and all the lexicon files.
#
#if it is not start with '/' for linux, or matches no ':' for winnt in its value
# friso will search the friso.lex.ini relative to friso.ini
#absolute path search:
#linux: friso.lex_dir = /c/products/friso/dict/UTF-8/
#Winnt: friso.lex_dir = D:/products/friso/dict/UTF-8/
#relative path search (All system)
friso.lex_dir = ./vendors/dict/UTF-8/
#the maximum matching length.
friso.max_len = 5
#1 for recognition chinese name.
# and 0 for closed it.
friso.r_name = 1
#the maximum length for the cjk words in a
# chinese and english mixed word.
friso.mix_len = 2
#the maxinum length for the chinese last name adron.
friso.lna_len = 1
#append the synonyms words
friso.add_syn = 1
#clear the stopwords or not (1 to open it and 0 to close it)
#@date 2013-06-13
friso.clr_stw = 0
#keep the unrecongized words or not (1 to open it and 0 to close it)
#@date 2013-06-13
friso.keep_urec = 0
#use sphinx output style like 'admire|love|enjoy einsten'
#@date 2013-10-25
friso.spx_out = 0
#start the secondary segmentation for complex english token.
friso.en_sseg = 1
#min length of the secondary segmentation token. (better larger than 1)
friso.st_minl = 2
#default keep punctuations for english token.
friso.kpuncs = @%.#&+
#the threshold value for a char not a part of a chinese name.
friso.nthreshold = 2000000
#default mode for friso.
# 1 : simple mode - simply maxmum matching algorithm.
# 2 : complex mode - four rules of mmseg alogrithm.
# 3 : detect mode - only return the words that the do exists in the lexicon
friso.mode = 2

Comment ( 0 )

You need to Sign in for post a comment