1 Star 0 Fork 15

soon14 / 快速中文分词分析word segmentation

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
GPL-2.0

(早期接触申请软著,以为 申请软著就是发表,结果找华夏知识产权写了已经发表,这里永久申明下.)

用户使用如果遇纠纷 法院热线电话 12368, 公安电话号码110

<<100%(首创)个人著作权开源项目 软著登字第3951366号>>🔥26,000/ms word segment for text mining of NLP, POS, AI and Deep learning/每秒中文混合分词2200~3000万词汇的高精准确率快速神经网络分词包. 训练词意分析, 词感分析, 词境分析, 词灵分析并自由扩充词库 免费的官方互动展示页地址: http://tinos.qicp.vip/data.html

项目使用说明书 10.6.1版本地址:

https://github.com/yaoguangluo/AOPM_VPCS_Theroy/blob/master/%E6%B5%8F%E9%98%B3%E5%BE%B7%E5%A1%94%E8%BD%AF%E4%BB%B6%E5%BC%80%E5%8F%91%E6%9C%89%E9%99%90%E5%85%AC%E5%8F%B8%20%E8%AF%AD%E8%A8%80%E5%9B%BE%E7%81%B5%E5%B7%A5%E7%A8%8BAPI%E8%AF%B4%E6%98%8E%E4%B9%A6_10_6_1_5.doc

闭源的6万中文词汇语料库地址:https://github.com/yaoguangluo/deta_corpus/blob/master/posccFull.lyg

实例

版本号:11.1.1 : 4字词卷积催化函数 准备整体卡诺图化简, 和PCA阀门优化. 2019-05-23 字长卷积小表 已经 应用了(新增4表). 2019-05-27 0:11 M

版本号:11.1.0: 随机测试文章来自 360八卦新闻推荐, 腾讯门户, 163门户等,总计110多篇文章, 约5万字, 算法问题导致错误分词1个(错误率十万分之三),词库缺少导致错误7个(错误率万分之三), <中文分析错误率小于亿分之一> 是一个艰巨的主题. 算法问题的扩充 时时更新在 HUB上.2019-05-25 22:48 M

版本号:11.1.0 快速歧义病句混合分词 支持 标点符号分离(因为标点特别多, 未做病句标点分析, 大家可以自由改 2019-05-14) 契形字符, 目前可混合识别 12国语言, 可混合分词70国语言(契形+中(简,繁)日,韩,象形, 无标点,歧义,绕口令,带病句快速混分高质量算法研究同时保证1800万+/每秒混分速度和99.9%分词准确率(deta的科研目标是准确率上99.999999% (中文分析错误率小于亿分之一)) 和商业闭源语料库版(65000+中文简体词汇和35万12国词汇). 20190504

准备添加德塔处理人名的函数. 因为标点符号不是德塔研发设计的, 引用添加在如下另外一工程链接:refer https://github.com/yaoguangluo/Data_Processor/tree/master/DP/NLPProcessor

德塔类人机器人 Tin 先生准备开始工作了.deta机器人Tin先生呢 在0.0.0.0, 到255.255.255.255 的ip集合中一个一个子html页面爬出中文信息进行分词扩充 自己词汇. 非常感谢 各类作文网, 文学论坛, 博客媒体, 新闻门户,提供了准确用词的文章,避免Tin先生分词工作犯错误.

德塔类人机器人 Tin 先生已经学会如何自主分词了,再也不用作者进行一个词一个手工添加了 20190427.生命特征健康,20190428

中文分词算法原理已经公布: https://github.com/yaoguangluo/Deta_Parser/issues/21

主题一: 词意

实例已完成功能: คลังข้อความภาษาไทยขนาดใหญ่สำหรับการปรับรุ่นเสร็จสมบูรณ์。
实例已完成功能: Cơ sở dữ liệu tiếng việt chuẩn hoàn thành, không tối ưu hóa phiên bản。
实例已完成功能: Rumah corpus indonesia selesai. Versi tak dioptimalkan。
实例已完成功能: Die deutsche sprachdatenbank wurde nicht geändert.
实例已完成功能: أُنجزت نسخة غير معدلة من قاعدة المفردات المتخصصة باللغة العربية
实例已完成功能: Versión no detectada del corpus español completa。
实例已完成功能: 한국의 언어 자재 고는 이미 완벽하다。
实例已完成功能: 日本语のデータベースはすでに第1版が完成しました。
实例已完成功能: Le corpus français est terminé A1, A2, A3, A4, B1, B2。
1 : The first unrevised version has been completed: 12 professional level corpora of Chinese, Chinese pinyin, French, German, Korean, Japanese, Spanish, Russian, indonesia , Arabic, Vietnam and Thailand languages.
2 : 第1版未修正版:中国、フランス、ドイツ、韩国、日本、スペイン、ロシア、アラビア语8种类の専门レベルの言语データベースが完成した。
3 : 이미 제1 판의 수정되지 않은 수정판은 중국, 프랑스, 독일, 한국, 일본, 서부, 로씨야, 아랍어 등 8개 전업급 언어자료창고이다.
4 : La première édition n’a pas été modifiée: le corpus des langues chinoise, française, allemande, coréenne, japonaise, occidentale, russe et arabe.
5 : Die erste unänderte fassung der ersten ausgabe wurde abgeschlossen: in der mitte, frankreich, korea, japan, russland, dem 8. Sprachzentrum auf hoher ebene
6 : Завершено первое неисправленное издание: Китай, Франция, Германия, хан, Япония, западная, российская и арабская языки, восемь специализированных корпусов.
7 : Se han completado las primeras ediciones sin modificaciones: el corpus juris de 8 niveles profesionales en idiomas chino, francés, alemán, coreano, japonés, occidental, ruso y árabe.
8 : وقد اكتملت الطبعة الأولى من دون تعديل، وهي مجموعة من ثماني مجموعات متخصصة من اللغات الإسبانية والفرنسية والألمانية والورية واليابانية والغربية والروسية.
9 : Rumah corpus indonesia selesai. Versi tak dioptimalkan。
10 : Cơ sở dữ liệu tiếng việt chuẩn hoàn thành, không tối ưu hóa phiên bản。
11 : คลังข้อความภาษาไทยขนาดใหญ่สำหรับการปรับรุ่นเสร็จสมบูรณ์。
实例已完成功能: 首次采用《VPC架构》海量线程注册保证调用函数速度。 功能作者: 罗瑶光
实例已完成功能: 支持海量并发运算,后端接口调用运算,纯全虚接口同步运算。功能作者: 罗瑶光
实例已完成功能: 经过SONAR 最高级认证(感知最高认证,语义最高认证,语法最高认证,行为最高认证,逻辑最高认证)。功能作者: 罗瑶光
实例已完成功能: 扩展词语非常简单:基于 《格式化线性语料库》。功能作者: 罗瑶光
实例已完成功能: 查询词语非常方便:基于 《离散森林网络加权字典递归索引》。功能作者: 罗瑶光
实例已完成功能: 搜索词语非常迅捷:基于 《2分法搜索 欧基里德距离 进行 位运算散列存储 字符集数据森林》。功能作者: 罗瑶光
实例已完成功能: 匹配词语非常精准:基于 《决策树深度 NLP 正向隐马可夫匹配》。功能作者: 罗瑶光
实例已完成功能: 词频统计接近光速:基于《线性科学最强的快排第6代的基础上作者进行以作者名字命名的小高峰过滤法修正算法,导致快排6的速度再翻2倍》。 (词频统计非线性排序算法已经更新了罗瑶光小高峰过滤快排三代. 2019-04-23)
实例已完成功能: 速度:每秒高达2200万(201904012)中文简体字准确分词。 因为通过国际SONAR最高认证,牺牲了程序执行时间十分之三的速度效率(自行修改去掉sonar认知模式可达3000万字分词每秒,性能比应该是世界第二,世界第一赠给高斯林先生,因为我用的是java,没办法)。 测试环境(win7, 64位, 16g ram,intel i5-7500) 20181208 功能作者: 罗瑶光

https://github.com/yaoguangluo/Deta_Parser/tree/master/wordSegment/org/tinos/test

实例已完成功能: 中英混合分词。最高达到每秒2200万 ~ 2700万中英文混合常规格式分词。(每毫秒分22,000字+)20190412 功能作者:罗瑶光
实例已完成功能: 速度每秒高达900万词语的中文词性索引。(Part Of Speech, POS),功能作者: 罗瑶光
实例已完成功能: 机制为分词和词性分析可拆分使用。采用一次实例,多并发执行思想。功能作者: 罗瑶光
实例已完成功能: 词库:多达26300+的中文语料库精确简体中文词汇,有效的辨别新词。功能作者: 罗瑶光
实例已完成功能: 大小:55Kb。
实例已完成功能: 多核模式:可以自己写 parallelStream() 函数去实现,jdk8以上已经支持, CogsBinaryForestAnalyzer 支持海量多核多线程并发安全 。功能作者: 罗瑶光
实例已完成功能: 安全:VPC架构采用纯虚函数做反向映射跳过IOC,效率增加,线程安全高度严格保障。功能作者: 罗瑶光
实例已完成功能: 部分中文短句翻译英语。功能作者: 罗瑶光
实例已完成功能: 中英混合分词。最高达到每秒2200万 ~ 3000万中英文混合常规格式分词。功能作者: 罗瑶光
实例已完成功能: 病句中乱码分析。功能作者: 罗瑶光
实例已完成功能: VPC进化到VPCS, 静态分流加速每秒又多增100万分词。功能作者: 罗瑶光
实例已完成功能:12国语言翻译词汇录入系统。 Mr.Yaoguang.Luo 20190310 功能作者: 罗瑶光
实例 第一次语料库森林进行序列化优化已完成(分词速度提高1.5%),导致ICA内核生成速度翻倍。20190320 功能作者 罗瑶光。
实例 逐步完善歧义,复杂,病句,变态,狂虐,句型的分词.感谢测试用例病句提供者如下:

https://github.com/yaoguangluo/Deta_Parser/blob/master/wordSegment/org/tinos/test/DemoPOS.java (https://blog.csdn.net/dreamz*************ls/88108568 https://my.oschina.*************135746) 道德清洗中. 对曾经提供负面的歧义病句的单位表示感谢 同时表示道歉,这里 链接过滤了.

主题二: 词感

实例 德塔意识图灵机项目已经启动。 20190313 功能作者: 罗瑶光
实例 ICA 内核训练集生成算法优化。20190317 功能作者: 罗瑶光
实例 基于贝叶斯统计RNN函数集,通过频率排序进行函数校准,并进行动词的特殊用法修正。20190319 功能作者 罗瑶光
实例 一种罗氏教育评估图灵机1.0如下:基于ANN的训练形谓词比的核心率进行贝叶斯结果分析。20190323 功能作者 罗瑶光。
实例已完成功能: 病句分析非常完善:基于 《双向马可夫词性 POS 打分修正策略》。功能作者: 罗瑶光
实例已完成功能:情感语料库第一版本未修正版本。 Mr.Yaoguang.Luo

注意1:该正面,褒义,负面,贬义,中性情感语料库有一定比重的表达作者的主观判断,比如思维误差,肯定环境,否定环境,哲学精神论等,如果引起不适,请慎重使用和借鉴修改。如果该情感库对第三方导致任何工程问题,作者不做任何解释和负法律责任。 注意2: 因为关键字和形谓词模型的应用不确定性,意识和社会形态的溯源问题以及字典理解的误差率,该情感语料库不做任何解释在基于法律与道德的临界线区分应用上。 注意3: 多语意识场合,该情态库不做任何情形分类评估标准,也不做引导性评估。

实例 基于 HMM matrix 进行nomarlization 然后做 未优化的 ANN 简单 训练版本 map reduce 测试。功能作者: 罗瑶光

https://github.com/yaoguangluo/Deta_Parser/blob/master/sensingMap/org/tinos/sensing/test/ANNTest.java

主题三: 词境

实例 基于环境,场合,动机,目的,倾向和预判评估进行自然语言第6感意识分析。 功能作者: 罗瑶光

https://github.com/yaoguangluo/Deta_Parser/blob/master/emotionMap/org/tinos/emotion/test/EnvironmentTest.java

实例 Emotion Ratio Matrix for ANN ICA 6.th sensing test 功能作者: 罗瑶光

https://github.com/yaoguangluo/Deta_Parser/blob/master/sensingMap/org/tinos/sensing/test/SensingTest.java

实例 正在做功能:语言心理学读心术。功能作者: 罗瑶光
实例 正在做功能:动机判断的情态语料库。已经附带可运行实例地址如下。功能作者: 罗瑶光

https://github.com/yaoguangluo/Deta_Parser/blob/master/emotionMap/org/tinos/emotion/test/EmotionTest.java

实例 基于ICA做马可夫行为集合,通过误差容错率进行词性校准,找出副词特殊用法陷阱,并进行了修正。

https://github.com/yaoguangluo/Deta_Parser/blob/master/wordSegment/org/tinos/engine/pos/imp/POSControllerImp.java 20190318 功能作者 罗瑶光

实例 一种用于行为评估的罗氏多文本量子观测角度自适应行为ICA增量训练内核已经初步定义,之后开始做ICA + CNN内核计算.20190316
实例 情感集图灵算子进行认知化。下一步进行带训练集意识加工处理为ICA做预处理。 20190315功能作者: 罗瑶光

主题四: 词灵

实例 一种基于 ANN{Summing, Emotion, Motivation, Environment} * RNN{Covex, Euclid, POS} = DNN{LWA,Entropy} 罗氏读心术已经更新并进行了图灵算子优化。 20190314 功能作者: 罗瑶光

实例

https://github.com/yaoguangluo/Deta_Parser/blob/master/sensingMap/org/tinos/sensing/test/DNNTest.java
http://tinos.qicp.vip/data.html (德塔ANN 维度功能)
http://tinos.qicp.vip/data.html (德塔RNN 向量功能)
http://tinos.qicp.vip/data.html (德塔DNN 读心功能)
实例 接受教育能力行为训练工程已经启动。 20190321 功能作者 罗瑶光。 目标预计每秒分析700万字文章。

https://github.com/yaoguangluo/Deta_Parser/blob/master/behaviorMap/org/tinos/behavior/test/EducationLevelTest.java

实例 12国语言混合快分作文辅导功能.目前支持中, 英, 繁, 简, 日, 韩 作文辅导和评阅功能.

http://tinos.qicp.vip/data.html (作文辅导功能)

实例 一种简洁快速的文章文学周期性价比率研究。20190322 功能作者 罗瑶光。

https://github.com/yaoguangluo/Deta_Parser/blob/master/behaviorMap/org/tinos/behavior/test/LiterarinessLevelTest.java

实例 资本运作,消费评估,购买力分析的商业体系已经启动。 20190324 功能作者 罗瑶光。

商业开发将在官方网站展示:http://tinos.qicp.vip/

实例 Deta 中译英图灵项目已经启动.(谷歌,有道,百度等在语言句子翻译项目 非常成功, 德塔不会将研究重心花在全文翻译领域. 中译英图灵项目主要用在多语意识分析子项目.)

https://github.com/yaoguangluo/Deta_Parser/blob/master/neroMap/org/tinos/test/DemoTSLT.java

功能:

实例正在做功能: Unit test case。
实例正在做功能:商业应用布局。
实例正完善功能 所有代码进行文字描述,方便日后出版。作者要组织文字来描述当年为什么要这样写,写的时候遇到了什么困难, 是怎么解决的,写的时候怎么思考,将来需要怎么优化等。
商品功能:英语复句翻译。实例
商品功能:英语特殊句型翻译。
商品功能:分词矫正识别。
商品功能:多语意识。
商品功能:信贷分析。
商品功能:风险分析。
商品功能:心理辅导。
商品功能:教育辅导。
商品功能:动机分析。
商品功能:股市数据分析.
商品功能:新闻,领导,总结广告用语分析。
商品功能:智力训练。
商品功能:刑事犯罪语录侦察。
商品功能:读心术。
商品功能:防骗术。
商品功能:行为图灵。
商品功能:支持训练。
商品功能:带变异性特征进化。
商品功能:新词搜索互联网更新词库功能。
商品功能:商业对象接口计划。
商品功能:线程分词的内存实时检测。
商品功能:等等。

使用方法:

1 支持 java JDK 8 以上,字符集UTF-8 就够了,不需要任何插件和资源包。

分词使用如下:

大家可以自由添加词汇,添加在 org/tinos/fhmm/imp/words.lyg文件里。语料库集合地址如下:

https://github.com/yaoguangluo/Deta_Parser/tree/master/wordSegment/org/tinos/ortho/fhmm/imp

可以看下org/tinos/test里面的例子。
//1 实例化
Analyzer analyzer = new CogsBinaryForestAnalyzerImp();  //哈希森林索引 多核多线程安全 支持并发
//2初始
	//analyzer.init();
    analyzer.initMixed();
//3 创建字符串 utf 8
String ss = "如果从容易开始于是从容不迫天下等于是非常识时务必为俊杰沿海南方向逃跑他说的确实在理结婚的和尚未结婚的提高产品质量中外科学名著内科学是临床医学的基础    内科学作为临床医学的基础学科,重点论述人体各个系统各种疾病的病因、发病机制、临床表现、诊断、治疗与预防";
//4 执行
 //List<String> sets = analyzer.parserString(ss); 
 List<String> sets = analyzer.parserMixedString(ss); 
//5 输出
int j=0;
	for(int i = 0; i < sets.size(); i++){
		System.out.print(sets.get(i)+" | ");
		j++;
		if(j>25) {
			j=0;
			System.out.println("");
		}
	}
//6 效果

如果 | 从 | 容易 | 开始 | 于是 | 从容不迫 | 天下 | 等于 | 是非 | 常识 | 时务 | 必 | 为 | 俊杰 | 沿 | 海南 | 方向 | 逃跑 | 他 | 说的 | 确实 | 在理 | 结婚 | 的 | 和 | 尚未 | 结婚 | 的 | 提高 | 产品 | 质量 | 中外 |
科学 | 名著 | 内科学 | 是 | 临床 | 医学 | 的 | 基础 | | 内科学 | 作为 | 临床 | 医学 | 的 | 基础 | 学科 |
, | 重点 | 论述 | 人体 | 各个 | 系统 | 各种 | 疾病 | 的 | 病因 | 、 | 发病 | 机制 | 、 | 临床 | 表现 |
、 | 诊断 | 、 | 治疗 | 与 | 预防 |

POS 词性分析如下:

//1 实例化
	//Analyzer analyzer = new CogsBinaryForestAnalyzerImp();  //哈希森林索引 多核多线程安全 支持并发
	Analyzer analyzer = new BinaryForestAnalyzerImp();  //哈希森林索引 单线程
	//Analyzer analyzer = new FastAnalyzerImp();        //快速线性索引 单线程
	//Analyzer analyzer = new PrettyAnalyzerImp();      //线性森林索引 单线程
	//Analyzer analyzer = new BaseAnalyzerImp();        //一元线性索引
	//Analyzer analyzer = new ScoreAnalyzerImp();       //森林打分索引
//2初始
//analyzer.init();
analyzer.initMixed(); 象形 契形混分初始
Map<String, String> pos = analyzer.getWord();
//3 创建字符串 utf 8
String ss = "他说的确实在理结婚的和尚未结婚的提高产品质量中外科学名著内科学是临床医学的基础    内科学作为临床医学的基础学科,重点论述人体各个系统各种疾病的病因、发病机制、临床表现、诊断、治疗与预防";
//4 执行
//List<String> sets = analyzer.parserString(ss); 
List<String> sets = analyzer.parserMixedString(ss); 
//5 输出
int j=0;
	for(int i = 0; i < sets.size(); i++){
		System.out.print(sets.get(i)+"/"+pos.get(sets.get(i)) +"  ");
		j++;
		if(j>8) {
			j=0;
			System.out.println("");
		}
	}
//6 效果:

他/人称代词 说/动词 的 的确/副词 实在/副词 理/形谓词 结婚/动词 的/结构助词 和/连词 尚未/副词
结婚/动词 的/结构助词 提高/动词 产品/名词 质量/名词 中外/名词 科学/名词 名著/名词 内科学/名词
是/动词 临床/名词 医学/名词 的/结构助词 基础/名词 内科学/名词 作为/动词 临床/名词 医学/名词
的/结构助词 基础/名词 学科/名词 ,/标点 重点/名词 论述/名词 人体/名词 各个/限定词 系统/名词
各种/名词 疾病/名词 的/结构助词 病因/名词 、/标点 发病/动词 机制/名词 、/标点 临床/名词
表现/名词 、/标点 诊断/名词 、/标点 治疗/动词 与/连词 预防/动词

感谢声明

1 感谢中国复旦大学的FNLP人工智能团队。 本人在设计数据字典扩充的时候 应用其新词识别函数 帮我节省了大量词语录入需花费的时间。

应用方法:本人用FNLP函数将文章中的词语将我分出词进行词性标注,得到的标注如果在我的词库里面没有出现,于是扩充在我的词库。特此声明。

2 谷歌翻译,百度翻译,有道翻译团队。本人在做多国语言翻译的时候 应用其免费在线翻译网页进行词语翻译和矫正。减少大量词汇录入时间。

特别感谢有道翻译。

代码协作贡献者 (协作者按代码百分比享有项目各种合法权益与收益)

尚无

第三方开源包的引用和修改

尚无

参与讨论者

LetWang(神州泰岳)在扩充词库量的方法上提出了很多新颖的意见。 1 建议我向搜狗等商业公司买词库。 2 建议我和开源的分词公司合作。 3 建议我招聘相关人员录入词库工作。

基于该分词系统的项目实例

实例

有疑问联系313699483@qq.com 作者:罗瑶光
电话 15116110525
谢谢!
2019/3/15
GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.

简介

快速中文分词分析word segmentation 展开 收起
Java
GPL-2.0
取消

发行版

暂无发行版

贡献者

全部

近期动态

加载更多
不能加载更多了
Java
1
https://gitee.com/soon14/DetaParser.git
git@gitee.com:soon14/DetaParser.git
soon14
DetaParser
快速中文分词分析word segmentation
master

搜索帮助