HanLP: Han Language Processing

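English | 日本語 | Docs | 1.x | Forum | docker | ▶️ Run online

A multilingual natural language processing toolkit for production environments, built on two engines, PyTorch and TensorFlow 2.x, with the goal of bringing state-of-the-art NLP techniques into widespread practical use. HanLP aims to be feature-complete, accurate, efficient, trained on up-to-date corpora, cleanly architected, and customizable.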

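Backed by the world's largest multilingual corpus, HanLP 2.1 supports 10 joint tasks in 104 languages, including Simplified and Traditional Chinese, English, Japanese, Russian, French, and German: tokenization (coarse- and fine-grained standards; force, combine, and correct dictionary modes), part-of-speech tagging (PKU, 863, CTB, and UD tag sets), named entity recognition (PKU, MSRA, and OntoNotes standards), dependency parsing (SD and UD standards), constituency parsing, semantic dependency parsing (SemEval16, DM, PAS, and PSD standards), semantic role labeling, lemmatization, morphological feature extraction, and abstract meaning representation (AMR); as well as coreference resolution, semantic textual similarity, and text style transfer.

HanLP offers two APIs tailored to different needs, RESTful and native, aimed at lightweight and large-scale scenarios respectively. Whatever the API or programming language, HanLP interfaces stay semantically consistent and the code stays open source.

Lightweight RESTful API

Only a few KB in size, suited to agile development, mobile apps, and similar scenarios. Server compute is limited and the quota for anonymous users is small, so applying for a free public-interest API key (auth) is recommended.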

Python

pip install hanlp_restful

Create a client with the server URL and API key:

from hanlp_restful import HanLPClient
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh') # leave auth as None for anonymous access; zh = Chinese, mul = multilingual

Golang

Install with go get -u github.com/hankcs/gohanlp@main , then create a client with the server URL and API key:

HanLP := hanlp.HanLPClient(hanlp.WithAuth(""),hanlp.WithLanguage("zh")) // leave auth empty for anonymous access; zh = Chinese, mul = multilingual

Java

Add the dependency to pom.xml:

<dependency>
  <groupId>com.hankcs.hanlp.restful</groupId>
  <artifactId>hanlp-restful</artifactId>
  <version>0.0.7</version>
</dependency>

Create a client with the server URL and API key:

HanLPClient HanLP = new HanLPClient("https://www.hanlp.com/api", null, "zh"); // pass null as auth for anonymous access; zh = Chinese, mul = multilingual

Quick Start

Whatever the programming language, call the parse interface with a document to get HanLP's analysis results.

HanLP.parse("2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。")

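More features, including semantic similarity, style transfer, and coreference resolution, are covered in the documentation and the test cases.

Large-scale native API

Relies on deep learning frameworks such as PyTorch and TensorFlow; suited to professional NLP engineers, researchers, and large volumes of local data. Requires Python 3.6 to 3.9. Windows is supported, *nix is recommended. It runs on CPU, but GPU/TPU is recommended.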

pip install hanlp
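  • Every HanLP release passes unit tests on Linux, macOS, and Windows under Python 3.6 to 3.9, so installation on these platforms should not cause problems.

Quick Start

The HanLP workflow is to load a model and then call it like a function, as with the following joint multi-task model: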

import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH) # trained on the world's largest Chinese corpus
HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。'])

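The native API takes sentences as its input unit, so text must first be split into sentences with the multilingual sentence-splitting model or the rule-based sentence-splitting function. HanLP has pre-trained dozens of models across more than ten tasks and keeps iterating on its corpora and models; see the full list of models. The RESTful and native APIs share exactly the same semantic design, so users can switch between them seamlessly. The concise interface also accepts flexible parameters. Commonly used techniques include:

  • Flexible task scheduling via tasks: the fewer the tasks, the faster the run. For example, HanLP('商品和服务', tasks='tok') performs tokenization only. Most tasks depend on tokenization, so tasks='dep' runs tokenization and dependency parsing, while tasks=['pos', 'dep'], skip_tasks='tok*' skips tokenization and runs only part-of-speech tagging and dependency parsing, in which case a list of words must be passed in. skip_tasks='tok/fine' tokenizes with the coarse-grained standard and then runs the downstream tasks. When memory is limited, users can also delete tasks they do not need to slim down the model.
  • An efficient trie-based custom dictionary with three kinds of rules (force, combine, and correct); see the demo and the documentation. The effect of the rule system carries over to the downstream statistical models, which allows quick adaptation to new domains. The demos show more usage.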
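
To make the idea of a trie-backed dictionary concrete, here is a minimal, self-contained sketch of longest-match lookup in a character trie. It is not HanLP's implementation: the Trie class, force_segment, and the sample words are all hypothetical, and text outside the dictionary is simply emitted one character at a time, whereas HanLP hands that text to its statistical model.

```python
from typing import Dict, List, Optional


class Trie:
    """A minimal character trie (illustrative only, not HanLP's data structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_word = False

    def add(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> Optional[int]:
        """Return the end offset of the longest dictionary word beginning at `start`, or None."""
        node, end = self, None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                end = i + 1
        return end


def force_segment(text: str, trie: Trie) -> List[str]:
    """Emit dictionary words by longest match; other characters are emitted one by one."""
    tokens, i = [], 0
    while i < len(text):
        end = trie.longest_match(text, i)
        if end is None:
            end = i + 1  # no dictionary word starts here
        tokens.append(text[i:end])
        i = end
    return tokens


trie = Trie()
for word in ["自然语义", "科技公司", "自然"]:
    trie.add(word)
print(force_segment("自然语义科技公司", trie))
```

Lookup cost grows with the length of the matched word rather than with the size of the dictionary, which is the usual reason for choosing a trie here.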

Output Format

Whatever the API, programming language, or natural language, HanLP's output is a unified Document in JSON format:

{
  "tok/fine": [
    ["2021年", "HanLPv2.1", "为", "生产", "环境", "带来", "次", "世代", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"],
    ["阿婆主", "来到", "北京", "立方庭", "参观", "自然", "语义", "科技", "公司", "。"]
  ],
  "tok/coarse": [
    ["2021年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进", "的", "多语种", "NLP", "技术", "。"],
    ["阿婆主", "来到", "北京立方庭", "参观", "自然语义科技公司", "。"]
  ],
  "pos/ctb": [
    ["NT", "NR", "P", "NN", "NN", "VV", "JJ", "NN", "AD", "JJ", "DEG", "CD", "NN", "NR", "NN", "PU"],
    ["NN", "VV", "NR", "NR", "VV", "NN", "NN", "NN", "NN", "PU"]
  ],
  "pos/pku": [
    ["t", "nx", "p", "vn", "n", "v", "b", "n", "d", "a", "u", "a", "n", "nx", "n", "w"],
    ["n", "v", "ns", "ns", "v", "n", "n", "n", "n", "w"]
  ],
  "pos/863": [
    ["nt", "w", "p", "v", "n", "v", "a", "nt", "d", "a", "u", "a", "n", "ws", "n", "w"],
    ["n", "v", "ns", "n", "v", "n", "n", "n", "n", "w"]
  ],
  "ner/pku": [
    [],
    [["北京立方庭", "ns", 2, 4], ["自然语义科技公司", "nt", 5, 9]]
  ],
  "ner/msra": [
    [["2021年", "DATE", 0, 1], ["HanLPv2.1", "ORGANIZATION", 1, 2]],
    [["北京", "LOCATION", 2, 3], ["立方庭", "LOCATION", 3, 4], ["自然语义科技公司", "ORGANIZATION", 5, 9]]
  ],
  "ner/ontonotes": [
    [["2021年", "DATE", 0, 1], ["HanLPv2.1", "ORG", 1, 2]],
    [["北京立方庭", "FAC", 2, 4], ["自然语义科技公司", "ORG", 5, 9]]
  ],
  "srl": [
    [[["2021年", "ARGM-TMP", 0, 1], ["HanLPv2.1", "ARG0", 1, 2], ["为生产环境", "ARG2", 2, 5], ["带来", "PRED", 5, 6], ["次世代最先进的多语种NLP技术", "ARG1", 6, 15]], [["最", "ARGM-ADV", 8, 9], ["先进", "PRED", 9, 10], ["技术", "ARG0", 14, 15]]],
    [[["阿婆主", "ARG0", 0, 1], ["来到", "PRED", 1, 2], ["北京立方庭", "ARG1", 2, 4]], [["阿婆主", "ARG0", 0, 1], ["参观", "PRED", 4, 5], ["自然语义科技公司", "ARG1", 5, 9]]]
  ],
  "dep": [
    [[6, "tmod"], [6, "nsubj"], [6, "prep"], [5, "nn"], [3, "pobj"], [0, "root"], [8, "amod"], [15, "nn"], [10, "advmod"], [15, "rcmod"], [10, "assm"], [13, "nummod"], [15, "nn"], [15, "nn"], [6, "dobj"], [6, "punct"]],
    [[2, "nsubj"], [0, "root"], [4, "nn"], [2, "dobj"], [2, "conj"], [9, "nn"], [9, "nn"], [9, "nn"], [5, "dobj"], [2, "punct"]]
  ],
  "sdp": [
    [[[6, "Time"]], [[6, "Exp"]], [[5, "mPrep"]], [[5, "Desc"]], [[6, "Datv"]], [[13, "dDesc"]], [[0, "Root"], [8, "Desc"], [13, "Desc"]], [[15, "Time"]], [[10, "mDegr"]], [[15, "Desc"]], [[10, "mAux"]], [[8, "Quan"], [13, "Quan"]], [[15, "Desc"]], [[15, "Nmod"]], [[6, "Pat"]], [[6, "mPunc"]]],
    [[[2, "Agt"], [5, "Agt"]], [[0, "Root"]], [[4, "Loc"]], [[2, "Lfin"]], [[2, "ePurp"]], [[8, "Nmod"]], [[9, "Nmod"]], [[9, "Nmod"]], [[5, "Datv"]], [[5, "mPunc"]]]
  ],
  "con": [
    ["TOP", [["IP", [["NP", [["NT", ["2021年"]]]], ["NP", [["NR", ["HanLPv2.1"]]]], ["VP", [["PP", [["P", ["为"]], ["NP", [["NN", ["生产"]], ["NN", ["环境"]]]]]], ["VP", [["VV", ["带来"]], ["NP", [["ADJP", [["NP", [["ADJP", [["JJ", ["次"]]]], ["NP", [["NN", ["世代"]]]]]], ["ADVP", [["AD", ["最"]]]], ["VP", [["JJ", ["先进"]]]]]], ["DEG", ["的"]], ["NP", [["QP", [["CD", ["多"]]]], ["NP", [["NN", ["语种"]]]]]], ["NP", [["NR", ["NLP"]], ["NN", ["技术"]]]]]]]]]], ["PU", ["。"]]]]]],
    ["TOP", [["IP", [["NP", [["NN", ["阿婆主"]]]], ["VP", [["VP", [["VV", ["来到"]], ["NP", [["NR", ["北京"]], ["NR", ["立方庭"]]]]]], ["VP", [["VV", ["参观"]], ["NP", [["NN", ["自然"]], ["NN", ["语义"]], ["NN", ["科技"]], ["NN", ["公司"]]]]]]]], ["PU", ["。"]]]]]]
  ]
}
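
As a reading aid for the Document above, the following sketch resolves named entity spans back to tokens. It assumes that the two integers in each entity are [start, end) token offsets into tok/fine; that reading is inferred from the sample output, not taken from a specification, and entity_spans is a hypothetical helper name.

```python
import json

# A trimmed copy of the Document shown above (second sentence only).
doc = json.loads("""
{
  "tok/fine": [["阿婆主", "来到", "北京", "立方庭", "参观", "自然", "语义", "科技", "公司", "。"]],
  "ner/msra": [[["北京", "LOCATION", 2, 3], ["立方庭", "LOCATION", 3, 4], ["自然语义科技公司", "ORGANIZATION", 5, 9]]]
}
""")


def entity_spans(doc, tok_key="tok/fine", ner_key="ner/msra"):
    """Yield (sentence_index, label, tokens) for every entity in the document."""
    for sent_id, (tokens, entities) in enumerate(zip(doc[tok_key], doc[ner_key])):
        for text, label, start, end in entities:
            span = tokens[start:end]
            # Assumption check: joining the span's tokens reproduces the entity text.
            assert "".join(span) == text
            yield sent_id, label, span


for sent_id, label, span in entity_spans(doc):
    print(sent_id, label, span)
```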

In particular, the Python RESTful and native APIs support visualization based on monospaced fonts, rendering linguistic structures directly in the console:

HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。']).pretty_print()

Dep Tree    	Token    	Relati	PoS	Tok      	NER Type        	Tok      	SRL PA1     	Tok      	SRL PA2     	Tok      	PoS    3       4       5       6       7       8       9 
────────────	─────────	──────	───	─────────	────────────────	─────────	────────────	─────────	────────────	─────────	─────────────────────────────────────────────────────────
 ┌─────────►	2021    	tmod  	NT 	2021    	───►DATE        	2021    	───►ARGM-TMP	2021    	            	2021    	NT ───────────────────────────────────────────►NP ───┐   
 │┌────────►	HanLPv2.1	nsubj 	NR 	HanLPv2.1	───►ORGANIZATION	HanLPv2.1	───►ARG0    	HanLPv2.1	            	HanLPv2.1	NR ───────────────────────────────────────────►NP────┤   
 ││┌─►┌─────	        	prep  	P  	        	                	        	◄─┐         	        	            	        	P ───────────┐                                          
 │││    ┌─►	生产       	nn    	NN 	生产       	                	生产       	  ├►ARG2    	生产       	            	生产       	NN ──┐       ├────────────────────────►PP ───┐          
 │││  └─►└──	环境       	pobj  	NN 	环境       	                	环境       	◄─┘         	环境       	            	环境       	NN ──┴►NP ───┘                                         
┌┼┴┴────────	带来       	root  	VV 	带来       	                	带来       	╟──►PRED    	带来       	            	带来       	VV ──────────────────────────────────┐                 
││       ┌─►	        	amod  	JJ 	        	                	        	◄─┐         	        	            	        	JJ ───►ADJP──┐                              ├►VP────┤   
││  ┌───►└──	世代       	nn    	NN 	世代       	                	世代       	           	世代       	            	世代       	NN ───►NP ───┴►NP ───┐                                
││      ┌─►	        	advmod	AD 	        	                	        	           	        	───►ARGM-ADV	        	AD ───────────►ADVP──┼►ADJP──┐       ├►VP ───┘       ├►IP
││  │┌──►├──	先进       	rcmod 	JJ 	先进       	                	先进       	           	先进       	╟──►PRED    	先进       	JJ ───────────►VP ───┘                                
││  ││   └─►	        	assm  	DEG	        	                	        	  ├►ARG1    	        	            	        	DEG──────────────────────────┤                         
││  ││   ┌─►	        	nummod	CD 	        	                	        	           	        	            	        	CD ───►QP ───┐               ├►NP ───┘                  
││  ││┌─►└──	语种       	nn    	NN 	语种       	                	语种       	           	语种       	            	语种       	NN ───►NP ───┴────────►NP────┤                          
││  │││  ┌─►	NLP      	nn    	NR 	NLP      	                	NLP      	           	NLP      	            	NLP      	NR ──┐                                                 
│└─►└┴┴──┴──	技术       	dobj  	NN 	技术       	                	技术       	◄─┘         	技术       	───►ARG0    	技术       	NN ──┴────────────────►NP ───┘                          
└──────────►	        	punct 	PU 	        	                	        	            	        	            	        	PU ──────────────────────────────────────────────────┘   

Dep Tree    	Tok	Relat	Po	Tok	NER Type        	Tok	SRL PA1 	Tok	SRL PA2 	Tok	Po    3       4       5       6 
────────────	───	─────	──	───	────────────────	───	────────	───	────────	───	────────────────────────────────
         ┌─►	阿婆主	nsubj	NN	阿婆主	                	阿婆主	───►ARG0	阿婆主	───►ARG0	阿婆主	NN───────────────────►NP ───┐   
┌┬────┬──┴──	来到 	root 	VV	来到 	                	来到 	╟──►PRED	来到 	        	来到 	VV──────────┐                  
││      ┌─►	北京 	nn   	NR	北京 	───►LOCATION    	北京 	◄─┐     	北京 	        	北京 	NR──┐       ├►VP ───┐          
││    └─►└──	立方庭	dobj 	NR	立方庭	───►LOCATION    	立方庭	◄─┴►ARG1	立方庭	        	立方庭	NR──┴►NP ───┘                 
│└─►┌───────	参观 	conj 	VV	参观 	                	参观 	        	参观 	╟──►PRED	参观 	VV──────────┐       ├►VP────┤   
     ┌───►	自然 	nn   	NN	自然 	◄─┐             	自然 	        	自然 	◄─┐     	自然 	NN──┐                     ├►IP
     │┌──►	语义 	nn   	NN	语义 	               	语义 	        	语义 	       	语义 	NN         ├►VP ───┘          
     ││┌─►	科技 	nn   	NN	科技 	  ├►ORGANIZATION	科技 	        	科技 	  ├►ARG1	科技 	NN  ├►NP ───┘                  
   └─►└┴┴──	公司 	dobj 	NN	公司 	◄─┘             	公司 	        	公司 	◄─┘     	公司 	NN──┘                          
└──────────►	  	punct	PU	  	                	  	        	  	        	  	PU──────────────────────────┘   
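
The Dep Tree column is drawn from the dep field of the Document. As a small sketch of how that field can be read, the code below turns it into (dependent, relation, head) triples, assuming each entry is a [head, relation] pair whose head is a 1-based token index with 0 standing for the root. That convention is inferred from the sample output, and dependency_triples is a hypothetical helper.

```python
# Tokens and dependency arcs of the second sample sentence, copied from the Document above.
tokens = ["阿婆主", "来到", "北京", "立方庭", "参观", "自然", "语义", "科技", "公司", "。"]
dep = [[2, "nsubj"], [0, "root"], [4, "nn"], [2, "dobj"], [2, "conj"],
       [9, "nn"], [9, "nn"], [9, "nn"], [5, "dobj"], [2, "punct"]]


def dependency_triples(tokens, dep):
    """Return (dependent, relation, head) triples; head index 0 is rendered as 'ROOT'."""
    triples = []
    for token, (head, relation) in zip(tokens, dep):
        head_token = "ROOT" if head == 0 else tokens[head - 1]
        triples.append((token, relation, head_token))
    return triples


for triple in dependency_triples(tokens, dep):
    print(triple)
```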

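For the meaning of the tag sets, see the Linguistic Annotation Guidelines and the Format Specification. We purchased, annotated, or adopted the largest and most varied corpora in the world for joint multilingual multi-task learning, so HanLP's tag sets also have the broadest coverage.

Train Your Own Domain Model

Writing a deep learning model is not hard at all; the hard part is reproducing a high accuracy. The code below shows how to spend 6 minutes on the SIGHAN 2005 PKU corpus training a Chinese word segmentation model that surpasses the academic state of the art.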

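# Import paths follow the HanLP 2.1 package layout; adjust them if your installed version differs.
from hanlp.common.dataset import SortingSamplerBuilder
from hanlp.components.tokenizers.transformer import TransformerTaggingTokenizer
from hanlp.datasets.tokenization.sighan2005.pku import SIGHAN2005_PKU_TRAIN_ALL, SIGHAN2005_PKU_TEST

tokenizer = TransformerTaggingTokenizer()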
save_dir = 'data/model/cws/sighan2005_pku_bert_base_96.70'
tokenizer.fit(
    SIGHAN2005_PKU_TRAIN_ALL,
    SIGHAN2005_PKU_TEST,  # Conventionally, no devset is used. See Tian et al. (2020).
    save_dir,
    'bert-base-chinese',
    max_seq_len=300,
    char_level=True,
    hard_constraint=True,
    sampler_builder=SortingSamplerBuilder(batch_size=32),
    epochs=3,
    adam_epsilon=1e-6,
    warmup_steps=0.1,
    weight_decay=0.01,
    word_dropout=0.1,
    seed=1609836303,
)
tokenizer.evaluate(SIGHAN2005_PKU_TEST, save_dir)

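Because the random seed is fixed, the result is guaranteed to be 96.70. Unlike academic papers or commercial projects that make false claims, HanLP guarantees that all results are reproducible. If you have any doubts, we will treat them as a top-priority fatal bug and investigate immediately.

See the demos for more training scripts.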
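
The text above quotes 96.70 without naming the metric. Assuming it is the word-level F1 percentage conventionally reported on SIGHAN 2005, the sketch below shows how such a score is typically computed by comparing character spans of gold and predicted words. The helper names and the toy data are hypothetical, and this is not HanLP's evaluation code.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold_sentences, pred_sentences):
    """Micro-averaged span-level precision, recall, and F1 over parallel sentences."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans, pred_spans = to_spans(gold), to_spans(pred)
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the prediction splits one gold word into two.
gold = [["阿婆主", "来到", "北京立方庭"]]
pred = [["阿婆主", "来到", "北京", "立方庭"]]
print(segmentation_f1(gold, pred))
```

A predicted word counts as correct only when both of its boundaries match a gold word, so a single wrong split costs one gold word and adds two incorrect predictions.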

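Performance

| lang | corpora | model | tok fine | tok coarse | pos ctb | pos pku | pos 863 | pos ud | ner pku | ner msra | ner ontonotes | dep | con | srl | sdp SemEval16 | sdp DM | sdp PAS | sdp PSD | lem | fea | amr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mul | UD2.7, OntoNotes5 | small | 98.62 | - | - | - | - | 93.23 | - | - | 74.42 | 79.10 | 76.85 | 70.63 | - | 91.19 | 93.67 | 85.34 | 87.71 | 84.51 | - |
| mul | UD2.7, OntoNotes5 | base | 98.97 | - | - | - | - | 90.32 | - | - | 80.32 | 78.74 | 71.23 | 73.63 | - | 92.60 | 96.04 | 81.19 | 85.08 | 82.13 | - |
| zh | open | small | 97.25 | - | 96.66 | - | - | - | - | - | 95.00 | 84.57 | 87.62 | 73.40 | 84.57 | - | - | - | - | - | - |
| zh | open | base | 97.50 | - | 97.07 | - | - | - | - | - | 96.04 | 87.11 | 89.84 | 77.78 | 87.11 | - | - | - | - | - | - |
| zh | close | small | 96.70 | 95.93 | 96.87 | 97.56 | 95.05 | - | 96.22 | 95.74 | 76.79 | 84.44 | 88.13 | 75.81 | 74.28 | - | - | - | - | - | - |
| zh | close | base | 97.52 | 96.44 | 96.99 | 97.59 | 95.29 | - | 96.48 | 95.72 | 77.77 | 85.29 | 88.57 | 76.52 | 73.76 | - | - | - | - | - | - |
| zh | close | ernie | 96.95 | 97.29 | 96.76 | 97.64 | 95.22 | - | 97.31 | 96.47 | 77.95 | 85.67 | 89.17 | 78.51 | 74.10 | - | - | - | - | - | - |

  • The AMR paper has been accepted; the Chinese model is being trained.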

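The data preprocessing and splits HanLP uses are not necessarily the same as popular practice. For example, HanLP uses the full version of the MSRA named entity recognition corpus rather than the commonly used reduced version; HanLP uses the Stanford Dependencies standard, which has broader grammatical coverage, rather than the Zhang and Clark (2008) standard followed in academia; and HanLP proposes an even split of CTB instead of the academic split, which is uneven and omits 51 gold files. HanLP has open-sourced a full set of corpus preprocessing scripts and the corresponding corpora to promote transparency in Chinese NLP.

In short, HanLP only does what we believe is correct and advanced, which is not necessarily what is popular or authoritative.

Citation

If you use HanLP in your research, please cite it as follows: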

@inproceedings{he-choi-2021-stem,
    title = "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders",
    author = "He, Han and Choi, Jinho D.",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.451",
    pages = "5555--5577",
    abstract = "Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.",
}

License

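Source Code

HanLP's source code is licensed under the Apache License 2.0 and may be used commercially free of charge. Please include a link to HanLP and the license in your product documentation. HanLP is protected by copyright law; infringement will be prosecuted.

自然语义(青岛)科技有限公司

Since v1.7, HanLP has operated independently, with 自然语义(青岛)科技有限公司 as the project owner. The company leads development of subsequent versions and holds their copyright.

大快搜索

HanLP v1.3 through v1.65 were developed under the lead of 大快搜索 and remain fully open source; 大快搜索 holds the related copyright.

上海林原公司

In its early days HanLP received strong support from 上海林原公司, which holds the copyright for v1.28 and earlier versions. Those versions were also published on the company's website.

Pre-trained Models

The licensing of machine learning models is legally unsettled. In the spirit of respecting the original licenses of open-source corpora, and unless otherwise stated, HanLP's multilingual models are licensed under CC BY-NC-SA 4.0, and the Chinese models are licensed for research and teaching use only.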

References

https://hanlp.hankcs.com/docs/references.html

