# Cool-NLPCV

**Repository Path**: Mbritney/Cool-NLPCV

## Basic Information

- **Project Name**: Cool-NLPCV
- **Description**: Some Cool NLP and CV Repositories and Solutions (a collection of open-source solutions, datasets, tools, and learning materials for common NLP tasks)
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2021-07-27
- **Last Updated**: 2021-10-15

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Cool-NLPCV (continuously updated...)
Some Cool NLP and CV Repositories and Solutions  
 
Cool-NLP | [Cool-CV](README-CV.md)  

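This repository collects open-source solutions, datasets, tools, learning materials, and high-quality blog posts for common NLP tasks, to make studying and quick lookup easier. It is shared here for reference. Contributions and stars are welcome, thank you!  
It will be updated from time to time, and you are welcome to join in and share.

All content comes from the internet. If anything infringes your rights, please contact me promptly and I will remove it.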

1. Selected Introductory Resources for Machine Learning & Deep Learning
* [Python-100-Days: Python from novice to master in 100 days](https://github.com/jackfrued/Python-100-Days)
* [Chinese notes for Stanford's 2014 machine learning course (Andrew Ng)](https://github.com/fengdu78/Coursera-ML-AndrewNg-Notes)
* [Code implementations for "Statistical Learning Methods", 2nd edition](https://github.com/fengdu78/lihang-code)
* [Chinese notes for the Coursera deep learning course (deeplearning.ai, Andrew Ng)](https://github.com/fengdu78/deeplearning_ai_books)
* ["Dive into Deep Learning", TensorFlow 2.0 version](http://zh.d2l.ai/)
* ["Dive into Deep Learning", PyTorch version](https://github.com/ShusenTang/Dive-into-DL-PyTorch)
* [Deep-learning-with-keras-notebooks](https://github.com/erhwenkuo/deep-learning-with-keras-notebooks)
* [TensorFlow 2 tutorial and introductory deep learning guide](https://github.com/snowkylin/tensorflow-handbook)
* [A practical tutorial on PyTorch model training](https://github.com/TingsongYu/PyTorch_Tutorial)
* [Formula derivations for "Machine Learning" (the Watermelon Book)](https://github.com/datawhalechina/pumpkin-book)
* [Data-Science-Notes: data science notes and collected materials](https://github.com/fengdu78/Data-Science-Notes)
* [Notes on Hung-yi Lee's "Deep Reinforcement Learning"](https://github.com/datawhalechina/leedeeprl-notes)
* [Pandas tutorial in Chinese](https://datawhalechina.github.io/joyful-pandas/build/html/%E7%9B%AE%E5%BD%95/ch3.html)
* [Docker images of deep learning environments for various frameworks](https://github.com/ufoym/deepo)

2. Word Vectors & BERT-Family Pretrained Models
* [100+ Chinese Word Vectors: over a hundred pretrained Chinese word vectors](https://github.com/Embedding/Chinese-Word-Vectors)
* [Tencent word vectors](https://ai.tencent.com/ailab/nlp/embedding.html)
* [Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)](https://github.com/ymcui/Chinese-BERT-wwm)
* [Google's official BERT](https://github.com/google-research/bert)
* [Chinese ELECTRA pretrained models](https://github.com/ymcui/Chinese-ELECTRA)
* [Chinese XLNet pretrained models](https://github.com/ymcui/Chinese-XLNet)
* [Chinese MacBERT pretrained models](https://github.com/ymcui/MacBERT)
* [Chinese ALBERT pretrained models](https://github.com/brightmart/albert_zh)
* [Collection of open-source pretrained language models](https://github.com/ZhuiyiTechnology/pretrained-models)
* [BERT and word embeddings pretrained on JD customer service dialogue data (42 GB, 1.2 billion sentences)](https://github.com/jd-aig/nlp_baai/tree/master/pretrained_models_and_embeddings)
* [WoBERT: word-granularity Chinese BERT](https://github.com/ZhuiyiTechnology/WoBERT)
* [Collection of high-quality Chinese pretrained models: state-of-the-art large models, fastest small models, and specialized similarity models](https://github.com/CLUEbenchmark/CLUEPretrainedModels)

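3. NLP Datasets & Data Download Sites
* [Task-oriented dialogue data, text classification, entity recognition & part-of-speech tagging, search matching, recommender systems, encyclopedia data, coreference resolution, Chinese cloze datasets, a Chinese classical poetry database, an insurance-industry corpus, a Chinese character decomposition dictionary, and Chinese dataset platforms](https://github.com/InsaneLife/ChineseNLPCorpus)
* [Sentiment/opinion/review polarity analysis, Chinese named entity recognition, recommender systems, FAQ question answering systems](https://github.com/SophonPlus/ChineseNlpCorpus)
* [Wikipedia, news corpora, encyclopedia Q&A, community Q&A, Chinese-English translation corpora](https://github.com/brightmart/nlp_chinese_corpus)
* [Chinese Language Understanding Evaluation benchmark, including representative datasets, baseline (pretrained) models, corpora, and leaderboards](https://github.com/CLUEbenchmark/CLUE)
* [Knowledge graph datasets: common sense, cities, finance, agriculture, geography, meteorology, social networks, Internet of Things, healthcare, entertainment, daily life, business, travel, science and education, and more](http://openkg.cn/dataset)
* [COVID-19 open knowledge graph](http://openkg.cn/dataset/39801d1b-0b51-4cde-a06c-62def5a70563)
* ["Big Cilin" (《大词林》): 750,000 open-sourced core entities, with fine-grained concepts and relation lists built around them](http://openkg.cn/dataset/hit)
* [Large-scale medical dialogue dataset: 1.1 million medical consultations and 4 million doctor-patient dialogue entries](https://github.com/UCSD-AI4H/Medical-Dialogue-System)
* [Chinese medical dialogue dataset on COVID-19 and other types of pneumonia](https://github.com/UCSD-AI4H/COVID-Dialogue)
* [MedQuAD: medical question answering dataset (English)](https://github.com/abachaa/MedQuAD)
* [Chinese medical dialogue data](https://github.com/Toyhom/Chinese-medical-dialogue-data)
* [Large-scale Chinese knowledge graph data](https://github.com/ownthink/KnowledgeGraphData)
* [Chinese speech corpus: about 3,200 speakers, about 900 hours of audio, about 1.13 million text entries, about 13 million characters in total](https://github.com/KuangDD/zhvoice)
* [THUOCL (THU Open Chinese Lexicon): Chinese lexicon](https://github.com/thunlp/THUOCL)
* [Twelve categories of million-scale common semantic dictionaries for Chinese processing, including 340,000 abstract-semantics entries, 340,000 antonym entries, 430,000 synonym entries, and more](https://github.com/liuhuanyong/ChineseSemanticKB)
* [Baidu Zhidao Q&A corpus, with more than 5.8 million questions, 9.38 million answers, and 5,800 category labels](https://github.com/liuhuanyong/MiningZhiDaoQACorpus)
* [Company name corpus and organization name corpus](https://github.com/wainshine/Company-Names-Corpus)
* [Chinese and English NLP datasets](https://github.com/loujie0822/CLUEDatasetSearch)
* [BAAI Open Data Research Center](https://open.baai.ac.cn/home)
* [Baidu Brain](https://ai.baidu.com/broad/download)
* [DiDi open data program](https://outreach.didichuxing.com/app-vue/)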


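4. A unified framework implementing various tasks with BERT (bert4keras):
* [Chinese word segmentation, entity recognition, text (sentiment) classification, reading comprehension, title generation, relation extraction (triple extraction), adversarial training, image caption generation, text generation](https://github.com/bojone/bert4keras/tree/master/examples)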

5. [BAT machine learning interview series: 1000 questions](https://blog.csdn.net/v_JULY_v/article/details/78121924)

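6. [Macadam is an NLP toolkit built on TensorFlow (Keras) and bert4keras, focused on text classification, sequence labeling, and relation extraction.](https://github.com/yongzhuo/Macadam)
* Supports embeddings such as RANDOM, WORD2VEC, FASTTEXT, BERT, ALBERT, ROBERTA, NEZHA, XLNET, ELECTRA, and GPT-2
* Supports text classification algorithms such as FineTune, FastText, TextCNN, CharCNN, BiRNN, RCNN, DCNN, CRNN, DeepMoji, SelfAttention, HAN, and Capsule
* Supports sequence labeling algorithms such as CRF, Bi-LSTM-CRF, CNN-LSTM, DGCNN, Bi-LSTM-LAN, Lattice-LSTM-Batch, and MRC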

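7. Paper Collections & Practical Experience
* [Collection of papers and open-source code projects from top NLP conferences (such as ACL, EMNLP, NAACL, COLING, AAAI, IJCAI)](https://github.com/yizhen20133868/NLP-Conferences-Code)
* [A curated list of classic, top-conference, and must-read NLP papers across multiple areas](https://blog.csdn.net/qq_42189083/article/details/106424610)
* [Detailed write-ups of how deep learning models are deployed in production at major companies, mainly in search, recommendation, and NLP](https://github.com/DA-southampton/Tech_Aarticle)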

8. Entity Recognition Collection
* [TF-based: BERT-BiLSTM-CRF-NER](https://github.com/macanv/BERT-BiLSTM-CRF-NER)
* [TF + PyTorch-based: CLUENER fine-grained named entity recognition](https://github.com/CLUEbenchmark/CLUENER2020)
* [PyTorch-based: Chinese NER (Named Entity Recognition) using BERT (Softmax, CRF, Span)](https://github.com/lonePatient/BERT-NER-Pytorch)
* [TF-based: named entity recognition in practice and exploration](https://github.com/wavewangyue/ner)
* [How does industry solve NER problems? Sharing 12 tricks](https://zhuanlan.zhihu.com/p/152463745)
* [The right way to approach Chinese NER: a summary of lexicon enhancement methods (from Lattice LSTM to FLAT)](https://zhuanlan.zhihu.com/p/142615620)
* [LatticeLSTM with batch parallelism support](https://github.com/LeeSureman/Batch_Parallel_LatticeLSTM)
* [medical_NER - named entity recognition for a Chinese medical knowledge graph](https://github.com/pumpkinduo/KnowledgeGraph_NER)
* [Named entity recognition implemented with BERT/CRF](https://github.com/Louis-udm/NER-BERT-CRF)
* [Chinese NER with the pretrained language model ALBERT](https://github.com/ProHiryu/albert-chinese-ner)
* [Sequence labeling with BiLSTM-CRF, BERT, and related methods](https://github.com/qiufengyuyi/sequence_tagging)
* [Medical entity recognition with BiLSTM+CRF, including medical NER data](https://github.com/DengYangyong/medical_entity_recognize)
* [DeepIE: deep-learning-based information extraction](https://github.com/loujie0822/DeepIE)

9. Text (Sentiment) Classification
* [Several common text classification models built on CNNs, RNNs, and pretrained NLP models](https://github.com/xiaoxiong74/textClassifier)
* [Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, based on PyTorch](https://github.com/649453932/Chinese-Text-Classification-Pytorch)
* [NeuralNLP-NeuralClassifier: Tencent's open-source deep learning text classification toolkit, based on PyTorch](https://github.com/Tencent/NeuralNLP-NeuralClassifier)
* [Keras-TextClassification](https://github.com/yongzhuo/Keras-TextClassification)
* [Chinese ULMFiT for sentiment analysis and text classification](https://github.com/bigboNed3/chinese_ulmfit)
* [Text classification with BERT, XLNet + CNN, LSTM, GRU](https://github.com/zhanlaoban/Transformers_for_Text_Classification)
* [How to solve 11 key problems in NLP classification tasks](https://zhuanlan.zhihu.com/p/183852900)
* [Survey and summary of text classification resources (with code)](https://github.com/xiaoqian19940510/text-classification-surveys)

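10. Relation Extraction (Triple Extraction)
* [Chinese relation extraction based on distant supervision](https://github.com/xiaofei05/Distant-Supervised-Chinese-Relation-Extraction)
* [A lightweight information extraction model based on DGCNN and probabilistic graphs](https://kexue.fm/archives/6671)
* [Triple extraction with bert4keras](https://kexue.fm/archives/7161)
* [Champion solution for information extraction: nested NER + relation extraction + entity normalization](https://zhuanlan.zhihu.com/p/326302618)
* [Roundup of ACL 2020 papers on information extraction](https://blog.csdn.net/qq_42189083/article/details/106424416)
* [A summary of entity-relation extraction methods in NLP](https://zhuanlan.zhihu.com/p/77868938)
* [DeepKE: a PyTorch-based deep learning framework for Chinese relation extraction](https://github.com/zjunlp/deepke)
* [TensorFlow-based entity and relation extraction: a solution for the information extraction task of the 2019 Language and Intelligence Challenge](https://github.com/yuanxiaosc/Entity-Relation-Extraction)
* [A cascaded pointer framework for triple extraction](https://github.com/weizhepei/CasRel)
* [A summary of event extraction methods (with code)](https://github.com/xiaoqian19940510/Event-Extraction)
* [DeepIE: deep-learning-based information extraction](https://github.com/loujie0822/DeepIE)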


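11. Text Generation and Text Summarization
* [Build your own DialoGPT: a generative multi-turn dialogue model based on an LM](https://kexue.fm/archives/7718)
* [CAIL 2020 judicial summarization: SPACES, an "extract-then-generate" approach to long-text summarization (CAIL wrap-up)](https://github.com/bojone/SPACES)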

12. Reading Comprehension
* [Reading comprehension question answering based on MLM](https://kexue.fm/archives/7148)

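13. Knowledge Graphs
* [Intelligent question answering system based on a medicine knowledge graph](https://github.com/YeYzheng/KGQA-Based-On-medicine)
* [JD product knowledge graph](https://github.com/liuhuanyong/ProductKnowledgeGraph)
* [Military-domain knowledge graph question answering project](https://github.com/liuhuanyong/QAonMilitaryKG)
* [Extracting triples from Baidu Baike Chinese pages to build a Chinese knowledge graph](https://github.com/lixiang0/WEB_KG)
* [Question answering system based on a knowledge graph](https://github.com/fighting41love/funNLP)
* [Course materials for "Knowledge Graph"](https://github.com/npubird/KnowledgeGraphCourse)
* [Agricultural knowledge graph (AgriKG): information retrieval, named entity recognition, relation extraction, intelligent question answering, and decision support for the agricultural domain](https://github.com/qq547276542/Agriculture_KnowledgeGraph)
* [Knowledge graph construction and KG-based automatic question answering: a disease-centered medical knowledge graph of moderate scale, used to provide automatic question answering and analysis services](https://github.com/liuhuanyong/QASystemOnMedicalKG)
* [Knowledge graph learning materials, offering a systematic learning path](https://github.com/husthuke/awesome-knowledge-graph)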

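14. Text Similarity Computation (Judgment)
* [Roundup of Chinese question and sentence similarity competitions and their solutions](https://github.com/ShuaichiLi/Chinese-sentence-similarity-task)
* [Top-1 team solution for similar case matching at CAIL (China AI and Law Challenge)](https://github.com/GuidoPaul/CAIL2019)
* [TensorFlow versions of common text matching models, using the QA_corpus dataset](https://github.com/terrifyzhao/text_matching)
* [Text matching models such as DSSM, ESIM, ABCNN, and BIMPM, using the official LCQMC data](https://github.com/pengming617/text_matching)
* [Similar-sentence judgment model based on a Siamese BiLSTM, with training and test datasets provided](https://github.com/liuhuanyong/SiameseSentenceSimilarity)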

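15. Attention Mechanism and Transformer
* [Attention models in deep learning](https://zhuanlan.zhihu.com/p/37601161)
* [A brief reading of "Attention is All You Need" (introduction + code)](https://kexue.fm/archives/4765)
* [Easy to understand: the attention mechanism illustrated in 8 steps](https://zhuanlan.zhihu.com/p/94077451)
* [Transformer is like a play, and it all depends on the mask](https://kexue.fm/archives/6933)
* [Abandon illusions and fully embrace Transformer: a comparison of the three major feature extractors in NLP (CNN/RNN/TF)](https://zhuanlan.zhihu.com/p/54743941)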


16. Chatbots and Question Answering
* [Sharing and introduction of applications, architectures, and algorithms for intelligent customer service and chatbots](https://github.com/lizhe2004/chatbot-list)
* [Microsoft's chatbot framework, Bot Framework](https://github.com/microsoft/botframework)
* [Chatbot framework Rasa](https://github.com/RasaHQ/rasa)
* [GPT2 for Chinese chitchat](https://github.com/yangjianxin1/GPT2-chitchat)
* [Chatbot for the finance and legal domains (also capable of chitchat)](https://github.com/charlesXu86/Chatbot_CN)
* [Chatbot built with rasa_nlu, rasa_core, and rasa_core_sdk](https://github.com/xiaoxiong74/rasa_chatbot)

17. Embedding Series
* [Comparing word vectors in NLP: word2vec/glove/fastText/elmo/GPT/bert](https://zhuanlan.zhihu.com/p/56382372)
* [PTMs riding the wind and waves: two years of technical progress in pretrained models](https://zhuanlan.zhihu.com/p/254821426)
* [A long-form analysis of word vectors (W2C/Fasttext/Glove)](https://zhuanlan.zhihu.com/p/164999424)
* [Ten must-read papers for getting started with embeddings](https://blog.csdn.net/qq_42189083/article/details/106429008)

18. BERT Explained Series
* [The BERT model illustrated](https://cloud.tencent.com/developer/article/1389555)
* [NLP pretrained models: from Transformer to ALBERT](https://zhuanlan.zhihu.com/p/85221503)
* [Innovation in the BERT era (applications): progress in applying BERT across NLP fields](https://zhuanlan.zhihu.com/p/68446772)
* [From word embeddings to BERT: a history of pretraining techniques in NLP](https://zhuanlan.zhihu.com/p/49271699)
* [XLNet: how it works, and how it compares with BERT](https://zhuanlan.zhihu.com/p/70257427)

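19. NLP task collections, including but not limited to word vectors, named entity recognition, text classification, text generation, text similarity computation, relation extraction, Chinese word segmentation, part-of-speech tagging, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering
* [NLP papers and code covering topic models, word vectors, named entity recognition, text classification, text generation, text similarity computation, and other NLP algorithms, based on Keras and TensorFlow](https://github.com/msgi/nlp-journey)
* [Jiagu NLP toolkit - built on models such as BiLSTM, it provides knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering, and more](https://github.com/ownthink/Jiagu)
* [Texthero: a package for efficient text data processing, including preprocessing, keyword extraction, named entity recognition, vector space analysis, and text visualization](https://github.com/jbesomi/texthero)
* [PyTorch-based BERT applications, including named entity recognition, sentiment analysis, text classification, and text similarity](https://github.com/rsanshierli/EasyBert)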

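20. Basic NLP Toolkits
* [Tsinghua THULAC](https://github.com/thunlp/THULAC)
* [HanLP](https://github.com/hankcs/HanLP)
* [HIT LTP (Harbin Institute of Technology)](https://github.com/HIT-SCIR/ltp)
* [Jieba (CppJieba, the C++ version)](https://github.com/yanyiwu/cppjieba)
* [NLPIR Chinese word segmentation](https://github.com/NLPIR-team/NLPIR)
* [JioNLP: a preprocessing toolkit for Chinese NLP tasks that is accurate, efficient, and has zero barrier to use](https://github.com/dongrixinyu/JioNLP)
* [Time-Extractor: time extraction, conversion, and normalization for Chinese text](https://github.com/xiaoxiong74/Time-Extractor)
* [TexSmart: text understanding tools and services](https://ai.tencent.com/ailab/nlp/texsmart/zh/)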

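21. Adversarial Text, Data Augmentation, Few-Shot, Zero-Shot, and Semi-Supervised Learning
* [TextAttack: a framework for adversarial attacks, data augmentation, and model training in NLP](https://github.com/QData/TextAttack)
* [A brief discussion of adversarial training: significance, methods, and reflections (with Keras implementation)](https://kexue.fm/archives/7234)
* [EDA data augmentation tool for Chinese corpora](https://github.com/zhanlaoban/eda_nlp_for_Chinese)
* [Understanding adversarial training in NLP in one article: FGSM/FGM/PGD/FreeAT/YOPO/FreeLB/SMART](https://zhuanlan.zhihu.com/p/103593948)
* [Adversarial training in NLP + PyTorch implementation](https://zhuanlan.zhihu.com/p/91269728)
* [BERT's MLM model can also do few-shot learning](https://kexue.fm/archives/7764)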

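22. NLP Annotation Tools and Platforms
* [BRAT: a web-based text annotation tool](https://github.com/nlplab/brat)
* [YEDDA](https://github.com/jiesutd/YEDDA)
* [MarkTool: a general-purpose web-based text annotation tool supporting large-scale entity annotation, relation annotation, event annotation, text classification, and more](https://github.com/FXLP/MarkTool)
* [doccano: an all-in-one text annotation tool](https://github.com/doccano/doccano)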

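23. NLP Interview Guides
* [Essential for NLP algorithm interviews! The most complete ever! PTMs: a comprehensive summary of NLP pretrained models](https://zhuanlan.zhihu.com/p/115014536)
* [Complete record of NLP/AI interviews (continuously updated, the most complete pretraining summary)](https://zhuanlan.zhihu.com/p/57153934)
* [Knowledge points and code implementations frequently tested in machine learning and NLP interviews](https://github.com/NLP-LOVE/ML-NLP)
* [Soul-searching questions about Attention and Transformer](https://zhuanlan.zhihu.com/p/336606129)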

24. AI Technology Report Series
* [Tsinghua University AI technology report series](https://reports.aminer.cn/)

25. [NLP research groups in China](https://zhuanlan.zhihu.com/p/142465929)

26. Speech Recognition
* [MASR: Chinese speech recognition](https://github.com/nobody132/masr)
* [A Deep-Learning-Based Chinese Speech Recognition System](https://github.com/nl8590687/ASRT_SpeechRecognition)

27. Seq2Seq
* [Unsupervised translation between programming languages (Python, C++, Java)](https://github.com/facebookresearch/TransCoder?utm_source=catalyzex.com)

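28. Selected Competitions
* [Top solutions from NLP competitions](https://github.com/zhpmatrix/nlp-competitions-list-review)
* [Champion solution of the first Chinese NL2SQL challenge](https://github.com/nudtnlp/tianchi-nl2sql-top1)
* [Third-place solution and code for the first Chinese NL2SQL challenge](https://github.com/beader/tianchi_nl2sql)
* [Kaggle competition handbook: solution roundup](https://mp.weixin.qq.com/s/2dd8l4MpyI3UzdTSWGthyA)
* [Collection of top solutions from recommendation algorithm competitions](https://zhuanlan.zhihu.com/p/317708353)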

29. Model Distillation
* [The complete guide to BERT model distillation (principles/tricks/code)](https://zhuanlan.zhihu.com/p/273378905)
* [A PyTorch-based knowledge distillation toolkit for NLP](https://github.com/airaria/TextBrewer)

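30. Training Tricks
* [Distributed training, mixed-precision training, gradient accumulation... one article to help you train large models gracefully](https://zhuanlan.zhihu.com/p/110278004)
* [A hands-on summary of BERT pretraining](https://zhuanlan.zhihu.com/p/337212893)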

31. Competition Websites
* [Alibaba Cloud Tianchi](https://tianchi.aliyun.com/)
* [DataFountain](https://www.datafountain.cn/)
* [Biendata competitions](https://www.biendata.xyz/)
* [DC-lab](https://www.dclab.run/index.html)
* [Kaggle](https://www.kaggle.com/)
* [Turingtopia (图灵联邦)](https://www.turingtopia.com/competitionnew)
* [Flyai](https://www.flyai.com/)
* [Eval](https://eval.ai/web/challenges/list)

32. Paper Search and Download
* [SCI-HUB](https://sci-hub.se/)
* [arXiv](https://arxiv.org/)
* [Maimengwu (卖萌屋) academic site](https://arxiv.xixiaoyao.cn/)