# NLP

**Repository Path**: daiyizheng_admin/nlp

## Basic Information

- **Project Name**: NLP
- **Description**: Natural language processing technology stack
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-12-14
- **Last Updated**: 2021-05-09

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# NLP-tutorial

## Dependencies

- pytorch 1.5.1
- tensorflow 2.1
- transformers 3.3.1
- scikit-learn

## Common Basic Text Algorithms

## Machine Learning (sklearn)

- 01 [tree](https://gitee.com/daiyizheng/nlp/blob/master/01-sklearn-tutoral/01-Tree/tree.py)
- 02 [randomForest](https://gitee.com/daiyizheng/nlp/blob/master/01-sklearn-tutoral/02-RandomForest/randomForest.py)
- 03 [k-means](https://gitee.com/daiyizheng/nlp/blob/master/01-sklearn-tutoral/03-k-means/k-means.py)
- 04 [SVM](https://gitee.com/daiyizheng/nlp/blob/master/01-sklearn-tutoral/04-SVM/svm-linear.py)
- 05 [XGBoost](https://gitee.com/daiyizheng/nlp/blob/master/01-sklearn-tutoral/05-XGBoost/Xgboost.py)

## TensorFlow 2.x Basics

- 01 [Data Types](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/01-数据类型.ipynb)
- 02 [Tensor](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/02-Tensor.ipynb)
- 03 [Indexing and Slicing](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/03-索引切片.ipynb)
- 04 [Dimension Transformations](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/04-维度变换.ipynb)
- 05 [Broadcasting](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/05-Broadcasting.ipynb)
- 06 [Mathematical Operations](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/06-数学计算.ipynb)
- 07 [Forward Propagation](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/07-前向传播.ipynb)
- 08 [Concatenation and Splitting](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/08-合并与分割.ipynb)
- 09 [Data Statistics](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/09-数据统计.ipynb)
- 10 [Tensor Sorting](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/10-张量排序.ipynb)
- 11 [Padding and Tiling](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/11-填充和复制.ipynb)
- 12 [Tensor Clipping](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/12-张量与限幅.ipynb)
- 13 [Advanced Operations](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/13-高级操作.ipynb)
- 14 [Data Loading](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/14-数据加载.ipynb)
- 15 [Tensors in Practice](https://gitee.com/daiyizheng/nlp/blob/master/02-tensorflow2-tutorial/15-张量实战.ipynb)

## PyTorch Basics

## PaddlePaddle Basics

## Deep Learning (PyTorch, TensorFlow; partly adapted from https://github.com/graykode/nlp-tutorial)

1. Basic Embedding Models
   - 1-1. NNLM (Neural Network Language Model) - Predict the Next Word
     - Paper: [A Neural Probabilistic Language Model (2003)](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
     - Blog: [NNLM Principles](https://daiyizheng.github.io/2020/07/06/nnlm/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/1-1NNLM/NNLM-torch.py)
   - 1-2. Word2Vec (Skip-gram) - Embed Words and Plot the Result
     - Paper: [Distributed Representations of Words and Phrases and their Compositionality (2013)](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
     - Blog: [Word2Vec Principles](https://daiyizheng.github.io/2020/07/05/word2vec/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/1-2Word2Vec/Word2Vec-torch.py)
   - 1-3. FastText (Application Level)
     - Paper: [Bag of Tricks for Efficient Text Classification (2016)](https://arxiv.org/pdf/1607.01759.pdf)
     - Blog: [FastText Principles](https://daiyizheng.github.io/2020/07/26/fasttext/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/1-3FastText/FastText-torch.py)
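The skip-gram model listed above trains on (center word, context word) pairs drawn from a sliding window. As a minimal, dependency-free sketch of that pair-extraction step (the toy sentence and window size are illustrative only, not the repository's code):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as used by skip-gram Word2Vec.

    For each position i, every token within `window` positions of i
    (excluding i itself) becomes a context word for tokens[i].
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "natural language processing with neural networks".split()
pairs = skipgram_pairs(sentence, window=2)
```

In a real Word2Vec implementation these pairs (mapped to vocabulary indices) would feed an embedding layer trained with negative sampling or hierarchical softmax; the sketch only shows where the training signal comes from.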
2. CNN (Convolutional Neural Network)
   - 2-1. TextCNN
     - Paper: [Convolutional Neural Networks for Sentence Classification (2014)](http://www.aclweb.org/anthology/D14-1181)
     - Blog: [TextCNN Principles](https://daiyizheng.github.io/2020/08/27/textcnn/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/2-1TextCNN/TextCNN-torch.py)
3. RNN (Recurrent Neural Network)
   - 3-1. TextRNN - Predict the Next Step
     - Paper: [Finding Structure in Time (1990)](http://psych.colorado.edu/~kimlab/Elman1990.pdf)
     - Blog: [RNN Principles](https://daiyizheng.github.io/2020/06/06/rnn/#toc-heading-17)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/3-1TextRNN/TextRNN-torch.py)
   - 3-2. TextLSTM - Autocomplete
     - Paper: [Long Short-Term Memory (1997)]()
     - Blog: [RNN Principles](https://daiyizheng.github.io/2020/06/06/rnn/#toc-heading-17)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/3-2TextLSTM/TextLSTM-torch.py)
   - 3-3. Bi-LSTM - Predict the Next Word in a Long Sentence
     - Blog: [RNN Principles](https://daiyizheng.github.io/2020/06/06/rnn/#toc-heading-17)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/3-3Bi-LSTM/Bi-LSTM-torch.py)
4. Attention Mechanism
   - 4-1. Seq2Seq - Change Word
     - Paper: [Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (2014)](https://arxiv.org/pdf/1406.1078.pdf)
     - Blog: [Seq2Seq Principles](https://daiyizheng.github.io/2020/08/15/seq2seq/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/4-1Seq2Seq/Seq2Seq-torch.py)
   - 4-2. Seq2Seq with Attention - Translate
     - Paper: [Neural Machine Translation by Jointly Learning to Align and Translate (2014)](https://arxiv.org/abs/1409.0473)
     - Blog: [Seq2Seq with Attention Principles](https://daiyizheng.github.io/2020/08/16/seq2seq-with-attention/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/4-2Seq2Seq(Attention)/Seq2Seq(Attention)-torch.py)
   - 4-3. Bi-LSTM with Attention - Binary Sentiment Classification
     - Code: [torch]()
5. Models Based on the Transformer
   - 5-1. The Transformer - Translate
     - Paper: [Attention Is All You Need (2017)](https://arxiv.org/abs/1706.03762)
     - Blog: [Transformer Principles](https://daiyizheng.github.io/2020/08/26/transformer-yuan-li/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/5-1Transformer/Transformer-torch.py)
   - 5-2. BERT - Next-Sentence Classification & Masked-Token Prediction
     - Paper: [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)](https://arxiv.org/abs/1810.04805)
     - Blog: [BERT Principles](https://daiyizheng.github.io/2020/08/27/bert-yuan-li/)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/5-2BERT/BERT-torch.py)
   - 5-3. ELMo
     - Paper: [ELMo](#)
     - Blog: [ELMo Principles](https://daiyizheng.github.io/#)
     - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/05-nlp-tutorial/5-3Elmo/Elmo-ELMoForManyLangs.py)

## Deep Learning: BERT Derivatives

1. XLNet
   - Paper: []()
   - Blog: [](https://daiyizheng.github.io/#)
   - Code: [torch](https://gitee.com/daiyizheng/nlp/blob/master/#)

## Introductory Deep Learning Projects

### huggingface

## Advanced NLP
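The Transformer and BERT tutorials above all build on scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. As a dependency-free sketch of that single operation on plain Python lists (the toy 2×2 matrices are illustrative only, not the repository's code):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V on row-major nested lists."""
    d_k = len(K[0])
    # scores[i][j] = (query i . key j) / sqrt(d_k)
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K]
              for q_row in Q]
    # Each row of weights is a probability distribution over the keys.
    weights = [softmax(row) for row in scores]
    # Output row i is the weights[i]-weighted average of the value rows.
    out = [[sum(w * v_row[c] for w, v_row in zip(w_row, V))
            for c in range(len(V[0]))]
           for w_row in weights]
    return weights, out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
weights, out = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to 1, so every output row is a convex combination of the value rows; the multi-head attention used in the Transformer tutorial runs several such operations in parallel on learned projections of Q, K, and V.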