# sentiment_analysis_fine_grain

**Repository Path**: mirrors_brightmart/sentiment_analysis_fine_grain

## Basic Information

- **Project Name**: sentiment_analysis_fine_grain
- **Description**: Multi-label Classification with BERT; Fine Grained Sentiment Analysis from AI challenger
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-01-11
- **Last Updated**: 2026-05-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Introduction

With this repository, you will be able to train multi-label classification with BERT and deploy BERT for online prediction.

You can also find a short tutorial on how to use BERT with Chinese: BERT short chinese tutorial

You can find an introduction to fine-grained sentiment analysis from AI Challenger.

## Basic Ideas

Add something here.

## Experiment on New Models

For more, check model/bert_cnn_fine_grain_model.py

## Performance

Model | TextCNN (no pre-train) | TextCNN (pre-train + fine-tuning) | BERT (base_model_zh) | BERT (base_model_zh, pre-trained on corpus)
--- | --- | --- | --- | ---
F1 Score | 0.678 | 0.685 | ADD A NUMBER HERE | ADD A NUMBER HERE

Notice: F1 Score is reported on the validation set.

## Usage

### Bert for Multi-label Classification [data for fine-tuning and pre-train]

    export BERT_BASE_DIR=BERT_BASE_DIR/chinese_L-12_H-768_A-12
    export TEXT_DIR=TEXT_DIR
    nohup python run_classifier_multi_labels_bert.py \
      --task_name=sentiment_analysis \
      --do_train=true \
      --do_eval=true \
      --data_dir=$TEXT_DIR \
      --vocab_file=$BERT_BASE_DIR/vocab.txt \
      --bert_config_file=$BERT_BASE_DIR/bert_config.json \
      --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
      --max_seq_length=512 \
      --train_batch_size=4 \
      --learning_rate=2e-5 \
      --num_train_epochs=3 \
      --output_dir=./checkpoint_bert &

1. First, download the pre-trained model from Google and put it in a folder (e.g. BERT_BASE_DIR): chinese_L-12_H-768_A-12 from bert

2. Second, you need training data (e.g. train.tsv) and validation data (e.g. dev.tsv); put them under a folder (e.g. TEXT_DIR). You can also download the data from here: data to train bert for AI challenger-Sentiment Analysis.

   The download contains processed data that you can use both for fine-tuning on sentiment analysis and for pre-training with BERT. It was generated by following this notebook step by step: preprocess_char.ipynb

   You can also generate the data yourself, as long as the format is compatible with the processor SentimentAnalysisFineGrainProcessor (aliased as sentiment_analysis).

   Data format: `label1,label2,label3\t here is sentence or sentences\t`

   Each line contains only two columns: the first is the target (one or multiple labels) and the second is the input string. The input does not need to be tokenized.

   Sample: "0_1,1_-2,2_-2,3_-2,4_1,5_-2,6_-2,7_-2,8_1,9_1,10_-2,11_-2,12_-2,13_-2,14_-2,15_1,16_-2,17_-2,18_0,19_-2 浦东五莲路站,老饭店福瑞轩属于上海的本帮菜,交通方便,最近又重新装修,来拨草了,饭店活动满188元送50元钱,环境干净,简单。朋友提前一天来预订包房也没有订到,只有大堂,五点半到店基本上每个台子都客满了,都是附近居民,每道冷菜量都比以前小,味道还可以,热菜烤茄子,炒河虾仁,脆皮鸭,照牌鸡,小牛排,手撕腊味花菜等每道菜都很入味好吃,会员价划算,服务员人手太少,服务态度好,要能团购更好。可以用支付宝方便"

   Check the sample data in the ./BERT_BASE_DIR folder.

   For more detail, check create_model and SentimentAnalysisFineGrainProcessor from run_classifier.py

### Pre-train Bert model based on open-sourced model, then do classification task

1. Generate raw data: [ADD SOMETHING HERE]

   Make sure each line is a sentence, and leave a blank line between documents.

   You can find the generated data in the zip file.

   Use write_pre_train_doc() from preprocess_char.ipynb

2. Generate data for the pre-train stage using:

       export BERT_BASE_DIR=./BERT_BASE_DIR/chinese_L-12_H-768_A-12
       nohup python create_pretraining_data.py \
         --input_file=./PRE_TRAIN_DIR/bert_*_pretrain.txt \
         --output_file=./PRE_TRAIN_DIR/tf_examples.tfrecord \
         --vocab_file=$BERT_BASE_DIR/vocab.txt \
         --do_lower_case=True \
         --max_seq_length=512 \
         --max_predictions_per_seq=60 \
         --masked_lm_prob=0.15 \
         --random_seed=12345 \
         --dupe_factor=5 > nohup_pre.out &

3. Pre-train the model with the generated data: `python run_pretraining.py`

4. Fine-tune: `python run_classifier.py`

### TextCNN

1. Download the cache file of sentiment analysis (tokens are at word level).

2. Train the model: `python train_cnn_fine_grain.py`

The cache file for the TextCNN model was generated by following the steps in preprocess_word.ipynb. It contains everything you need to run TextCNN, including:

- the processed train/validation/test sets;
- the word vocabulary;
- a dict that maps labels to indices.

Take train_valid_test_vocab_cache.pik and put it under the folder preprocess_word/

The raw data are also included in this zip file.

### Pre-train TextCNN

1. Pre-train TextCNN with a masked language model: `python train_cnn_lm.py`

2. Fine-tune TextCNN: `python train_cnn_fine_grain.py`

### Deploy BERT for online prediction

With the session-and-feed style you can easily deploy BERT.

Online prediction with BERT: check more from here

## Reference

1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2. google-research/bert
3. pengshuang/AI-Comp
4. AI Challenger 2018
5. Convolutional Neural Networks for Sentence Classification
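
As a rough illustration of the data format described under Usage (`label1,label2,label3\t here is sentence or sentences\t`), the sketch below parses one line into its labels and its text. It is not the repository's SentimentAnalysisFineGrainProcessor. The function name `parse_line` is hypothetical, and reading each label token such as `1_-2` as "aspect index, underscore, polarity value" is an assumption drawn from the sample line, not something the README states.

```python
from typing import Dict, Tuple


def parse_line(line: str) -> Tuple[Dict[int, int], str]:
    """Split one tab-separated line into ({aspect: polarity}, text).

    Hypothetical helper: the "<aspect>_<polarity>" reading of each label
    token is an assumption based on the README's sample line.
    """
    # Only the first tab separates the label column from the text column;
    # a trailing tab or newline after the text is tolerated.
    label_field, _, text = line.rstrip("\n").partition("\t")
    labels: Dict[int, int] = {}
    for token in label_field.strip().split(","):
        # Split on the first underscore so negative values like "-2" survive.
        aspect, _, polarity = token.partition("_")
        labels[int(aspect)] = int(polarity)
    return labels, text.strip()


labels, text = parse_line("0_1,1_-2,18_0\t环境干净,简单。\t")
print(labels)
print(text)
```

A check like this can be run over train.tsv and dev.tsv before training to catch lines that are missing the tab separator or that contain malformed label tokens.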