# genco

**Repository Path**: mmmz2/genco

## Basic Information

- **Project Name**: genco
- **Default Branch**: main
- **Created**: 2024-10-16
- **Last Updated**: 2024-10-16

## README

# GenCo

This is the official code for the paper "Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-tuned GPT".

# Setup Environment (for reference)

Note: Alpaca requires transformers>=4.28.0.dev0. We used transformers 4.28.0.dev0, installed by cloning the transformers GitHub repository.

```
# conda env (or pip install whatever is needed)
conda env create -f environment.yml
pip install -r requirement.txt
conda activate cls

# config path
export BASE_DIR=/path/to/store/data_and_result
export CODE_DIR=/code_dir

# install GenCo package
pip install -e .
```

# Prepare data

Download the datasets:

- agnews, dbpedia and amazon (optionally yelp and imdb): https://github.com/yumeng5/LOTClass
- Yahoo Answers: https://www.kaggle.com/datasets/soumikrakshit/yahoo-answers-dataset

Our label names are in the folder data_prompt.

# Reference scripts for experiments

We provide reference scripts for the multi-class datasets, which may require more hyperparameter tuning than binary sentiment classification (Yelp, IMDB or Amazon).

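
The per-dataset recipes in this section share the same five steps and differ only in the dataset name and one numeric argument (1000, 800 or 1500). As a convenience, the steps can be wrapped in a small helper. The sketch below is not part of the GenCo repository: the function name `genco_pipeline` and its parameter names are hypothetical, and the script paths and argument order are copied verbatim from the recipes. It only prints the commands (a dry run), so nothing is executed until you choose to.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with GenCo): prints the five pipeline
# commands for one dataset so they can be reviewed before being run.
genco_pipeline() {
  local data_name="${1:-}"        # e.g. agnews, dbpedia, yahoo_answers_csv
  local n="${2:-}"                # numeric argument from the recipes (1000 / 800 / 1500);
                                  # its exact meaning is defined by the repo scripts
  local gen_gpus="${3:-0,1,2,3}"  # GPUs used for Alpaca generation
  local train_gpus="${4:-0,1}"    # GPU 0: BERT self-training, GPU 1: Alpaca inference
  local tag="t0.8ins"             # run tag used throughout the recipes

  if [ -z "$data_name" ] || [ -z "$n" ]; then
    echo "usage: genco_pipeline <data_name> <n> [gen_gpus] [train_gpus]" >&2
    return 1
  fi

  # 1. prepare and sample data
  echo "bash scripts/prep/sample_data.sh $data_name $n 2000"
  # 2. generate augmented text with Alpaca
  echo "bash scripts/gen/generate_context.sh $gen_gpus $data_name $n $tag chavinlo/alpaca-native 0.8 0.95"
  # 3. predict pseudo labels
  echo "bash scripts/ginco/aug_zero.sh 0 $data_name $n $tag 128"
  # 4. conditional generation for pseudo labels (optional step in the recipes)
  echo "bash scripts/gen/cond_gen_context.sh $gen_gpus $data_name $n"
  # 5. contrastive self-training
  echo "bash scripts/genco/cond_aug_selflearn.sh $train_gpus $data_name $n $tag"
}
```

For example, `genco_pipeline dbpedia 800` prints the dbpedia recipe. Piping the output to `bash` would execute it, assuming the current directory is the repository root, since the script paths are relative.
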
agnews

```
export data_name=agnews
# prepare and sample data (we only randomly sample a small subset of the training set)
bash scripts/prep/sample_data.sh $data_name 1000 2000
# generate augmented text with Alpaca (multiple GPUs for faster inference)
bash scripts/gen/generate_context.sh 0,1,2,3 $data_name 1000 t0.8ins chavinlo/alpaca-native 0.8 0.95
# predict pseudo labels
bash scripts/ginco/aug_zero.sh 0 $data_name 1000 t0.8ins 128
# conditional generation for pseudo labels (can be skipped)
bash scripts/gen/cond_gen_context.sh 0,1,2,3 $data_name 1000
# contrastive self-training (GPU 0 for self-training BERT, GPU 1 for Alpaca inference)
bash scripts/genco/cond_aug_selflearn.sh 0,1 $data_name 1000 t0.8ins
```

dbpedia

```
export data_name=dbpedia
bash scripts/prep/sample_data.sh $data_name 800 2000
bash scripts/gen/generate_context.sh 0,1,2,3 $data_name 800 t0.8ins chavinlo/alpaca-native 0.8 0.95
bash scripts/ginco/aug_zero.sh 0 $data_name 800 t0.8ins 128
# optional: conditional generation for pseudo labels
bash scripts/gen/cond_gen_context.sh 0,1,2,3 $data_name 800
# contrastive self-training (GPU 0 for self-training BERT, GPU 1 for Alpaca inference)
bash scripts/genco/cond_aug_selflearn.sh 0,1 $data_name 800 t0.8ins
```

yahoo answers

```
export data_name=yahoo_answers_csv
bash scripts/prep/sample_data.sh $data_name 1500 2000
bash scripts/gen/generate_context.sh 0,1,2,3 $data_name 1500 t0.8ins chavinlo/alpaca-native 0.8 0.95
bash scripts/ginco/aug_zero.sh 0 $data_name 1500 t0.8ins 128
# optional: conditional generation for pseudo labels
bash scripts/gen/cond_gen_context.sh 0,1,2,3 $data_name 1500
# contrastive self-training (GPU 0 for self-training BERT, GPU 1 for Alpaca inference)
bash scripts/genco/cond_aug_selflearn.sh 0,1 $data_name 1500 t0.8ins
```

# Citation

EACL 2024

```
@misc{zhang2023generationdriven,
      title={Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-tuned GPT},
      author={Ruohong Zhang and Yau-Shian Wang and Yiming Yang},
      year={2023},
      eprint={2304.11872},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```