# attitude-root-taxonomy

**Repository Path**: mirrors_UKPLab/attitude-root-taxonomy

## Basic Information

- **Project Name**: attitude-root-taxonomy
- **Description**: Text classification experiments and UMAP visualizations of A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-06-16
- **Last Updated**: 2026-05-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling -- Text Classification

Source code for the text classification experiments from A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling (https://osf.io/e4yp6/).

Contact person: Luke Bates, bates@ukp.informatik.tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

## Project structure

* `main.py` -- the main script, which uses the other code files
* `umap_plots/data/` -- Study 1 and Study 2 data files.
* `umap_plots` -- source code and data used for the UMAP plots from the paper. Please see the README.md file there.

## Requirements

Our results were computed with Python 3.6.8 on a 40 GB NVIDIA A100 Tensor Core GPU.

Note that running the code writes files to disk.

## Installation

To set up, please follow the instructions below.
```
python -m venv mvenv
source mvenv/bin/activate
pip install -r requirements.txt
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
```

Then, you can run the code with `python main.py`. You can specify the configuration by passing command-line arguments to `main.py`:

* The following modes are available (`--MODE`):
  * `st_baseline` (Sentence Transformer + logistic regression)
  * `setfit_sft` (SetFit standard fine-tuning)
  * `setfit_zero_shot` (SetFit zero-shot)
  * `setfit_tsft` (SetFit two-step fine-tuning)
  * `transformer_sft` (Transformer standard fine-tuning)
  * `transformer_zero_shot` (Transformer zero-shot)
  * `transformer_tsft` (Transformer two-step fine-tuning)
* You can pass any Sentence Transformer (`--ST_MODEL`). We used paraphrase-mpnet-base-v2.
* You can pass any Transformer (`--TRANSFORMER_CLF`). We used roberta-base.
* You can choose 11 or 7 roots (`--NUM_ROOTS`). The code will not work if you pass a different number.
* Except for the modes `setfit_zero_shot` and `transformer_zero_shot`, you need to specify a fold between 0 and 4 (`--FOLD`) for five-fold cross-validation. To reproduce our results, you must run the code with all five folds, changing only this argument (0-4).
* You can choose the number of epochs (`--EPOCHS`). This controls how many epochs the models train for in standard fine-tuning and in the second step of two-step fine-tuning.
* You can choose the number of "pretraining" epochs (`--PRETRAIN_EPOCHS`). This controls how many epochs the models train for in the zero-shot modes and in the first step of two-step fine-tuning.
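Because every configuration except the zero-shot modes has to be run once per fold, it can help to generate the five commands programmatically. The sketch below is a hypothetical helper, not part of this repository: `build_commands` and `run` are our own names, and the only thing it assumes about `main.py` is the set of flags shown in this README's examples. By default it just prints the commands; passing `execute=True` runs them one after another with `subprocess.run`. It sticks to features available in Python 3.6.

```python
"""Hypothetical helper (not part of this repository): build the main.py
commands for one configuration, one per cross-validation fold."""
import shlex
import subprocess
from typing import List, Optional

MODES = {
    "st_baseline", "setfit_sft", "setfit_zero_shot", "setfit_tsft",
    "transformer_sft", "transformer_zero_shot", "transformer_tsft",
}
ZERO_SHOT_MODES = {"setfit_zero_shot", "transformer_zero_shot"}


def build_commands(mode: str, num_roots: int, model_flag: str, model_name: str,
                   epochs: Optional[int] = None,
                   pretrain_epochs: Optional[int] = None) -> List[List[str]]:
    """Return one argv list per fold, or a single one for zero-shot modes.

    model_flag is "ST_MODEL" or "TRANSFORMER_CLF", as in the README examples.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if num_roots not in (7, 11):
        raise ValueError("NUM_ROOTS must be 7 or 11")
    base = ["python", "main.py", f"--{model_flag}={model_name}",
            f"--NUM_ROOTS={num_roots}", f"--MODE={mode}"]
    if epochs is not None:
        base.append(f"--EPOCHS={epochs}")
    if pretrain_epochs is not None:
        base.append(f"--PRETRAIN_EPOCHS={pretrain_epochs}")
    if mode in ZERO_SHOT_MODES:
        return [base]  # zero-shot modes take no --FOLD argument
    # Insert --FOLD right after "main.py" for each of the folds 0-4.
    return [base[:2] + [f"--FOLD={fold}"] + base[2:] for fold in range(5)]


def run(commands: List[List[str]], execute: bool = False) -> None:
    """Print each command as a shell line; optionally run it and stop on failure."""
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: SetFit two-step fine-tuning with 7 roots on all five folds.
    run(build_commands("setfit_tsft", 7, "ST_MODEL", "paraphrase-mpnet-base-v2",
                       epochs=5, pretrain_epochs=10))
```

Keeping the dry run as the default makes it easy to inspect the commands, or paste them into a job script, before spending GPU time.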
For example, to reproduce our SetFit standard fine-tuning results (5 epochs) on the first fold with 11 roots:

```
python main.py --FOLD=0 \
    --ST_MODEL='paraphrase-mpnet-base-v2' \
    --NUM_ROOTS=11 \
    --MODE='setfit_sft' \
    --EPOCHS=5
```

To reproduce our SetFit two-step fine-tuning results (15 epochs in total: 10 pretraining epochs followed by 5 fine-tuning epochs) on the fifth fold with 7 roots:

```
python main.py --FOLD=4 \
    --ST_MODEL='paraphrase-mpnet-base-v2' \
    --NUM_ROOTS=7 \
    --MODE='setfit_tsft' \
    --EPOCHS=5 \
    --PRETRAIN_EPOCHS=10
```

To reproduce our SetFit zero-shot results with 11 roots:

```
python main.py --ST_MODEL='paraphrase-mpnet-base-v2' \
    --NUM_ROOTS=11 \
    --MODE='setfit_zero_shot' \
    --PRETRAIN_EPOCHS=10
```

To reproduce our roberta-base standard fine-tuning results (15 epochs) on the third fold with 11 roots:

```
python main.py --FOLD=2 \
    --TRANSFORMER_CLF='roberta-base' \
    --NUM_ROOTS=11 \
    --MODE='transformer_sft' \
    --EPOCHS=15
```

### Expected results

Once a run finishes, its results are written to the `split_output` folder as JSON files. The `mac` field is the macro F1 score for a given fold, and the `ap` field is the sample-average average precision (described in the Supplemental Material) for that fold.

You can view a summary with the `get_results.ipynb` notebook via `jupyter notebook`. Note that, except for the zero-shot results, the notebook will not work unless you have completed all five folds (0-4). This is because, for these modes, we report the average over the five folds for both macro F1 and sample-average average precision.
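If you only want the fold averages without opening the notebook, the averaging step can be sketched in a few lines of Python. This is a hypothetical sketch, not the code in `get_results.ipynb`: it assumes each JSON file in `split_output` holds one fold's results with numeric top-level `mac` and `ap` fields, and the glob pattern is a placeholder because the file naming scheme is not documented here. It does not apply to the zero-shot modes, which are not averaged over folds.

```python
"""Hypothetical sketch: average macro F1 ("mac") and sample-average average
precision ("ap") over the five fold result files of one configuration.
Assumes each JSON file has numeric top-level "mac" and "ap" fields."""
import glob
import json
import os
from statistics import mean
from typing import Dict, List

METRICS = ("mac", "ap")  # macro F1 and sample-average average precision


def load_results(paths: List[str]) -> List[Dict[str, float]]:
    """Read one JSON result file per fold."""
    results = []
    for path in paths:
        with open(path) as f:
            results.append(json.load(f))
    return results


def average_results(results: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over exactly five folds (0-4)."""
    if len(results) != 5:
        raise ValueError(f"expected 5 fold results, got {len(results)}")
    return {key: mean(float(r[key]) for r in results) for key in METRICS}


if __name__ == "__main__":
    # Placeholder pattern: narrow it so it matches one configuration's five folds.
    files = sorted(glob.glob(os.path.join("split_output", "*.json")))
    if len(files) == 5:
        print(average_results(load_results(files)))
    else:
        print(f"found {len(files)} JSON files; expected one configuration's 5 folds")
```

Separating loading from averaging keeps the arithmetic easy to check independently of how the result files are actually named.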