# attitude-root-taxonomy

**Repository Path**: mirrors_UKPLab/attitude-root-taxonomy

## Basic Information

- **Project Name**: attitude-root-taxonomy
- **Description**: Text classification experiments and UMAP visualizations of A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-06-16
- **Last Updated**: 2026-05-24

## README

# A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling -- Text Classification

Source code for the text classification experiments from A Psychological Taxonomy of Anti-Vaccination Arguments: Systematic Literature Review and Text Modeling (https://osf.io/e4yp6/)

Contact person: Luke Bates, bates@ukp.informatik.tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/


Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

## Project structure
* `main.py` -- entry point; it calls the other code files.
* `umap_plots/data/` -- Study 1 and Study 2 data files.
* `umap_plots` -- source code and data used for the UMAP plots from the paper. Please see the README.md file there.


## Requirements
Our results were computed in Python 3.6.8 with a 40 GB NVIDIA A100 Tensor Core GPU. Note that files will be written to disk if the code is run.


## Installation
To set up the environment, please follow the instructions below.
```
python -m venv mvenv
source mvenv/bin/activate
pip install -r requirements.txt
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
```
 
Then, you can run the code with `python main.py`. You can specify the configuration by passing arguments to `main.py`.
* There are the following modes: 
    * st_baseline (Sentence Transformer + logistic regression)
    * setfit_sft (SetFit standard fine-tuning)
    * setfit_zero_shot (SetFit zero-shot)
    * setfit_tsft (SetFit two-step fine-tuning)
    * transformer_sft (Transformer standard fine-tuning)
    * transformer_zero_shot (Transformer zero-shot)
    * transformer_tsft (Transformer two-step fine-tuning).
* You can pass any Sentence Transformer. We used paraphrase-mpnet-base-v2.
* You can pass any Transformer. We used roberta-base.
* You can choose 11 or 7 roots. The code will not work if you pass a different number.
* With the exception of the modes setfit_zero_shot and transformer_zero_shot, you need to specify a fold from 0 to 4 for five-fold cross-validation. To reproduce our results, you must run the code once for each of the five folds, changing only this argument (0-4).
* You can choose the number of epochs. This controls the number of epochs the models will train for in standard fine-tuning and the second step of two-step fine-tuning.
* You can choose the number of "pretraining" epochs. This controls the number of epochs the models will train for in zero-shot and in the first step of two-step fine-tuning.
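
The constraints in the list above can be summarized as a small argument parser. This is only a sketch and not the parser used in `main.py`: the flag names are taken from the example commands in this README, while the defaults, the function names `build_parser` and `parse_args`, and the exact error handling are assumptions made for illustration.

```python
import argparse

# Mode names as listed in this README.
MODES = [
    "st_baseline",
    "setfit_sft",
    "setfit_zero_shot",
    "setfit_tsft",
    "transformer_sft",
    "transformer_zero_shot",
    "transformer_tsft",
]
ZERO_SHOT_MODES = {"setfit_zero_shot", "transformer_zero_shot"}


def build_parser():
    # Hypothetical parser: the defaults below are assumptions, not the
    # repository's actual settings.
    parser = argparse.ArgumentParser(description="Sketch of the main.py options")
    parser.add_argument("--MODE", choices=MODES, required=True)
    parser.add_argument("--NUM_ROOTS", type=int, choices=[7, 11], required=True)
    parser.add_argument("--FOLD", type=int, choices=range(5), default=None)
    parser.add_argument("--ST_MODEL", default="paraphrase-mpnet-base-v2")
    parser.add_argument("--TRANSFORMER_CLF", default="roberta-base")
    parser.add_argument("--EPOCHS", type=int, default=None)
    parser.add_argument("--PRETRAIN_EPOCHS", type=int, default=None)
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Every mode except the two zero-shot modes needs a cross-validation fold.
    if args.MODE not in ZERO_SHOT_MODES and args.FOLD is None:
        parser.error("--FOLD (0-4) is required for mode " + args.MODE)
    return args
```

With this sketch, passing `--NUM_ROOTS=9` or omitting `--FOLD` for a non-zero-shot mode exits with a usage error, which mirrors the restrictions described above.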

For example, if you wish to reproduce our SetFit standard fine-tuning results for 5 epochs on the first fold with 11 roots:
```
python main.py --FOLD=0 \
               --ST_MODEL='paraphrase-mpnet-base-v2' \
               --NUM_ROOTS=11 \
               --MODE='setfit_sft' \
               --EPOCHS=5
```
If you wish to reproduce our SetFit two-step fine-tuning results for 15 epochs (10 pretraining epochs followed by 5 fine-tuning epochs) on the fifth fold with 7 roots:
```
python main.py --FOLD=4 \
               --ST_MODEL='paraphrase-mpnet-base-v2' \
               --NUM_ROOTS=7 \
               --MODE='setfit_tsft' \
               --EPOCHS=5 \
               --PRETRAIN_EPOCHS=10
```
If you wish to reproduce our SetFit zero-shot results with 11 roots:
```
python main.py --ST_MODEL='paraphrase-mpnet-base-v2' \
               --NUM_ROOTS=11 \
               --MODE='setfit_zero_shot' \
               --PRETRAIN_EPOCHS=10
```
If you wish to reproduce our roberta-base standard fine-tuning results for 15 epochs with 11 roots on the third fold:
```
python main.py --FOLD=2 \
               --TRANSFORMER_CLF='roberta-base' \
               --NUM_ROOTS=11 \
               --MODE='transformer_sft' \
               --EPOCHS=15
```
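
Because every mode except the zero-shot ones has to be run once per fold, it can be convenient to generate the five commands programmatically. The following is a minimal sketch, assuming it is executed from the repository root; the helper names `fold_commands` and `run_all_folds` are hypothetical and not part of the repository.

```python
import subprocess
import sys


def fold_commands(mode, num_roots, extra_args=(), folds=range(5)):
    """Build one `python main.py` command per cross-validation fold."""
    commands = []
    for fold in folds:
        cmd = [
            sys.executable, "main.py",
            "--FOLD={}".format(fold),
            "--NUM_ROOTS={}".format(num_roots),
            "--MODE={}".format(mode),
        ]
        # Remaining flags (model name, epochs, ...) are passed through unchanged.
        cmd.extend(extra_args)
        commands.append(cmd)
    return commands


def run_all_folds(mode, num_roots, extra_args=()):
    # Runs the folds one after another; check=True stops at the first failure.
    for cmd in fold_commands(mode, num_roots, extra_args):
        subprocess.run(cmd, check=True)
```

For instance, `run_all_folds("setfit_sft", 11, ["--ST_MODEL=paraphrase-mpnet-base-v2", "--EPOCHS=5"])` would launch the first example above for folds 0 through 4 in sequence.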

### Expected results
Once finished, results will be written to the "split_output" folder as JSON files. The "mac" field is the macro F1 metric for a given fold, while the "ap" field is the sample-average average precision from the Supplemental Material for a given fold. You can see a summary by opening the `get_results.ipynb` notebook via `jupyter notebook`. Note that, with the exception of the zero-shot results, the code there will not work unless you have completed all five folds (0-4). This is because, for these modes, we report the average over the five folds for both the macro F1 and the sample-average average precision.
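
The averaging over folds can be illustrated as follows. This is a sketch, not the code in `get_results.ipynb`: it assumes that each JSON file is a flat object containing numeric "mac" and "ap" fields and that the given directory holds only the fold files of a single configuration. The function name `average_fold_metrics` is hypothetical.

```python
import glob
import json
import os


def average_fold_metrics(result_dir="split_output", keys=("mac", "ap")):
    """Average the given metric fields over all JSON files in result_dir."""
    paths = sorted(glob.glob(os.path.join(result_dir, "*.json")))
    if not paths:
        raise FileNotFoundError("no JSON result files in " + result_dir)
    totals = {key: 0.0 for key in keys}
    for path in paths:
        with open(path) as handle:
            result = json.load(handle)
        # Assumed layout: one flat JSON object per fold with numeric fields.
        for key in keys:
            totals[key] += float(result[key])
    return {key: totals[key] / len(paths) for key in keys}
```

If the folder mixes results from several modes, the files would first need to be filtered by configuration before averaging.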