# MUKI

**Repository Path**: xuyangyan/MUKI

## Basic Information

- **Project Name**: MUKI
- **Description**: [Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-04-14
- **Last Updated**: 2023-04-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Model Uncertainty-aware Knowledge Integration (MUKI)

[Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models

## Setup

We recommend using a virtual environment to reproduce the results.

```bash
conda create -n muki python=3.7.10
conda activate muki
conda install pytorch torchvision cudatoolkit=11.0 -c pytorch
pip install -r requirements.txt
```

## Train Teacher Models

The main setup of our paper trains two teacher models, each specialized in a different class subset of a classification problem. Taking THU-CNews as an example, run the following command to obtain the two teacher models:

```bash
bash scripts/train_teacher.sh
```

## Knowledge Integration

Once the teacher models are trained, we can perform knowledge integration via various distillation methods:

```bash
# vanilla KD
bash scripts/vkd.sh
# UHC
bash scripts/uhc.sh
# DFA
bash scripts/dfa.sh
# MUKI (ours)
bash scripts/muki.sh
```

For our method MUKI, please check the script and the corresponding model file `models/uka_multiple_teacher.py` for more details.

For Monte-Carlo dropout, to reduce the computational cost of uncertainty estimation, we pre-compute the scores and save them into a numpy file (see `models/monte_carlo.py` for details). The integration can then read the corresponding files to accelerate training. We provide the corresponding weights in [Google Drive](https://drive.google.com/file/d/1l_p_WStrMP_zGkEp77I8cH_gvwfrd0Ya/view?usp=sharing).

Alternatively, you can compute the uncertainty on the fly for your own custom dataset by adding code like the following:

```python
# assumes the usual imports in the model file:
# import torch
# import torch.nn.functional as F
with torch.no_grad():
    # Monte-Carlo dropout on the fly: average the softmax outputs of
    # self.mc_number stochastic forward passes for each teacher
    probs = []
    for m in range(self.mc_number):  # number of Monte-Carlo dropout samples
        for i, t_model in enumerate(self.teachers):
            t_model.train()  # activate dropout
            teacher_output = t_model(input_ids,
                                     attention_mask=attention_mask,
                                     token_type_ids=token_type_ids,
                                     position_ids=position_ids,
                                     head_mask=head_mask,
                                     inputs_embeds=inputs_embeds,
                                     output_attentions=output_attentions,
                                     output_hidden_states=False,
                                     return_dict=return_dict,
                                     )
            teacher_logit = teacher_output[0]  # (bsz, num_labels)
            teacher_prob = F.softmax(teacher_logit, dim=-1)
            if m == 0:
                probs.append(teacher_prob)
            else:
                probs[i] += teacher_prob
    probs = [prob / self.mc_number for prob in probs]

    # get the deterministic teacher logits with dropout disabled
    t_logits = []
    for i, t_model in enumerate(self.teachers):
        t_model.eval()  # deactivate dropout
        teacher_output = t_model(input_ids,
                                 attention_mask=attention_mask,
                                 token_type_ids=token_type_ids,
                                 position_ids=position_ids,
                                 head_mask=head_mask,
                                 inputs_embeds=inputs_embeds,
                                 output_attentions=output_attentions,
                                 output_hidden_states=False,
                                 return_dict=return_dict,
                                 )
        t_logit = teacher_output[0]  # (bsz, num_labels)
        t_logits.append(t_logit)
    teacher_probs = [F.softmax(t_logit / self.kd_temperature, dim=-1) for t_logit in t_logits]
```

If you want to compute the teacher score files yourself, please check `scripts/md_cnews.sh` and `models/monte_carlo.py` for more details. The sketches below illustrate how such scores can be put to use.
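The on-the-fly snippet above stops at the MC-averaged probabilities per teacher. One common way to reduce them to a scalar per-instance uncertainty is predictive entropy; the sketch below shows that reduction. The reduction MUKI actually applies is in `models/uka_multiple_teacher.py`, and the helper name here is our own illustration:

```python
import torch

def predictive_entropy(mean_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Entropy of the MC-averaged distribution: (bsz, num_labels) -> (bsz,).

    Higher entropy means the teacher is less certain about the instance.
    """
    return -(mean_probs * (mean_probs + eps).log()).sum(dim=-1)

# one uncertainty score per teacher for the current batch, e.g.:
# uncertainties = [predictive_entropy(p) for p in probs]
```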
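When using the pre-computed score files instead, the pattern is to index the saved array by example id at batch time. A minimal sketch, assuming one `.npy` file per teacher holding one score per training example (the file names and array layout here are hypothetical; see `models/monte_carlo.py` for the actual format):

```python
import numpy as np
import torch

# hypothetical file names: one score per training example, per teacher
score_files = ["teacher1_mc_scores.npy", "teacher2_mc_scores.npy"]
mc_scores = [torch.from_numpy(np.load(f)) for f in score_files]  # each: (num_examples,)

def teacher_weights(example_ids: torch.Tensor) -> torch.Tensor:
    """Map per-teacher uncertainty to per-instance teacher weights.

    Lower uncertainty gets higher weight: a softmax over the negated
    scores gives a distribution over teachers for each instance.
    """
    batch_scores = torch.stack([s[example_ids] for s in mc_scores])  # (num_teachers, bsz)
    return torch.softmax(-batch_scores, dim=0)  # (num_teachers, bsz)
```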
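Finally, a sketch of how such per-instance weights and the temperature-scaled teacher distributions could be combined into a single distillation target for the student. This illustrates the general uncertainty-weighted integration idea, not the exact MUKI objective (which lives in `models/uka_multiple_teacher.py`); it assumes the teachers cover disjoint, contiguous slices of the full label space, matching the class-subset setup above:

```python
import torch
import torch.nn.functional as F

def integration_kd_loss(student_logits, teacher_probs, weights, temperature=1.0):
    """KL distillation against a weighted mixture of teacher distributions.

    student_logits: (bsz, total_labels) over the union of class subsets
    teacher_probs:  list of (bsz, n_i) distributions, one per teacher
    weights:        (num_teachers, bsz) per-instance teacher weights
    """
    # place each teacher's distribution in its slice of the full label space
    target = torch.zeros_like(student_logits)
    offset = 0
    for w, p in zip(weights, teacher_probs):
        n = p.size(-1)
        target[:, offset:offset + n] = w.unsqueeze(-1) * p
        offset += n
    target = target / target.sum(dim=-1, keepdim=True)  # renormalize per instance
    log_q = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_q, target, reduction="batchmean") * temperature ** 2
```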