# RUC & Xiaomi: Efficient Fine-Tuning 🙌🎉

## 📰 News

- 2025-04-29: Our paper has been accepted at IJCAI-25. Congratulations!
- 2025-03-31: Delivered a prototype system for parameter-efficient and gradient-projection methods: a comprehensive benchmark against 10+ state-of-the-art efficient fine-tuning approaches.
- 2024-12-30: Released our paper *Theoretical Insights into Fine-Tuning Attention Mechanism*.

## 🎯 Introduction and Target

(1) **Our insights** ([paper](https://arxiv.org/abs/2410.02247), in progress): From the traditional statistical-learning viewpoint, performance can be decomposed into the sum of optimization error and generalization error. On the generalization (storage-friendly) side, **Theorem 1** (information-theoretic generalization bounds) shows that, for the same $r$, fine-tuning $\mathbf{W}_q,\mathbf{W}_v$ consistently achieves results comparable to, or even surpassing, fine-tuning $\mathbf{W}_q,\mathbf{W}_k,\mathbf{W}_v$. This reduces the number of trainable parameters at the same $r$, tightens the generalization bound, and can also save memory. On the optimization (time-friendly) side, we analyze the learning dynamics of fine-tuning the attention mechanism: **Theorem 2** shows that feature learning in the attention mechanism is efficient when the learning rate for $\mathbf{W}_v$ is set much larger than that for $\mathbf{W}_q,\mathbf{W}_k$. Building on these experimental and theoretical insights, one can design new algorithms that improve the efficiency (e.g., storage and time) of fine-tuning.

![theorem1](./figs/theorem1.jpg)
![theorem2](./figs/theorem2.jpg)
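As a minimal illustration of how these insights translate into practice, the sketch below builds a LoRA adapter that targets only the query and value projections (Theorem 1) and assigns a larger learning rate to the $\mathbf{W}_v$ adapters than to the $\mathbf{W}_q$ adapters (Theorem 2). It uses Hugging Face `transformers`/`peft`; the model name, target-module names, and learning rates are illustrative placeholders, not the exact configuration of our benchmark scripts.

```python
import torch
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Illustrative base model; any encoder or decoder with named attention projections works similarly.
base = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Theorem 1: adapt only W_q and W_v (skip W_k) to cut trainable parameters at the same rank r.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],  # RoBERTa attention projections; use ["q_proj", "v_proj"] for LLaMA
    task_type="SEQ_CLS",
)
model = get_peft_model(base, config)

# Theorem 2: give the W_v adapters a much larger learning rate than the W_q adapters.
q_params = [p for n, p in model.named_parameters() if p.requires_grad and ".query." in n]
v_params = [p for n, p in model.named_parameters() if p.requires_grad and ".value." in n]
other_params = [p for n, p in model.named_parameters()
                if p.requires_grad and ".query." not in n and ".value." not in n]

optimizer = torch.optim.AdamW([
    {"params": q_params, "lr": 5e-5},
    {"params": v_params, "lr": 5e-4},  # larger LR for W_v, per the optimization analysis
    {"params": other_params, "lr": 5e-5},
])
```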
(2) **Target:** **This project conducts a comprehensive benchmark of the following 10+ efficient fine-tuning methods. Notably, our proposed approach is orthogonal to these methods and can be combined with any of them.**

## 📖 10+ efficient fine-tuning methods

- [LoRA](https://openreview.net/forum?id=nZeVKeeFYf9) (ICLR 2022)
- [AdaLoRA](https://openreview.net/forum?id=lq62uWRJjiY) (ICLR 2023)
- [DoRA](https://arxiv.org/abs/2402.09353) (ICML 2024 Oral)
- [PiSSA](https://openreview.net/forum?id=6ZBHIEtdP4) (NeurIPS 2024)
- [rsLoRA](https://arxiv.org/abs/2312.03732)
- [OLoRA](https://arxiv.org/abs/2406.01775)
- [EVA](https://arxiv.org/abs/2410.07170)
- [IA3](https://arxiv.org/abs/2205.05638)
- [SIFT](https://arxiv.org/abs/2312.11875) (ICML 2024)
- [GaLore](https://arxiv.org/abs/2403.03507) (ICML 2024 Oral)

## ⚙️ Install

1. Install the Python dependencies from the requirements file:

```
pip install -r requirements.txt
```

2. (Optional) For SIFT & GaLore:

```
git clone git@github.com:song-wx/SIFT.git
cd SIFT
pip install .
```

```
pip install galore-torch
```

## 🚀 Quick Start

### Get Dataset

```bash
python data_download.py
```

### Usage

1. Ensure the scripts have execute permissions:

```
chmod +x xxx.sh  # xxx -> your script name
```

2. Full fine-tuning, LoRA, AdaLoRA, DoRA, PiSSA, rsLoRA, OLoRA, EVA, SIFT:

```
# choose the target method_name and modules.
EfficientFT/sh/roberta-base-peft.sh
EfficientFT/sh/llama-peft.sh
```

3. GaLore (see the optimizer sketch at the end of this README):

```
EfficientFT/sh/roberta_galore.sh
```

## 😊 Some Results

![res1](./figs/res1.jpg)

## 📝 Citation

```bibtex
@article{yao2024theoretical,
  title={Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization},
  author={Yao, Xinhao and Qian, Hongjin and Hu, Xiaolin and Xu, Gengze and Liu, Yong and Liu, Wei and Luan, Jian and Wang, Bin},
  journal={arXiv preprint arXiv:2410.02247},
  year={2024}
}
```
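## 🧩 Appendix: minimal GaLore sketch

For completeness, here is a small, illustrative sketch of driving the GaLore optimizer directly from Python with the `galore-torch` package installed above. The model, the projection hyperparameters (`rank`, `update_proj_gap`, `scale`, `proj_type`), and the rule for choosing which weight matrices receive gradient projection are placeholders, not the settings used by `sh/roberta_galore.sh`.

```python
import torch
from transformers import AutoModelForSequenceClassification
from galore_torch import GaLoreAdamW  # provided by `pip install galore-torch`

# Illustrative model; the benchmark configures this through the shell script instead.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# GaLore projects gradients of 2-D weight matrices into a low-rank subspace.
# Here we (arbitrarily, for illustration) apply it to all 2-D weights except embeddings.
galore_params = [p for n, p in model.named_parameters() if p.dim() == 2 and "embeddings" not in n]
galore_ids = {id(p) for p in galore_params}
regular_params = [p for p in model.parameters() if id(p) not in galore_ids]

optimizer = GaLoreAdamW(
    [
        {"params": regular_params},
        # Placeholder projection hyperparameters; see the GaLore paper for recommended values.
        {"params": galore_params, "rank": 128, "update_proj_gap": 200, "scale": 0.25, "proj_type": "std"},
    ],
    lr=1e-5,
)
```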