# ConvLinn
**Repository Path**: gidean/conv-linn
## Basic Information
- **Project Name**: ConvLinn
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-14
- **Last Updated**: 2025-07-30
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# SLTrain (NeurIPS 2024)
A repository containing a beta implementation of *SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining*, accepted to NeurIPS 2024. The preprint is available at http://arxiv.org/abs/2406.02214.
## Modeling for pretraining
The main idea is to re-parameterize each linear layer with low-rank and sparse factors for improved parameter and memory efficiency:

W = BA + S,

where B and A model the low-rank component and S models the sparse component. S has a **random** sparsity pattern.
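The re-parameterization above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation: the layer width, rank, sparsity ratio, and initialization scale below are placeholder values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer dimensions (illustrative, not the paper's settings)
n_in, n_out = 512, 512
r = 32          # rank of the low-rank component
delta = 0.03    # sparsity ratio: fraction of entries of S that are trainable

# Low-rank factors: B is (n_in x r), A is (r x n_out)
B = rng.standard_normal((n_in, r)) * 0.02
A = rng.standard_normal((r, n_out)) * 0.02

# Sparse factor S: a random support is drawn once; only the values on that
# support are trainable parameters.
mask = rng.random((n_in, n_out)) < delta
S_values = rng.standard_normal(int(mask.sum())) * 0.02

def forward(x):
    """Forward pass through the re-parameterized layer: y = x (BA + S)."""
    y_low_rank = (x @ B) @ A            # two skinny matmuls instead of one dense one
    S = np.zeros((n_in, n_out))
    S[mask] = S_values                  # scatter trainable values onto the fixed support
    return y_low_rank + x @ S

x = rng.standard_normal((4, n_in))
y = forward(x)

# Trainable-parameter comparison: dense mn vs. (m + n) r + delta * mn
dense_params = n_in * n_out
sltrain_params = (n_in + n_out) * r + int(mask.sum())
print(dense_params, sltrain_params)
```

In a real implementation S would be stored in a sparse format (values plus indices) rather than scattered into a dense matrix; the dense scatter here is only for clarity.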
## Motivation
Below, we show how the learned weight L + S enlarges the singular value spectrum. In particular, the low-rank component L primarily learns the head of the singular value spectrum, while the sparse component S primarily learns the tail.
## Results
## Installation
Build the C++ extensions via
```bash
cd ./sparse-lora
pip install .
```
## Usage
Run the scripts in `scripts/llm_pretrain/`. Typical usage:
```bash
torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_60m.json \
    --lr 0.003 \
    --peft_model sltrain \
    --optimizer adamw \
    --rank 128 \
    --sp_ratio 0.03 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 11000 \
    --warmup_steps 1100 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --lora_alpha 32
```
Here `--sp_ratio` sets the sparsity delta, i.e., the fraction of trainable entries in the sparse factor S.
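As a rough illustration of the parameter budget implied by the flags above, consider a single square linear layer. The width `d` below is a placeholder (the actual dimensions come from `configs/llama_60m.json`); `rank` and `sp_ratio` match the command line.

```python
# Per-layer trainable-parameter budget for W = BA + S versus a dense W.
d = 512          # hypothetical layer width (not read from llama_60m.json)
rank = 128       # --rank
sp_ratio = 0.03  # --sp_ratio

dense = d * d                      # dense weight: d^2 parameters
low_rank = 2 * d * rank            # B is (d x rank), A is (rank x d)
sparse = int(sp_ratio * d * d)     # trainable entries of S
total = low_rank + sparse

print(f"dense: {dense}, sltrain: {total} ({total / dense:.0%} of dense)")
```

The savings grow with the layer width, since the dense count scales as d² while the low-rank term scales only linearly in d for a fixed rank.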
## Citation
```bibtex
@inproceedings{han2024sltrain,
  title     = {{SLTrain}: a sparse plus low-rank approach for parameter and memory efficient pretraining},
  author    = {Han, Andi and Li, Jiaxiang and Huang, Wei and Hong, Mingyi and Takeda, Akiko and Jawanpuria, Pratik and Mishra, Bamdev},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {37},
  year      = {2024}
}
```