# ConvLinn

**Repository Path**: gidean/conv-linn

- **License**: Apache-2.0
- **Default Branch**: main
- **Created**: 2025-07-14
- **Last Updated**: 2025-07-30

## README

# SLTrain (NeurIPS 2024)

A repository containing a beta implementation of *SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining*, accepted to NeurIPS 2024. Preprint available at http://arxiv.org/abs/2406.02214.

## Modeling for pretraining

The main idea is to re-parameterize each linear layer with low-rank and sparse factors for improved parameter and memory efficiency:

W = BA + S,

where B and A model the low-rank component and S models the sparse component. S has a **random** sparsity pattern.

## Motivation

Below, we show how the learned weight L + S enlarges the spectrum. In particular, the L component primarily learns the head of the singular-value spectrum, while the S component primarily learns the tail.

*Figure: contribution of the L and S components to the singular values of the learned W (full and zoomed views).*

## Results

*Figures: result comparisons and SLTrain memory usage.*

## Installation

Build the C++ extensions via

```bash
cd ./sparse-lora
pip install .
```

## Usage

Run the scripts placed in `scripts/llm_pretrain/`.
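The W = BA + S reparameterization described above can be sketched as a small PyTorch module. This is a minimal illustration only: the class name, initialization, and masking scheme are assumptions for exposition, not the repository's actual implementation (which stores S in a compressed format via the C++ extension).

```python
import torch
import torch.nn as nn


class SparseLowRankLinear(nn.Module):
    """Sketch of a linear layer reparameterized as W = B @ A + S,
    where S has a fixed random sparsity pattern (hypothetical class,
    not the repo's API)."""

    def __init__(self, in_features, out_features, rank, sp_ratio):
        super().__init__()
        # Low-rank factors: B is (out, rank), A is (rank, in).
        self.B = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        # Sparse component: here a dense tensor under a fixed random
        # 0/1 mask with roughly `sp_ratio` nonzero entries. A real
        # memory-efficient version would store only the nonzeros.
        mask = (torch.rand(out_features, in_features) < sp_ratio).float()
        self.register_buffer("mask", mask)
        self.S = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        W = self.B @ self.A + self.S * self.mask  # W = BA + S
        return x @ W.t()


layer = SparseLowRankLinear(64, 32, rank=8, sp_ratio=0.03)
y = layer(torch.randn(4, 64))
print(y.shape)  # torch.Size([4, 32])
```

Because the mask is registered as a buffer, the sparsity pattern stays fixed during training while gradients flow only into the unmasked entries of S and the low-rank factors.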
Typical usage:

```bash
torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_60m.json \
    --lr 0.003 \
    --peft_model sltrain \
    --optimizer adamw \
    --rank 128 \
    --sp_ratio 0.03 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 11000 \
    --warmup_steps 1100 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --lora_alpha 32
```

Here `--sp_ratio` sets the sparsity δ of the S component. (The comment is kept out of the command itself, since a `#` after a backslash continuation would swallow the remaining arguments.)

## Citation

```bibtex
@inproceedings{han2024sltrain,
  title     = {{SLTrain}: a sparse plus low-rank approach for parameter and memory efficient pretraining},
  author    = {Han, Andi and Li, Jiaxiang and Huang, Wei and Hong, Mingyi and Takeda, Akiko and Jawanpuria, Pratik and Mishra, Bamdev},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {37},
  year      = {2024}
}
```
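As a back-of-the-envelope illustration of the parameter savings implied by the `--rank 128` and `--sp_ratio 0.03` settings above, consider a single square d × d linear layer. The dimension d = 512 below is illustrative only; it is not necessarily the hidden size of the `llama_60m` config.

```python
# Hypothetical parameter count for one d x d linear layer under
# W = B @ A + S versus a dense W. d = 512 is an assumed value.
d, rank, sp_ratio = 512, 128, 0.03

dense_params = d * d                    # full dense W
low_rank_params = 2 * d * rank          # B (d x r) plus A (r x d)
sparse_params = int(sp_ratio * d * d)   # stored nonzeros of S
sltrain_params = low_rank_params + sparse_params

print(dense_params, sltrain_params, round(sltrain_params / dense_params, 2))
# 262144 138936 0.53
```

At these settings the reparameterized layer stores roughly half the parameters of the dense layer; smaller ranks relative to d yield larger savings.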