# pls-loss

**Repository Path**: AI-Group/pls-loss

## Basic Information

- **Project Name**: pls-loss
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-14
- **Last Updated**: 2026-01-30

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# PLS-Loss: Partial Label Smoothing Loss for Large Language Models

![License](https://img.shields.io/badge/license-MIT-blue.svg) ![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg) ![PyTorch](https://img.shields.io/badge/PyTorch-2.4.0-orange.svg)

Authors: [Xueming Hou]()\*

## 🔔 NEWS

- **[01/18/2026]** Our paper!

## Table of Contents

- [PLS-Loss: Partial Label Smoothing Loss for Large Language Models](#pls-loss-partial-label-smoothing-loss-for-large-language-models)
  - [Table of Contents](#table-of-contents)
  - [Features](#features)
  - [Hardware Requirements](#hardware-requirements)
  - [Installation](#installation)
  - [Data Preparation](#data-preparation)
    - [Fineweb-Edu-100B](#fineweb-edu-100b)
  - [Pretraining](#pretraining)
  - [Evaluation](#evaluation)
  - [Acknowledgements](#acknowledgements)
  - [Star History](#star-history)
  - [Citation](#citation)

## Features

- **PLS-Loss:** Implements the partial label smoothing loss.

## Hardware Requirements

A100 or H100 GPUs are recommended. At least 8×80 GB of VRAM is required.

## Installation

Ensure you have Python 3.10 or higher installed. It is recommended to use a virtual environment to manage dependencies.

1. **Clone the Repository**

   ```bash
   git clone https://github.com/tensorgi/TPA.git
   cd TPA
   ```

2. **Create and Activate a Virtual Environment**

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. **Install Required Packages**

   ```bash
   pip install torch==2.4.0 numpy transformers datasets tiktoken wandb tqdm
   ```

## Data Preparation

Prepare the necessary datasets before pretraining the model. Supported dataset: [Fineweb-Edu-100B](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu/).

### Fineweb-Edu-100B

Fineweb-Edu-100B is a large-scale educational dataset hosted on Hugging Face.

1. **Navigate to the Data Directory**

   ```bash
   cd data/fineweb-edu
   ```

2. **Run the Data Preparation Script**

   ```bash
   python fineweb-edu.py
   ```

3. **Move the Prepared Data**

   ```bash
   mv fineweb-edu100B ..
   cd ../..
   ```

4. **Generate the n-gram Model**

   ```bash
   cp data/fineweb-edu/read_tokens_train.py /path/to/kenlm/build/
   python read_tokens_train.py | ./bin/lmplz -o 3 -S 50% -T /data/tmp > 100Bo3.arpa
   ```

5. **Generate the Cache Model**

   ```bash
   python arpa2binary_partial.py --arpa 100Bo3.arpa --binary 100Bo3.bin.pt --partial 100000000
   python query.py --lm_path 100Bo3.bin.pt --cache_path 100Bo3.cache --mode fast --shard_size 2000000 --max_cands 1000
   ```

   Note: even with `max_cands==100`, `100Bo2.bin.pt` surprisingly did not reduce the size of the stored results. Currently, memory is nearly exhausted once `100Bo3.bin.pt` has read the 17th shard, so the program stalls; the next step may be to save results in buckets, e.g., split them into 100 buckets by the last two digits of the key, compute and save the buckets incrementally every 10 shards, and finally merge all buckets together.

## Pretraining

Pretrain the model using the prepared datasets. The provided scripts support distributed training across multiple GPUs.

1. **Baseline**

   For more control or customization, use `torchrun` to initiate training. Replace `config/train_llama_small_adam_80g8.py` with your desired configuration file.

   ```bash
   torchrun --standalone --nproc_per_node=8 \
       train_fw.py \
       config/train_llama_small_adam_80g8.py
   ```

   - `--nproc_per_node=8` specifies the number of processes (typically matching the number of GPUs).

2. **PLS-Loss**

   Update `train_fw_pls.py` with the correct path to the `cache_model`.

   ```bash
   torchrun --standalone --nproc_per_node=8 \
       train_fw_pls.py \
       config/train_llama_small_adam_80g8_pls.py
   ```

## Evaluation

Evaluate the performance of the pretrained model using standardized benchmarks.
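For intuition about the training objective: standard label smoothing spreads the smoothing mass over the entire vocabulary, while partial label smoothing spreads it only over a candidate set, such as the n-gram continuations stored in the cache model. The sketch below is an illustrative assumption and not the repository's implementation; the function name, the `eps` parameter, and the boolean candidate mask are all invented for this example.

```python
import numpy as np

def partial_label_smoothing_loss(logits, target, cand_mask, eps=0.1):
    """Cross-entropy against a target distribution that puts 1-eps on the
    gold token and spreads eps uniformly over a candidate set, instead of
    over the full vocabulary as in ordinary label smoothing."""
    # log-softmax, shifted by the max for numerical stability
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())

    vocab = logits.shape[0]
    q = np.zeros(vocab)
    n_cand = cand_mask.sum()
    if n_cand > 0:
        q[cand_mask] = eps / n_cand   # smoothing mass on candidates only
    q[target] += 1.0 - eps            # remaining mass on the gold token
    # q sums to 1 whether or not the gold token is in the candidate set
    return -(q * log_probs).sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
cands = np.array([True, True, False, False])
ce = -np.log(np.exp(logits[0]) / np.exp(logits).sum())
print(np.isclose(partial_label_smoothing_loss(logits, 0, cands, eps=0.0), ce))  # True
```

With `eps=0` the loss reduces to ordinary cross-entropy; increasing `eps` shifts probability mass from the gold token toward the candidate set.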
1. **Navigate to the Evaluation Harness Directory**

   ```bash
   cd lm-evaluation-harness
   ```

2. **Follow the Instructions Within This Directory**

   *Ensure your model is compatible with the evaluation harness requirements.*

## Acknowledgements

- [Karpathy's nanoGPT](https://github.com/karpathy/nanoGPT) provides the foundational codebase upon which this repo is built.
- [Hugging Face](https://huggingface.co/) for providing the [Fineweb-Edu-100B](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu/) dataset.
- [EleutherAI](https://www.eleuther.ai/) for the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
- [tensorgi/TPA](https://github.com/tensorgi/TPA) provides the codebase this repo directly builds on.

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=bigcash/PLS-Loss&type=Date)](https://star-history.com/#bigcash/PLS-Loss&Date)

## Citation

If you use PLS-Loss in your research or application, please consider citing it!

```bibtex
@article{hou2026pls-loss,
  title={PLS-Loss: Partial Label Smoothing Loss for Large Language Models},
  author={Xueming Hou},
  journal={arXiv preprint arXiv:},
  year={2026},
}
```