# SimCIT

**Repository Path**: alibaba/SimCIT

## Basic Information

- **Project Name**: SimCIT
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-15
- **Last Updated**: 2025-12-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation

## About The Project

[![Product Name Screen Shot][product-screenshot]](https://github.com/alibaba/SimCIT)

Generative retrieval has emerged as a promising paradigm that directly generates the identifiers of target candidates. However, in large-scale retrieval systems, this approach becomes increasingly cumbersome due to the redundancy and sheer scale of the token space. Existing solutions often rely on reconstruction-based quantization strategies, such as RQ-VAE, to reduce embedding size. While effective for reconstructing each item embedding independently, this objective conflicts with the core goal of generative retrieval, which demands robust differentiation among items. Furthermore, effectively integrating multi-modal side information (descriptive text, images, and geographical knowledge) into existing generative retrieval frameworks remains a significant challenge.

To address these limitations and to improve the scalability of generative retrieval, we propose SimCIT, a Simple Contrastive learning framework of Item Tokenization. Unlike existing reconstruction-based strategies, SimCIT uses a learnable residual quantization module to align with the signals from the different modalities of each item, combining multi-modal knowledge alignment and semantic tokenization in a mutually beneficial contrastive learning framework. Extensive experiments on public datasets from various domains and on a large-scale location-based industrial dataset from the AMAP app demonstrate SimCIT’s effectiveness and scalability in LLM-based generative retrieval tasks.

## Getting Started

To get a local copy up and running, follow these steps.

### Prerequisites

1. Clone the repository:

   ```bash
   git clone https://github.com/alibaba/SimCIT.git
   cd SimCIT
   ```

2. Install dependencies:
   * Python 3.9+
   * PyTorch 2.2.0
   * The packages listed in `requirements.txt`

   ```sh
   pip install -r requirements.txt
   ```

### SID Generation

1. Training the Model

   To start distributed training, use the following command:

   ```bash
   ./run_train.sh
   ```

2. Parameters
   - `--state_dict_save_path`: Directory for model outputs.

3. Testing the Model

   Use the following command to start testing:

   ```bash
   ./run_infer.sh
   ```

## License

Distributed under the MIT License. See `LICENSE` for more information.

## Contact

If you have any questions or encounter difficulties, please contact us via [GitHub Issues](https://github.com/alibaba/SimCIT/issues). We are happy to help resolve issues related to SID generation and to support a robust and efficient setup for your system.

## Citing this work

Please cite the following paper if you find our code helpful.

```
@misc{zhai2025simple,
  title={A Simple Contrastive Framework Of Item Tokenization For Generative Recommendation},
  author={Zhai, Penglong and Yuan, Yifang and Di, Fanyi and Li, Jie and Liu, Yue and Li, Chen and Huang, Jie and Wang, Sicong and Xu, Yao and Li, Xin},
  year={2025},
  eprint={2506.16683},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2506.16683},
}
```

[product-screenshot]: asset/SimCIT.png
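
## Illustrative sketch: residual quantization

The About section describes a residual quantization module that turns an item embedding into a short sequence of tokens (a semantic ID, or SID). The snippet below is a minimal sketch of that general idea only, not the SimCIT implementation: the function names and the toy codebooks are hypothetical, the codebooks are fixed by hand, and the contrastive training that SimCIT uses to learn them is not shown.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    embedding: Sequence[float],
    codebooks: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[int], List[float]]:
    """Return one codeword index per level, plus the final residual.

    Hypothetical helper for illustration; not part of the SimCIT code base.
    """
    residual = list(embedding)
    semantic_id: List[int] = []
    for codebook in codebooks:
        # Pick the codeword closest to the part of the embedding
        # that earlier levels have not yet explained.
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        semantic_id.append(index)
        # Subtract it so the next level quantizes the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return semantic_id, residual


# Hypothetical toy codebooks: 2 levels, 2 codewords each, dimension 2.
TOY_CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],    # coarse level
    [[0.25, 0.0], [0.0, 0.25]],  # fine level
]

sid, residual = residual_quantize([1.25, 0.0], TOY_CODEBOOKS)
print(sid, residual)  # prints: [0, 0] [0.0, 0.0]
```

Each level contributes one token, so the length of the SID equals the number of codebooks, and items whose embeddings are close tend to share their leading tokens.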