# SkyThought-dev
**Repository Path**: dz-cloudlearning/SkyThought-dev
## Basic Information
- **Project Name**: SkyThought-dev
- **Description**: Development branch of https://github.com/novasky-ai/SkyThought
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-01-12
- **Last Updated**: 2025-01-12
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# SkyThought
[GitHub](https://github.com/NovaSky-AI/SkyThought) | [Hugging Face](https://huggingface.co/NovaSky-AI) | [Twitter/X](https://x.com/NovaSkyAI)
# News
- **[2025/01/10]** 🎉 We have released our Sky-T1-32B-Preview [model](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) and [data](https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k) through [HuggingFace](https://huggingface.co/NovaSky-AI)!
# Links
- 📜 [Sky-T1-32B-Preview model Blog Post](https://novasky-ai.github.io/posts/sky-t1/)
- 🤗 [Sky-T1-32B-Preview model](https://huggingface.co/NovaSky-AI)
# Getting Started
We open-source the code and scripts we used for data curation, training, and evaluation of Sky-T1-32B-Preview; you can find more details in each directory.
- ``/data``: The 17k training examples used to train Sky-T1-32B-Preview. We also include the science and riddle portions from the [STILL-2 model](https://arxiv.org/pdf/2412.09413).
- ``skythought/tools``: Training data curation and evaluation for Sky-T1. To generate our training data, we use the QwQ-32B-Preview model; we curate the data mixture to cover diverse domains that require reasoning, and apply a rejection sampling procedure to improve data quality (a sketch follows this list).
- ``skythought/train``: Training scripts for Sky-T1. We use [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) to perform training. The model was trained for 3 epochs with a learning rate of 1e-5 and a batch size of 96. Training completed in 19 hours on 8 H100 GPUs using DeepSpeed Zero-3 offloading, costing approximately $450 at Lambda Cloud pricing (a back-of-the-envelope check follows below).
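The rejection sampling step can be pictured as: sample a reasoning trace from the teacher model and keep it only if its final answer verifies against the ground truth. Below is a minimal, hypothetical Python sketch of that loop; the `generate` and `verify` callables stand in for a QwQ-32B-Preview sampler and an answer checker, and none of these names come from this repository's code.

```python
from typing import Callable

def rejection_sample(
    problems: list[dict],
    generate: Callable[[str], str],      # hypothetical: samples a reasoning trace
    verify: Callable[[str, str], bool],  # hypothetical: checks the final answer
    max_attempts: int = 4,
) -> list[dict]:
    """Keep a (question, trace) pair only if the trace's answer verifies."""
    kept = []
    for prob in problems:
        for _ in range(max_attempts):
            trace = generate(prob["question"])
            if verify(trace, prob["answer"]):  # reject traces with wrong answers
                kept.append({"question": prob["question"], "response": trace})
                break  # first accepted trace is enough
    return kept

# Toy usage with stub callables; real curation would query the teacher model.
if __name__ == "__main__":
    demo = [{"question": "2 + 2 = ?", "answer": "4"}]
    print(rejection_sample(
        demo,
        generate=lambda q: "2 + 2 equals 4. Answer: 4",
        verify=lambda trace, ans: trace.strip().endswith(ans),
    ))
```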
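As a rough sanity check on the stated training numbers, here is a small worked computation; the per-GPU hourly rate and the batch-size split are assumptions for illustration, not values from this repository.

```python
# Back-of-the-envelope check of the reported training setup.
gpus = 8
hours = 19
usd_per_gpu_hour = 2.99  # assumed rate, roughly Lambda Cloud H100 pricing
print(f"estimated cost: ${gpus * hours * usd_per_gpu_hour:,.0f}")  # ~ $455

# An effective batch size of 96 must factor as
# gpus * per_device_batch * grad_accum; one possible split:
per_device_batch, grad_accum = 1, 12
assert gpus * per_device_batch * grad_accum == 96
```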
# Evaluation
Below, we show our evaluation results for the Sky-T1-32B-Preview model across math, coding, and science benchmarks.
### Evaluation results
| Metric | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ | o1-preview |
|-----------------------|---------------------|--------|-------|------------|
| Math500 | 82.4 | 76.2 | 85.4 | 81.4 |
| AIME2024 | 43.3 | 16.7 | 50.0 | 40.0 |
| LiveCodeBench-Easy | 86.3 | 84.6 | 90.7 | 92.9 |
| LiveCodeBench-Medium | 56.8 | 40.8 | 56.3 | 54.9 |
| LiveCodeBench-Hard | 17.9 | 9.8 | 17.1 | 16.3 |
| GPQA-Diamond | 56.8 | 45.5 | 52.5 | 75.2 |
## Fully Open-source: Driving Progress Together
We believe that open-source collaboration drives progress, and with Sky-T1-32B-Preview we are fully committed to empowering the community. We open-source all details (i.e., data, code, model weights) to enable the community to replicate and improve on our results *easily*:
| Model | Sky-T1-32B-Preview | STILL-2 | Journey | QwQ | o1 |
|---------------|--------------------|---------|---------|-----|----|
| Data | ✅ | ✅ | ❌ | ❌ | ❌ |
| Code | ✅ | ❌ | ❌ | ❌ | ❌ |
| Report | ✅ | ✅ | ✅ | ❌ | ❌ |
| Math domain | ✅ | ✅ | ✅ | ✅ | ✅ |
| Coding domain | ✅ | ❌ | ❌ | ✅ | ✅ |
| Model Weights | ✅ | ✅ | ❌ | ✅ | ❌ |
# Citation
The code in this repository is mostly described in the post below. Please consider citing this work if you find the repository helpful.
```bibtex
@misc{sky_t1_2025,
  author       = {NovaSky Team},
  title        = {Sky-T1: Train your own O1 preview model within \$450},
  howpublished = {https://novasky-ai.github.io/posts/sky-t1},
  note         = {Accessed: 2025-01-09},
  year         = {2025}
}
```
# Acknowledgement
This work was done at the [Berkeley Sky Computing Lab](https://sky.cs.berkeley.edu/), with amazing compute support from [Lambda Labs](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5) and [Anyscale](https://www.anyscale.com/). We would like to express our gratitude for the valuable academic feedback and support from the [STILL-2 team](https://arxiv.org/pdf/2412.09413), and Junyang Lin from the [Qwen team](https://qwenlm.github.io/).