# Youtu-LLM-2B-Base

**Repository Path**: hf-models/Youtu-LLM-2B-Base

## Basic Information

- **Project Name**: Youtu-LLM-2B-Base
- **Description**: Mirror of https://huggingface.co/tencent/Youtu-LLM-2B-Base
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-01
- **Last Updated**: 2026-01-01

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

---
library_name: transformers
license: other
license_link: https://huggingface.co/tencent/Youtu-LLM-2B-Base/LICENSE.txt
pipeline_tag: text-generation
instruct_model:
- tencent/Youtu-LLM-2B
---
# Youtu-LLM

*(Youtu-LLM logo)*

[📃 License](LICENSE.txt) • [💻 Code](https://github.com/TencentCloudADP/youtu-tip/tree/master/youtu-llm) • [📑 Technical Report](https://github.com/TencentCloudADP/youtu-tip/blob/master/youtu-llm/assets/Youtu-LLM_Technical_Report.pdf) • [📊 Benchmarks](#benchmarks)
## 🎯 Brief Introduction

**Youtu-LLM** is a new, small, yet powerful LLM: it contains only 1.96B parameters, supports a 128K-token context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size on Commonsense, STEM, Coding, and Long Context benchmarks; on agent-related tests, Youtu-LLM surpasses larger leading models and is genuinely capable of completing multiple end-to-end agent tasks.

**Youtu-LLM** has the following features:

- Type: Autoregressive Causal Language Model with Dense [MLA](https://arxiv.org/abs/2405.04434)
- Release versions: [Base](https://huggingface.co/tencent/Youtu-LLM-2B-Base) and [Instruct](https://huggingface.co/tencent/Youtu-LLM-2B)
- Number of Parameters: 1.96B
- Number of Layers: 32
- Number of Attention Heads (MLA): 16 for Q/K/V
- MLA Rank: 1,536 for Q, 512 for K/V
- MLA Dim: 128 for QK NoPE, 64 for QK RoPE, and 128 for V
- Context Length: 131,072
- Vocabulary Size: 128,256

## 📊 Performance Comparisons

### Base Model

Comparison between Youtu-LLM-2B-Base and baselines.

#### General Benchmarks

| Type | Benchmark (Metric) | # Shots | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Commonsense | MMLU-Pro (EM) | 5 | 34.9% | 35.3% | 29.4% | 46.1% | 36.2% | **48.4%** |
| | MLQA-Zh (EM) | 3 | 38.1% | 38.0% | 40.3% | **47.2%** | 43.0% | 43.5% |
| | MMLU-ProX-Zh (EM) | 5 | 32.5% | 26.7% | 24.2% | **45.2%** | 25.4% | 40.7% |
| STEM | GSM8K (EM) | 8 | 68.2% | 67.3% | 38.5% | **80.8%** | 47.8% | 77.6% |
| | MGSM-Zh (EM) | 8 | 57.1% | 40.7% | 33.0% | **69.7%** | 35.9% | 68.9% |
| | MATH (EM) | 4 | 28.1% | 40.8% | 24.4% | **44.8%** | 21.5% | 44.4% |
| | BBH (EM) | 3 | 53.0% | 59.8% | 51.6% | **70.8%** | 62.9% | 59.8% |
| | GPQA-MC (Acc. Norm) | 5 | 30.4% | 26.6% | 28.6% | **37.8%** | 30.1% | 33.3% |
| | HLE-MC (Acc. Norm) | 3 | 10.7% | 3.1% | 8.0% | 15.0% | 11.5% | **17.4%** |
| Coding | MBPP (Pass@1) | 3 | 55.6% | 51.0% | 45.8% | **67.5%** | 49.4% | 66.6% |
| | MBPP+ (Pass@1) | 3 | 71.0% | 66.1% | 61.9% | 80.8% | 62.7% | **81.8%** |
| | HumanEval (Pass@1) | 0 | 49.9% | 34.8% | 36.6% | 57.6% | 36.0% | **64.6%** |
| | HumanEval+ (Pass@1) | 0 | 41.3% | 28.1% | 28.1% | 49.9% | 28.1% | **57.3%** |
| | LiveCodeBench v6 (Pass@1) | 3 | 5.1% | 2.9% | 2.9% | 6.9% | 3.4% | **9.7%** |
| | CRUXEval (Pass@1) | 1 | 40.6% | 42.1% | 39.7% | 54.8% | 42.3% | **55.9%** |
| | RepoBench (EM) | 3 | 21.0% | 21.8% | 23.0% | **25.3%** | 25.2% | 22.7% |
| Long Context | LongBench v2 (Acc.) | 3 | 28.0% | **28.8%** | 26.6% | 25.8% | 27.8% | 27.2% |
| | NIAH (Acc.) | / | 79.8% | 75.0% | 99.5% | 83.0% | **99.8%** | 98.8% |

#### Agentic Benchmarks

We adopt [APTBench](https://github.com/TencentYoutuResearch/APTBench/) to evaluate the agentic capabilities of the base model.

| Category | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Code | 25.1% | 24.3% | 32.8% | **41.9%** | 23.6% | 37.9% |
| Deep Research | 28.5% | 27.2% | 36.4% | **40.5%** | 30.0% | 38.6% |
| Math | 59.9% | 60.7% | 59.8% | **70.5%** | 60.1% | 68.0% |
| Tool | 56.7% | 59.1% | 61.7% | **65.8%** | 64.1% | 64.2% |
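One practical consequence of the MLA architecture listed in the feature section is a much smaller KV cache than standard multi-head attention. The sketch below is a rough back-of-the-envelope calculation, not part of this release: it assumes the MLA caching scheme described in the DeepSeek-V2 paper linked above (per layer, one compressed KV latent plus one shared decoupled RoPE key per token), and the function names and the MHA baseline are purely illustrative. Actual inference memory depends on the implementation.

```python
# Per-token KV-cache footprint implied by the dimensions listed above,
# assuming the DeepSeek-V2-style MLA caching scheme: each layer caches a
# 512-dim compressed KV latent plus a 64-dim RoPE key shared across heads,
# instead of full per-head K/V as in standard multi-head attention (MHA).

N_LAYERS = 32   # Number of Layers
N_HEADS = 16    # Number of Attention Heads (MLA)
KV_RANK = 512   # MLA Rank for K/V (compressed latent width)
D_ROPE = 64     # QK RoPE dim (decoupled key, shared across heads)
D_NOPE = 128    # QK NoPE dim per head
D_V = 128       # V dim per head

def mla_cache_per_token(bytes_per_elem: int = 2) -> int:
    """Bytes cached per token: (latent + shared RoPE key) per layer."""
    return N_LAYERS * (KV_RANK + D_ROPE) * bytes_per_elem

def mha_cache_per_token(bytes_per_elem: int = 2) -> int:
    """Hypothetical MHA baseline caching full K and V for every head."""
    d_k = D_NOPE + D_ROPE  # 192 per key head
    return N_LAYERS * N_HEADS * (d_k + D_V) * bytes_per_elem

mla = mla_cache_per_token()  # 32 * 576 * 2 = 36,864 bytes/token (fp16)
mha = mha_cache_per_token()  # 32 * 16 * 320 * 2 = 327,680 bytes/token (fp16)
print(f"MLA: {mla} B/token, MHA baseline: {mha} B/token, "
      f"ratio: {mha / mla:.1f}x")
```

Under these assumptions, a full 131,072-token context costs roughly 36,864 × 131,072 ≈ 4.8 GB of fp16 KV cache with MLA, versus about 43 GB for the hypothetical full-head MHA baseline.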