# Engram
## 1. Introduction

This repository contains the official implementation for the paper: **[Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models](Engram_paper.pdf)**.

> **Abstract:** While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup. To address this, we explore **conditional memory** as a complementary sparsity axis, instantiated via **Engram**, a module that modernizes classic $N$-gram embeddings for $\mathcal{O}(1)$ lookup.

**Key Contributions:**

- **Sparsity Allocation:** We formulate the trade-off between neural computation (MoE) and static memory (Engram), identifying a U-shaped scaling law that guides optimal capacity allocation.
- **Empirical Verification:** Under strict iso-parameter and iso-FLOPs constraints, the Engram-27B model demonstrates consistent improvements over MoE baselines across knowledge, reasoning, code, and math domains.
- **Mechanistic Analysis:** Our analysis suggests that Engram relieves early layers from static pattern reconstruction, potentially preserving effective depth for complex reasoning.
- **System Efficiency:** The module employs deterministic addressing, enabling massive embedding tables to be offloaded to host memory with minimal inference overhead.

## 2. Architecture

The Engram module augments the backbone by retrieving static $N$-gram memory and fusing it with dynamic hidden states. The architecture is shown below ([drawio provided](drawio/Engram.drawio)):

*Figure: Engram architecture*
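For intuition, here is a minimal, self-contained sketch of the conditional-memory idea: token IDs are combined into $N$-gram keys, hashed deterministically into a large embedding table (an $\mathcal{O}(1)$ lookup), and the retrieved static memory is gated into the dynamic hidden state. Every name and size below (`EngramSketch`, the rolling hash, the gating scheme, the table size) is an assumption for illustration only and does not mirror the implementation in `engram_demo_v1.py`.

```python
# A minimal sketch of conditional memory via hashed N-gram lookup.
# Illustration only: sizes, hashing, and fusion are assumptions, not the official design.
import torch
import torch.nn as nn


class EngramSketch(nn.Module):
    def __init__(self, d_model: int, table_size: int = 1 << 16, n: int = 2):
        super().__init__()
        self.n = n
        self.table_size = table_size
        # Static memory: an embedding table addressed deterministically by an N-gram hash.
        self.table = nn.Embedding(table_size, d_model)
        # Gate deciding how much static memory to mix into the dynamic hidden state.
        self.gate = nn.Linear(2 * d_model, d_model)

    def ngram_ids(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq). Combine each token with its n-1 predecessors into a key,
        # then hash the key into the table -- an O(1) lookup per position.
        key = token_ids.clone()
        for k in range(1, self.n):
            prev = torch.roll(token_ids, shifts=k, dims=1)
            prev[:, :k] = 0  # positions before the sequence start have no predecessor
            key = key * 1000003 + prev  # simple rolling hash (illustrative, not the paper's)
        return key % self.table_size

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # Retrieve static N-gram memory and fuse it with the dynamic hidden states.
        mem = self.table(self.ngram_ids(token_ids))            # (batch, seq, d_model)
        gate = torch.sigmoid(self.gate(torch.cat([hidden, mem], dim=-1)))
        return hidden + gate * mem


if __name__ == "__main__":
    batch, seq, d_model, vocab = 2, 16, 64, 32000
    tokens = torch.randint(0, vocab, (batch, seq))
    hidden = torch.randn(batch, seq, d_model)
    out = EngramSketch(d_model)(tokens, hidden)
    print(out.shape)  # torch.Size([2, 16, 64])
```

Because the addressing is a pure function of the token IDs, the table could in principle reside in host memory and be prefetched ahead of the forward pass, which is the system-efficiency property noted above.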

## 3. Evaluation

### Scaling Law

*Figure: Scaling law results*
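To make the iso-parameter allocation concrete, the toy arithmetic below splits a fixed parameter budget between MoE experts and Engram embedding rows for a few allocation fractions. The budget, hidden size, and fractions are made-up numbers for illustration, not the paper's configuration.

```python
# Toy arithmetic for the iso-parameter trade-off between MoE and Engram.
# All numbers are hypothetical and chosen only to illustrate the allocation sweep.

def split_budget(total_params: float, rho: float, d_model: int):
    """Give a fraction `rho` of the budget to the Engram table, the rest to MoE experts."""
    engram_params = rho * total_params
    moe_params = (1.0 - rho) * total_params
    # Each Engram row stores one d_model-dimensional embedding vector.
    engram_rows = int(engram_params // d_model)
    return moe_params, engram_rows


if __name__ == "__main__":
    total, d = 27e9, 4096  # a 27B-parameter budget and hidden size, purely illustrative
    for rho in (0.0, 0.05, 0.10, 0.20):
        moe, rows = split_budget(total, rho, d)
        print(f"rho={rho:.2f}  MoE params={moe:.3e}  Engram rows={rows:,}")
```

Reading validation loss as a function of the allocation fraction off such a sweep is what yields the U-shaped curve referenced above; the optimal fraction itself is an empirical quantity reported in the paper, not something this snippet computes.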

---

### Large-Scale Pre-training

*Figure: Large-scale pre-training results*

---

### Long-Context Training

*Figure: Long-context training results*

## 4. Case Study of Engram

*Figure: Case study of Engram*

## 5. Quick Start

We recommend using Python 3.8+ and PyTorch.

```bash
pip install torch numpy transformers sympy
```

We provide a standalone implementation to demonstrate the core logic of the Engram module:

```bash
python engram_demo_v1.py
```

> ⚠️ **Note:** The provided code is a demonstration version intended to illustrate the data flow. It mocks standard components (such as Attention/MoE/mHC) to focus on the Engram module.

## 6. License

The use of Engram models is subject to [the Model License](LICENSE).

## 7. Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).