# awesome-ssm

**Repository Path**: hazdzz/awesome-ssm

## Basic Information

- **Project Name**: awesome-ssm
- **Description**: A list for SSMs and related works.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-03-09
- **Last Updated**: 2024-07-01

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# SSMs and related works list

[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
[![forks](https://img.shields.io/github/forks/hazdzz/awesome-ssm)](https://github.com/hazdzz/awesome-ssm/network/members)
[![stars](https://img.shields.io/github/stars/hazdzz/awesome-ssm)](https://github.com/hazdzz/awesome-ssm/stargazers)
[![License](https://img.shields.io/github/license/hazdzz/awesome-ssm)](./LICENSE)

## About

A list of SSMs and related works.

## List for SSMs

| Number | SSM | Paper | Code | Conference or Journal | URL |
|:------:|:---:|-------|------|:---------------------:|-----|
| 1 | HiPPO | HiPPO: Recurrent Memory with Optimal Polynomial Projections | https://github.com/state-spaces/s4 | NeurIPS 2020 | https://proceedings.neurips.cc/paper/2020/hash/102f0bb6efb3a6128a3c750dd16729be-Abstract.html |
| 2 | LSSL | Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers | https://github.com/state-spaces/s4 | NeurIPS 2021 | https://openreview.net/forum?id=yWd42CWN3c |
| 3 | S4 | Efficiently Modeling Long Sequences with Structured State Spaces | https://github.com/state-spaces/s4 | ICLR 2022 | https://openreview.net/forum?id=uYLFoz1vlAC |
| 4 | DSS | Diagonal State Spaces are as Effective as Structured State Spaces | https://github.com/ag1988/dss | NeurIPS 2022 | https://openreview.net/forum?id=RjS0j6tsSrf |
| 5 | S4D | On the Parameterization and Initialization of Diagonal State Space Models | https://github.com/state-spaces/s4 | NeurIPS 2022 | https://openreview.net/forum?id=yJE7iQSAep |
| 6 | Generalized HiPPO | How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections | https://github.com/state-spaces/s4 | ICLR 2023 | https://openreview.net/forum?id=klK17OQ3KB |
| 7 | GSS | Long Range Language Modeling via Gated State Spaces | | ICLR 2023 | https://openreview.net/forum?id=5MkYIYCbva |
| 8 | Liquid S4 | Liquid Structural State-Space Models | https://github.com/raminmh/liquid-s4 | ICLR 2023 | https://openreview.net/forum?id=g4OTKRKfS7R |
| 9 | S5 | Simplified State Space Layers for Sequence Modeling | https://github.com/lindermanlab/S5 | ICLR 2023 | https://openreview.net/forum?id=Ai8Hw3AXqks |
| 10 | H3 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | https://github.com/HazyResearch/H3 | ICLR 2023 | https://openreview.net/forum?id=COZDy0WYGg |
| 11 | S4-PTD and S5-PTD | Robustifying State-space Models for Long Sequences via Approximate Diagonalization | | ICLR 2024 | https://openreview.net/forum?id=DjeQ39QoLQ |
| 12 | S6 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | https://github.com/state-spaces/mamba | | https://arxiv.org/abs/2312.00752 |
| 13 | STU | Spectral State Space Models | https://github.com/catid/spectral_ssm | | https://arxiv.org/abs/2312.06837 |
| 14 | Mamba 2 | Transformers are SSMs: Generalized Models and Efficient Algorithms with Structured State Space Duality | https://github.com/state-spaces/mamba | ICML 2024 | https://arxiv.org/abs/2405.21060 |

## List for Linear RNNs (LRNNs)

| Number | LRNN | Paper | Code | Conference or Journal | URL |
|:------:|:----:|-------|------|:---------------------:|-----|
| 1 | CKConv | CKConv: Continuous Kernel Convolution For Sequential Data | https://github.com/dwromero/ckconv | ICLR 2021 | https://openreview.net/forum?id=8FhxBtXSl0 |
| 2 | FlexConv | FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes | https://github.com/rjbruin/flexconv | ICLR 2022 | https://openreview.net/forum?id=3jooF27-0Wy |
| 3 | DLR | Simplifying and Understanding State Space Models with Diagonal Linear RNNs | https://github.com/ag1988/dlr | | https://arxiv.org/abs/2212.00768 |
| 4 | CCNN | Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN | https://github.com/david-knigge/ccnn | ICLR 2023 | https://openreview.net/forum?id=ZW5aK4yCRqU |
| 5 | SGConv | What Makes Convolutional Models Great on Long Sequence Modeling? | https://github.com/ctlllll/SGConv | ICLR 2023 | https://openreview.net/forum?id=TGJSPbRpJX- |
| 6 | Mega | Mega: Moving Average Equipped Gated Attention | https://github.com/facebookresearch/mega | ICLR 2023 | https://openreview.net/forum?id=qNLe3iq2El |
| 7 | TNN | Toeplitz Neural Network for Sequence Modeling | https://github.com/Doraemonzzz/tnn-pytorch | ICLR 2023 | https://openreview.net/forum?id=IxmWsm4xrua |
| 8 | Hyena | Hyena Hierarchy: Towards Larger Convolutional Language Models | https://github.com/hazyresearch/safari | ICML 2023 | https://proceedings.mlr.press/v202/poli23a.html |
| 9 | MultiresNet | Sequence Modeling with Multiresolution Convolutional Memory | https://github.com/thjashin/multires-conv | ICML 2023 | https://proceedings.mlr.press/v202/shi23f.html |
| 10 | LRU | Resurrecting Recurrent Neural Networks for Long Sequences | | ICML 2023 | https://proceedings.mlr.press/v202/orvieto23a.html |
| 11 | RWKV v4 (Dove) | RWKV: Reinventing RNNs for the Transformer Era | https://github.com/BlinkDL/RWKV-LM | EMNLP 2023 | https://aclanthology.org/2023.findings-emnlp.936/ |
| 12 | RetNet | Retentive Network: A Successor to Transformer for Large Language Models | https://github.com/microsoft/torchscale | | https://arxiv.org/abs/2307.08621 |
| 13 | MultiHyena | Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions | | NeurIPS 2023 | https://openreview.net/forum?id=OWELckerm6 |
| 14 | Monarch Mixer | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture | https://github.com/HazyResearch/m2 | NeurIPS 2023 | https://openreview.net/forum?id=cB0BImqSS9 |
| 15 | SeqBoat | Sparse Modular Activation for Efficient Sequence Modeling | https://github.com/renll/SeqBoat | NeurIPS 2023 | https://openreview.net/forum?id=TfbzX6I14i |
| 16 | HGRN | Hierarchically Gated Recurrent Neural Network for Sequence Modeling | https://github.com/OpenNLPLab/HGRN | NeurIPS 2023 | https://openreview.net/forum?id=P1TCHxJwLB |
| 17 | GLA Transformer | Gated Linear Attention Transformers with Hardware-Efficient Training | https://github.com/sustcsonglin/flash-linear-attention | | https://arxiv.org/abs/2312.06635 |
| 18 | Orchid | Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling | | | https://arxiv.org/abs/2402.18508 |
| 19 | RWKV v5 (Eagle) and v6 (Finch) | Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | https://huggingface.co/RWKV | | https://arxiv.org/abs/2404.05892 |
| 20 | HGRN2 | HGRN2: Gated Linear RNNs with State Expansion | https://github.com/OpenNLPLab/HGRN2 | | https://arxiv.org/abs/2404.07904 |

## List for Surveys

| Number | Paper | Journal or Conference | URL |
|:------:|-------|:---------------------:|-----|
| 1 | A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies | | https://arxiv.org/abs/2302.06218 |
| 2 | State Space Model for New-Generation Network Alternative to Transformers: A Survey | | https://arxiv.org/abs/2404.09516 |
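## Background: the linear state-space recurrence

The SSMs listed above differ in parameterization (HiPPO initialization, diagonal vs. structured state matrices, selectivity in Mamba), but they share one computational core: a discretized linear state-space recurrence, x_k = A x_{k-1} + B u_k with output y_k = C x_k. The sketch below is only an orientation aid, not code from any of the listed repositories; the matrices A, B, C and the toy values are illustrative, in plain Python with lists of floats.

```python
def ssm_scan(A, B, C, u):
    """Run a discrete linear state-space model over a scalar input sequence.

    x_k = A x_{k-1} + B u_k   (state update, x_0 = 0)
    y_k = C x_k               (readout)

    A: N x N state matrix (list of lists), B: length-N input vector,
    C: length-N output vector, u: scalar input sequence.
    Returns the scalar output sequence y.
    """
    n = len(A)
    x = [0.0] * n
    ys = []
    for u_k in u:
        # State update: matrix-vector product A x plus input injection B u_k.
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u_k
             for i in range(n)]
        # Readout: inner product C x.
        ys.append(sum(C[i] * x[i] for i in range(n)))
    return ys

# Toy example: a stable diagonal state matrix (the diagonal case is the
# regime that DSS, S4D, and S5 exploit for efficiency).
A = [[0.9, 0.0],
     [0.0, 0.5]]
B = [1.0, 1.0]
C = [1.0, 1.0]
y = ssm_scan(A, B, C, [1.0, 0.0, 0.0])  # impulse response
```

Feeding an impulse through the recurrence like this makes the convolutional view explicit: the outputs are exactly the kernel C A^(k-1) B that the convolution-mode SSMs (S4, DSS) precompute.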