# awesome-ssm

**Repository Path**: hazdzz/awesome-ssm

## Basic Information

- **Project Name**: awesome-ssm
- **Description**: A list for SSMs and related works.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-03-09
- **Last Updated**: 2024-07-01

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# SSMs and related works list

[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
[![forks](https://img.shields.io/github/forks/hazdzz/awesome-ssm)](https://github.com/hazdzz/awesome-ssm/network/members)
[![stars](https://img.shields.io/github/stars/hazdzz/awesome-ssm)](https://github.com/hazdzz/awesome-ssm/stargazers)
[![License](https://img.shields.io/github/license/hazdzz/awesome-ssm)](./LICENSE)

## About

A list of SSMs and related works.

## List for SSMs

| Number | SSM | Paper | Code | Conference or Journal | URL |
|:------:|:---:|-------|------|:---------------------:|-----|
| 1 | HiPPO | HiPPO: Recurrent Memory with Optimal Polynomial Projections | https://github.com/state-spaces/s4 | NeurIPS 2020 | https://proceedings.neurips.cc/paper/2020/hash/102f0bb6efb3a6128a3c750dd16729be-Abstract.html |
| 2 | LSSL | Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers | https://github.com/state-spaces/s4 | NeurIPS 2021 | https://openreview.net/forum?id=yWd42CWN3c |
| 3 | S4 | Efficiently Modeling Long Sequences with Structured State Spaces | https://github.com/state-spaces/s4 | ICLR 2022 | https://openreview.net/forum?id=uYLFoz1vlAC |
| 4 | DSS | Diagonal State Spaces are as Effective as Structured State Spaces | https://github.com/ag1988/dss | NeurIPS 2022 | https://openreview.net/forum?id=RjS0j6tsSrf |
| 5 | S4D | On the Parameterization and Initialization of Diagonal State Space Models | https://github.com/state-spaces/s4 | NeurIPS 2022 | https://openreview.net/forum?id=yJE7iQSAep |
| 6 | Generalized HiPPO | How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections | https://github.com/state-spaces/s4 | ICLR 2023 | https://openreview.net/forum?id=klK17OQ3KB |
| 7 | GSS | Long Range Language Modeling via Gated State Spaces | | ICLR 2023 | https://openreview.net/forum?id=5MkYIYCbva |
| 8 | Liquid S4 | Liquid Structural State-Space Models | https://github.com/raminmh/liquid-s4 | ICLR 2023 | https://openreview.net/forum?id=g4OTKRKfS7R |
| 9 | S5 | Simplified State Space Layers for Sequence Modeling | https://github.com/lindermanlab/S5 | ICLR 2023 | https://openreview.net/forum?id=Ai8Hw3AXqks |
| 10 | H3 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models | https://github.com/HazyResearch/H3 | ICLR 2023 | https://openreview.net/forum?id=COZDy0WYGg |
| 11 | S4-PTD and S5-PTD | Robustifying State-space Models for Long Sequences via Approximate Diagonalization | | ICLR 2024 | https://openreview.net/forum?id=DjeQ39QoLQ |
| 12 | S6 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | https://github.com/state-spaces/mamba | | https://arxiv.org/abs/2312.00752 |
| 13 | STU | Spectral State Space Models | https://github.com/catid/spectral_ssm | | https://arxiv.org/abs/2312.06837 |
| 14 | Mamba 2 | Transformers are SSMs: Generalized Models and Efficient Algorithms with Structured State Space Duality | https://github.com/state-spaces/mamba | ICML 2024 | https://arxiv.org/abs/2405.21060 |

## List for Linear RNNs (LRNNs)

| Number | LRNN | Paper | Code | Conference or Journal | URL |
|:------:|:----:|-------|------|:---------------------:|-----|
| 1 | CKConv | CKConv: Continuous Kernel Convolution For Sequential Data | https://github.com/dwromero/ckconv | ICLR 2021 | https://openreview.net/forum?id=8FhxBtXSl0 |
| 2 | FlexConv | FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes | https://github.com/rjbruin/flexconv | ICLR 2022 | https://openreview.net/forum?id=3jooF27-0Wy |
| 3 | DLR | Simplifying and Understanding State Space Models with Diagonal Linear RNNs | https://github.com/ag1988/dlr | | https://arxiv.org/abs/2212.00768 |
| 4 | CCNN | Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN | https://github.com/david-knigge/ccnn | ICLR 2023 | https://openreview.net/forum?id=ZW5aK4yCRqU |
| 5 | SGConv | What Makes Convolutional Models Great on Long Sequence Modeling? | https://github.com/ctlllll/SGConv | ICLR 2023 | https://openreview.net/forum?id=TGJSPbRpJX- |
| 6 | Mega | Mega: Moving Average Equipped Gated Attention | https://github.com/facebookresearch/mega | ICLR 2023 | https://openreview.net/forum?id=qNLe3iq2El |
| 7 | TNN | Toeplitz Neural Network for Sequence Modeling | https://github.com/Doraemonzzz/tnn-pytorch | ICLR 2023 | https://openreview.net/forum?id=IxmWsm4xrua |
| 8 | Hyena | Hyena Hierarchy: Towards Larger Convolutional Language Models | https://github.com/hazyresearch/safari | ICML 2023 | https://proceedings.mlr.press/v202/poli23a.html |
| 9 | MultiresNet | Sequence Modeling with Multiresolution Convolutional Memory | https://github.com/thjashin/multires-conv | ICML 2023 | https://proceedings.mlr.press/v202/shi23f.html |
| 10 | LRU | Resurrecting Recurrent Neural Networks for Long Sequences | | ICML 2023 | https://proceedings.mlr.press/v202/orvieto23a.html |
| 11 | RWKV v4 (Dove) | RWKV: Reinventing RNNs for the Transformer Era | https://github.com/BlinkDL/RWKV-LM | EMNLP 2023 | https://aclanthology.org/2023.findings-emnlp.936/ |
| 12 | RetNet | Retentive Network: A Successor to Transformer for Large Language Models | https://github.com/microsoft/torchscale | | https://arxiv.org/abs/2307.08621 |
| 13 | MultiHyena | Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions | | NeurIPS 2023 | https://openreview.net/forum?id=OWELckerm6 |
| 14 | Monarch Mixer | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture | https://github.com/HazyResearch/m2 | NeurIPS 2023 | https://openreview.net/forum?id=cB0BImqSS9 |
| 15 | SeqBoat | Sparse Modular Activation for Efficient Sequence Modeling | https://github.com/renll/SeqBoat | NeurIPS 2023 | https://openreview.net/forum?id=TfbzX6I14i |
| 16 | HGRN | Hierarchically Gated Recurrent Neural Network for Sequence Modeling | https://github.com/OpenNLPLab/HGRN | NeurIPS 2023 | https://openreview.net/forum?id=P1TCHxJwLB |
| 17 | GLA Transformer | Gated Linear Attention Transformers with Hardware-Efficient Training | https://github.com/sustcsonglin/flash-linear-attention | | https://arxiv.org/abs/2312.06635 |
| 18 | Orchid | Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling | | | https://arxiv.org/abs/2402.18508 |
| 19 | RWKV v5 (Eagle) and v6 (Finch) | Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | https://huggingface.co/RWKV | | https://arxiv.org/abs/2404.05892 |
| 20 | HGRN2 | HGRN2: Gated Linear RNNs with State Expansion | https://github.com/OpenNLPLab/HGRN2 | | https://arxiv.org/abs/2404.07904 |

## List for Surveys

| Number | Paper | Journal or Conference | URL |
|:------:|-------|:---------------------:|-----|
| 1 | A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies | | https://arxiv.org/abs/2302.06218 |
| 2 | State Space Model for New-Generation Network Alternative to Transformers: A Survey | | https://arxiv.org/abs/2404.09516 |
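## Background: the linear state-space recurrence

The SSMs listed above differ in parameterization (HiPPO initialization, diagonal vs. structured state matrices, selectivity in Mamba), but they share one computational core: a discretized linear state-space recurrence, x_k = A x_{k-1} + B u_k with output y_k = C x_k. The sketch below is only an orientation aid, not code from any of the listed repositories; the matrices A, B, C and the toy values are illustrative, in plain Python with lists of floats.

```python
def ssm_scan(A, B, C, u):
    """Run a discrete linear state-space model over a scalar input sequence.

    x_k = A x_{k-1} + B u_k   (state update, x_0 = 0)
    y_k = C x_k               (readout)

    A: N x N state matrix (list of lists), B: length-N input vector,
    C: length-N output vector, u: scalar input sequence.
    Returns the scalar output sequence y.
    """
    n = len(A)
    x = [0.0] * n
    ys = []
    for u_k in u:
        # State update: matrix-vector product A x plus input injection B u_k.
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u_k
             for i in range(n)]
        # Readout: inner product C x.
        ys.append(sum(C[i] * x[i] for i in range(n)))
    return ys

# Toy example: a stable diagonal state matrix (the diagonal case is the
# regime that DSS, S4D, and S5 exploit for efficiency).
A = [[0.9, 0.0],
     [0.0, 0.5]]
B = [1.0, 1.0]
C = [1.0, 1.0]
y = ssm_scan(A, B, C, [1.0, 0.0, 0.0])  # impulse response
```

Feeding an impulse through the recurrence like this makes the convolutional view explicit: the outputs are exactly the kernel C A^(k-1) B that the convolution-mode SSMs (S4, DSS) precompute.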