AI Research & Innovation in 2024, Vol. 2
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

This paper combines State Space Models (SSMs) with the Mixture of Experts (MoE) approach and introduces MoE-Mamba, a model in which every other Mamba layer is replaced with a MoE feed-forward layer based on the Switch Transformer.
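
The interleaved layout described above can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: `MambaBlockStub` is a placeholder standing in for a real selective SSM layer (a real model would use an actual Mamba block), and the class names, dimensions, and expert counts are illustrative assumptions. It shows the core idea: Mamba layers alternating with top-1-routed (Switch-style) MoE feed-forward layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MambaBlockStub(nn.Module):
    """Placeholder for a selective state space (Mamba) block."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):            # x: (batch, seq_len, d_model)
        return self.proj(x)          # stand-in for the SSM recurrence


class SwitchMoE(nn.Module):
    """Switch-style MoE feed-forward layer: each token is routed to one expert."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):            # x: (batch, seq_len, d_model)
        gates = F.softmax(self.router(x), dim=-1)   # routing probabilities
        weight, idx = gates.max(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                         # tokens routed to expert e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out


class MoEMamba(nn.Module):
    """Alternates Mamba layers with MoE feed-forward layers."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, num_pairs=4):
        super().__init__()
        layers = []
        for _ in range(num_pairs):
            layers += [MambaBlockStub(d_model),
                       SwitchMoE(d_model, d_ff, num_experts)]
        self.layers = nn.ModuleList(layers)
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in layers])

    def forward(self, x):
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))   # pre-norm residual around each layer
        return x
```

As a quick smoke test, `MoEMamba()(torch.randn(2, 16, 512))` should return a tensor of the same shape. The design point this sketch captures is that the MoE layer only increases parameter count, not per-token compute: each token still passes through exactly one expert feed-forward network.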