MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
This paper combines State Space Models (SSMs) with the Mixture of Experts (MoE) approach and introduces MoE-Mamba, a model in which every other Mamba layer is replaced with a MoE feed-forward layer based on the Switch Transformer. MoE-Mamba not only outperforms both vanilla Mamba and Transformer-MoE but also reaches the same performance as Mamba in ~2.5x fewer training steps.
While SSM-MoE hybrids are a new area of research, they offer a practical option for scaling SSMs to billions of parameters.
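The core architectural idea (alternating a Mamba sequence-mixing layer with a sparsely routed feed-forward layer) can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' code: `mamba_layer` below stands in for a real Mamba implementation (e.g., from the `mamba_ssm` package), and the Switch-style router omits details such as the load-balancing loss and capacity factor.

```python
import torch
import torch.nn as nn


class SwitchMoE(nn.Module):
    """Top-1 routed feed-forward layer in the style of the Switch Transformer."""
    def __init__(self, d_model, d_ff, n_experts):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (batch, seq, d_model)
        probs = self.router(x).softmax(dim=-1)
        gate, idx = probs.max(dim=-1)          # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                    # tokens routed to expert e
            if mask.any():
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


class MoEMambaBlock(nn.Module):
    """One MoE-Mamba block: a Mamba (sequence-mixing) layer followed by a routed feed-forward layer."""
    def __init__(self, mamba_layer, d_model, d_ff=2048, n_experts=8):
        super().__init__()
        self.mamba = mamba_layer               # stand-in for a real Mamba layer
        self.moe = SwitchMoE(d_model, d_ff, n_experts)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + self.mamba(self.norm1(x))      # SSM / Mamba sub-layer (residual)
        x = x + self.moe(self.norm2(x))        # MoE feed-forward sub-layer (residual)
        return x
```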
WARM: On the Benefits of Weight-Averaged Reward Models
Reward hacking is a potential risk when LLMs are aligned with human preferences through reinforcement learning: the model can exploit failures in the reward model to achieve seemingly high rewards without meeting the underlying objectives. This DeepMind paper highlights two primary challenges in designing reward models that mitigate this risk: (i) distribution shifts during the reinforcement learning process, and (ii) inconsistencies in human preferences. The researchers propose WARM (Weight-Averaged Reward Models), in which multiple reward models are fine-tuned and then averaged in weight space, and the merged model is used as the reward signal.
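The averaging step itself is simple. A minimal sketch (function and variable names are illustrative, not from the paper), assuming all reward models share the same architecture and a common pre-trained initialization:

```python
import torch


def average_reward_models(state_dicts):
    """Uniformly average the weights of several fine-tuned reward models (WARM-style merge)."""
    merged = {}
    for name, ref in state_dicts[0].items():
        if ref.dtype.is_floating_point:
            merged[name] = torch.stack([sd[name] for sd in state_dicts]).mean(dim=0)
        else:
            merged[name] = ref.clone()   # copy non-float buffers (e.g., counters) as-is
    return merged


# Usage: fine-tune N reward models (e.g., with different seeds/hyperparameters), then
#   proxy_rm.load_state_dict(average_reward_models([m.state_dict() for m in reward_models]))
# and use the single merged model as the reward during RL.
```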
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
This paper provides a thorough overview of leveraging LLMs for NLG (natural language generation) evaluation and proposes a taxonomy for organizing existing LLM-based evaluation metrics. The NLG coverage is broad: data-to-text generation, dialogue generation, image captioning, machine translation, story generation, text summarization, and general generation.
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Anthropic researchers show how LLMs can be trained to behave maliciously through deceptive/backdoor techniques. More importantly, this backdoor behavior can be made persistent, particularly in very large models and in models trained with chain-of-thought reasoning. Standard LLM safety techniques (e.g., adversarial training and supervised fine-tuning) often fail to identify or remove such deceptive behavior. Moreover, instead of removing backdoors, adversarial training may teach models to better recognize their backdoor triggers, thereby hiding the unsafe behavior and increasing the overall security risk.
Tuning Language Models by Proxy
The researchers propose a lightweight alternative to fine-tuning (called proxy-tuning) in which a large LLM is adapted at decoding time by modifying its output logits. Since the approach only requires access to the LLM's predictions (not its parameters), it can be applied even when the LLM's weights are private. Proxy tuning operates in the following manner:
- a smaller version of the target LLM is tuned
- the difference between the predictions of the small tuned model and the small untuned model is computed
- this difference is applied to shift the predictions of the original LLM in the direction of the tuning (a minimal sketch follows the list)
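A minimal sketch of the decoding-time arithmetic using Hugging Face transformers; the model names are placeholders, and all three models are assumed to share a tokenizer/vocabulary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model names: the large base model, the small tuned "expert",
# and the small untuned "anti-expert".
base = AutoModelForCausalLM.from_pretrained("large-base-model")
expert = AutoModelForCausalLM.from_pretrained("small-tuned-model")
anti_expert = AutoModelForCausalLM.from_pretrained("small-untuned-model")
tok = AutoTokenizer.from_pretrained("large-base-model")

ids = tok("Q: What is 2 + 2?\nA:", return_tensors="pt").input_ids
with torch.no_grad():
    l_base = base(ids).logits[:, -1, :]            # next-token logits of the large model
    l_tuned = expert(ids).logits[:, -1, :]         # ... of the small tuned model
    l_untuned = anti_expert(ids).logits[:, -1, :]  # ... of the small untuned model

# Shift the large model's prediction by the tuning delta observed on the small pair.
proxy_logits = l_base + (l_tuned - l_untuned)
next_token_id = proxy_logits.softmax(dim=-1).argmax(dim=-1)
print(tok.decode(next_token_id))
```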
Knowledge Fusion of Large Language Models
In theory, combining several existing LLMs into a single, more capable model is a cost-effective approach to LLM development. However, the differing architectures of LLMs make this difficult in practice (e.g., their weights cannot be directly blended). This paper proposes FuseLLM, which externalizes the knowledge of the source LLMs through their generative distributions and fuses it into a target model through continual training.
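A hedged sketch of the training objective under simplifying assumptions: the source models' token distributions are already aligned to a shared vocabulary (a key practical step the paper addresses), they are fused here by simple averaging (the paper studies specific fusion strategies), and the target is trained toward the fused distribution alongside the usual language-modeling loss.

```python
import torch
import torch.nn.functional as F


def fusion_loss(target_logits, teacher_probs_list, labels, lm_weight=1.0, fuse_weight=1.0):
    """target_logits: (batch, seq, vocab); teacher_probs_list: list of (batch, seq, vocab)
    token distributions from the source LLMs, already aligned to the target vocabulary."""
    fused = torch.stack(teacher_probs_list).mean(dim=0)       # naive average fusion
    log_p = F.log_softmax(target_logits, dim=-1)
    fuse = F.kl_div(log_p, fused, reduction="batchmean")      # pull target toward fused distribution
    lm = F.cross_entropy(target_logits.flatten(0, 1), labels.flatten())
    return lm_weight * lm + fuse_weight * fuse
```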
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
The paper introduces AlphaCodium, a test-based, multi-stage code generation tool that improves the performance of LLMs on coding tasks. It operates as a code-oriented flow that iteratively runs and fixes generated code against input-output tests. The flow comprises two key phases:
- pre-processing phase: AlphaCodium reasons about the problem in natural language
- code iterations phase: AlphaCodium iterates on public and AI-generated tests (a simplified loop is sketched below)
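A simplified view of the flow; `llm` (a prompt-to-text completion call) and `run_tests` (a sandboxed test runner) are hypothetical helpers, not AlphaCodium's actual API:

```python
def alphacodium_style_flow(problem, tests, llm, run_tests, max_rounds=5):
    # Pre-processing: reason about the problem in natural language first.
    reflection = llm(f"Describe the goal, inputs, outputs, and edge cases of:\n{problem}")
    code = llm(f"Problem:\n{problem}\nAnalysis:\n{reflection}\nWrite a solution.")

    # Code iterations: repeatedly run against public and AI-generated tests and repair.
    for _ in range(max_rounds):
        failures = [t for t in tests if not run_tests(code, t)]
        if not failures:
            return code                       # all tests pass
        code = llm(f"This code fails tests {failures}.\nCode:\n{code}\nFix it.")
    return code
```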
Self-Rewarding Language Models
In this paper from Meta and NYU, the researchers explore language models that provide their own rewards during training via LLM-as-a-Judge prompting. This self-alignment method iterates over two phases:
- Self-instruction creation: a seed model generates candidate responses to new prompts and also judges them, i.e., predicts its own rewards
- Instruction-following training: preference pairs are selected from the generated data and used for training via Direct Preference Optimization (DPO), yielding the next (updated) version of the model (see the sketch after this list)
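A condensed sketch of one self-rewarding iteration; `generate`, `score_with_judge`, and `train_dpo` are hypothetical placeholders rather than the paper's code:

```python
def self_rewarding_iteration(model, new_prompts, n_candidates=4):
    preference_pairs = []
    for prompt in new_prompts:                                  # phase 1: self-instruction creation
        candidates = [generate(model, prompt) for _ in range(n_candidates)]
        scores = [score_with_judge(model, prompt, c) for c in candidates]  # model rewards itself
        ranked = sorted(zip(scores, candidates), key=lambda sc: sc[0])
        chosen, rejected = ranked[-1][1], ranked[0][1]          # best vs. worst response
        preference_pairs.append((prompt, chosen, rejected))
    return train_dpo(model, preference_pairs)                   # phase 2: instruction-following training (DPO)
```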
Other Notable Papers:
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM: https://arxiv.org/abs/2401.02994
- GPT-4V(ision) is a Generalist Web Agent, if Grounded: https://arxiv.org/abs/2401.01614
- Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models: https://arxiv.org/abs/2401.06102
- REFT: Reasoning with REinforced Fine-Tuning: https://arxiv.org/abs/2401.08967
- TrustLLM: Trustworthiness in Large Language Models: https://arxiv.org/abs/2401.05561