- Gradients are Not All You Need: https://arxiv.org/pdf/2111.05803v1.pdf
- RAVE: A variational autoencoder for fast and high-quality neural audio synthesis: https://arxiv.org/pdf/2111.05011v1.pdf
- NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework: https://arxiv.org/pdf/2111.04130v1.pdf
- A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis: https://arxiv.org/pdf/2110.15678v2.pdf
- On Representation Knowledge Distillation for Graph Neural Networks: https://arxiv.org/pdf/2111.04964v1.pdf
- Meta-Learning to Improve Pre-Training: https://arxiv.org/pdf/2111.01754v1.pdf
- Federated Learning Based on Dynamic Regularization: https://arxiv.org/pdf/2111.04263v2.pdf
- Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues: https://arxiv.org/pdf/2111.02574v1.pdf
- Relational Self-Attention: What’s Missing in Attention for Video Understanding: https://arxiv.org/pdf/2111.01673v1.pdf
- An Empirical Study of Training End-to-End Vision-and-Language Transformers: https://arxiv.org/pdf/2111.02387v1.pdf
- Graph Tree Memory Networks: https://arxiv.org/pdf/2111.02353v1.pdf
- NormFormer: Improved Transformer Pretraining with Extra Normalization: https://arxiv.org/pdf/2110.09456v2.pdf
- Context-Aware Transformer Transducer for Speech Recognition: https://arxiv.org/pdf/2111.03250v1.pdf
- Attention Approximates Sparse Distributed Memory: https://arxiv.org/pdf/2111.05498v1.pdf
- A Survey of Visual Transformers: https://arxiv.org/pdf/2111.06091v2.pdf
- Are Transformers More Robust Than CNNs?: https://arxiv.org/pdf/2111.05464v1.pdf
- Data Augmentation Can Improve Robustness: https://arxiv.org/pdf/2111.05328v1.pdf
- Attention Mechanisms in Computer Vision: A Survey: https://arxiv.org/pdf/2111.07624v1.pdf
- The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos: https://arxiv.org/pdf/2111.06394v1.pdf
- Advances in Neural Rendering: https://arxiv.org/pdf/2111.05849v1.pdf
- Kalman Filtering with Adversarial Corruptions: https://arxiv.org/pdf/2111.06395v1.pdf
- Learning to ignore: rethinking attention in CNNs: https://arxiv.org/pdf/2111.05684v1.pdf
- Neural optimal feedback control with local learning rules: https://arxiv.org/pdf/2111.06920v1.pdf
- Learning Signal-Agnostic Manifolds of Neural Fields: https://arxiv.org/pdf/2111.06387v1.pdf
- Learning from Mistakes – A Framework for Neural Architecture Search: https://arxiv.org/pdf/2111.06353v1.pdf
- Edge-Cloud Polarization and Collaboration: A Comprehensive Survey: https://arxiv.org/pdf/2111.06061v2.pdf