- Simulation Intelligence: Towards A New Generation Of Scientific Methods: https://arxiv.org/pdf/2112.03235v1.pdf
- Information is Power: Intrinsic Control via Information Capture: https://arxiv.org/pdf/2112.03899v1.pdf
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts: https://arxiv.org/pdf/2112.06905v1.pdf
- Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning: https://arxiv.org/pdf/2112.03763v1.pdf
- Efficient Geometry-aware 3D Generative Adversarial Networks: https://arxiv.org/pdf/2112.07945v1.pdf
- GAN-Supervised Dense Visual Alignment: https://arxiv.org/pdf/2112.05143v1.pdf
- BEVT: BERT Pretraining of Video Transformers: https://arxiv.org/pdf/2112.01529v1.pdf
- Optimal Latent Space Forecasting For Large Collections Of Short Time Series Using Temporal Matrix Factorization: https://arxiv.org/pdf/2112.08052v1.pdf
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks: https://arxiv.org/pdf/2112.01522v1.pdf
- Measure and Improve Robustness in NLP Models: A Survey: https://arxiv.org/pdf/2112.08313v1.pdf
- Self-attention Does Not Need O(n²) Memory: https://arxiv.org/pdf/2112.05682v2.pdf
- Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework: https://arxiv.org/pdf/2112.05141v1.pdf
- Plenoxels: Radiance Fields without Neural Networks: https://arxiv.org/pdf/2112.05131v1.pdf
- Player of Games: https://arxiv.org/pdf/2112.03178v1.pdf
- Improved Multiscale Vision Transformers for Classification and Detection: https://arxiv.org/pdf/2112.01526v1.pdf
- Training Robust Zero-Shot Voice Conversion Models With Self-Supervised Features: https://arxiv.org/pdf/2112.04424v1.pdf
- Grounded Language-Image Pre-training: https://arxiv.org/pdf/2112.03857v1.pdf
- GenIE: Generative Information Extraction: https://arxiv.org/pdf/2112.08340v1.pdf
- Systematic Generalization with Edge Transformers: https://arxiv.org/pdf/2112.00578v1.pdf
- PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers: https://arxiv.org/pdf/2111.12710v1.pdf
- BACON: Band-limited Coordinate Networks for Multiscale Scene Representation: https://arxiv.org/pdf/2112.04645v1.pdf
- Improving language models by retrieving from trillions of tokens: https://arxiv.org/pdf/2112.04426v1.pdf
- CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields: https://arxiv.org/pdf/2112.05139v1.pdf
- Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks: https://arxiv.org/pdf/2112.03324v1.pdf
- Multi-Scale Feature Learning Dynamics: Insights For Double Descent: https://arxiv.org/pdf/2112.03215v1.pdf
- Multipath++: Efficient Information Fusion And Trajectory Aggregation For Behavior Prediction: https://arxiv.org/pdf/2111.14973v2.pdf
- RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs: https://arxiv.org/pdf/2112.00724v1.pdf
- 3D Question Answering: https://arxiv.org/pdf/2112.08359v1.pdf
- Causal-based Time Series Domain Generalization for Vehicle Intention Prediction: https://arxiv.org/pdf/2112.02093v1.pdf
- DANETs: Deep Abstract Networks for Tabular Data Classification and Regression: https://arxiv.org/pdf/2112.02962v1.pdf
- Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields: https://arxiv.org/pdf/2112.03907v1.pdf
- FLAVA: A Foundational Language And Vision Alignment Model: https://arxiv.org/pdf/2112.04482v1.pdf
- CaSP: Class-agnostic Semi-Supervised Pretraining for Detection & Segmentation: https://arxiv.org/pdf/2112.04966v1.pdf
- FaceFormer: Speech-Driven 3D Facial Animation with Transformers: https://arxiv.org/pdf/2112.05329v1.pdf
- Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation: https://arxiv.org/pdf/2112.07431v1.pdf
- Coupling Vision and Proprioception for Navigation of Legged Robots: https://arxiv.org/pdf/2112.02094v1.pdf
- InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation: https://arxiv.org/pdf/2112.01589v2.pdf
- Robustness in Deep Learning for Computer Vision: Mind the gap?: https://arxiv.org/pdf/2112.00639v1.pdf
- Neural Emotion Director: Speech-preserving semantic control of facial expressions in “in-the-wild” videos: https://arxiv.org/pdf/2112.00585v1.pdf