- LaMDA: Language Models for Dialog Applications: https://arxiv.org/pdf/2201.08239v3.pdf
- Data-Driven Offline Optimization for Architecting Hardware Accelerators: https://arxiv.org/pdf/2110.11346v3.pdf
- Don’t Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis: https://arxiv.org/pdf/2202.07728v1.pdf
- Block-NeRF: Scalable Large Scene Neural View Synthesis: https://arxiv.org/pdf/2202.05263v1.pdf
- Maintaining fairness across distribution shift: do we have viable solutions for real-world applications?: https://arxiv.org/pdf/2202.01034v1.pdf
- Transformers Can Do Bayesian Inference: https://arxiv.org/pdf/2112.10510v3.pdf
- How to build a cognitive map: insights from models of the hippocampal formation: https://arxiv.org/pdf/2202.01682v1.pdf
- Transformers in Time Series: A Survey: https://arxiv.org/pdf/2202.07125.pdf
- A Survey on Model Compression for Natural Language Processing: https://arxiv.org/pdf/2202.07105.pdf
- Threats to Pre-trained Language Models: Survey and Taxonomy: https://arxiv.org/pdf/2202.06862.pdf
- Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series: https://arxiv.org/pdf/2202.02403.pdf
- Unified Scaling Laws for Routed Language Models: https://arxiv.org/pdf/2202.01169v2.pdf
- How Do Vision Transformers Work?: https://arxiv.org/pdf/2202.06709v1.pdf
- Progressive Distillation for Fast Sampling of Diffusion Models: https://arxiv.org/pdf/2202.00512v1.pdf
- Review of Automated Time Series Forecasting Pipelines: https://arxiv.org/pdf/2202.01712v1.pdf
- The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective: https://arxiv.org/pdf/2202.01602v3.pdf
- N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting: https://arxiv.org/pdf/2201.12886v2.pdf
- Deconstructing the Inductive Biases of Hamiltonian Neural Networks: https://arxiv.org/pdf/2202.04836v2.pdf
- Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel: https://arxiv.org/pdf/2202.05254v1.pdf
- VOS: Learning What You Don’t Know by Virtual Outlier Synthesis: https://arxiv.org/pdf/2202.01197v3.pdf
- F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization: https://arxiv.org/pdf/2202.05239v1.pdf
- Decoupling Local and Global Representations of Time Series: https://arxiv.org/pdf/2202.02262v2.pdf
- Optimal learning rate schedules in high-dimensional non-convex optimization problems: https://arxiv.org/pdf/2202.04509v1.pdf
- Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning: https://arxiv.org/pdf/2201.13425v2.pdf
- ETSformer: Exponential Smoothing Transformers for Time-series Forecasting: https://arxiv.org/pdf/2202.01381v1.pdf
- Diversify and Disambiguate: Learning From Underspecified Data: https://arxiv.org/pdf/2202.03418v1.pdf
- How to Understand Masked Autoencoders: https://arxiv.org/pdf/2202.03670v2.pdf
- 3D Object Detection from Images for Autonomous Driving: A Survey: https://arxiv.org/pdf/2202.02980v2.pdf
- Tiny Object Tracking: A Large-scale Dataset and A Baseline: https://arxiv.org/pdf/2202.05659v1.pdf
- CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting: https://arxiv.org/pdf/2202.01575v1.pdf
- UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation: https://arxiv.org/pdf/2109.05812.pdf
- Generative Flow Networks for Discrete Probabilistic Modeling: https://arxiv.org/pdf/2202.01361v1.pdf