- 3D Vision with Transformers: A Survey: https://arxiv.org/pdf/2208.04309v1.pdf
- Unifying Visual Perception by Dispersible Points Learning: https://arxiv.org/pdf/2208.08630v1.pdf
- ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild: https://arxiv.org/pdf/2208.11547v1.pdf
- ROLAND: Graph Learning Framework for Dynamic Graphs: https://arxiv.org/pdf/2208.07239v1.pdf
- Investigating Efficiently Extending Transformers for Long Input Summarization: https://arxiv.org/pdf/2208.04347v1.pdf
- Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion: https://arxiv.org/pdf/2207.14172v1.pdf
- TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model: https://arxiv.org/pdf/2207.13415v1.pdf
- Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers: https://arxiv.org/pdf/2207.13820v1.pdf
- DETRs with Hybrid Matching: https://arxiv.org/pdf/2207.13080v1.pdf
- Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer: https://arxiv.org/pdf/2207.14024v3.pdf
- A Simple Baseline for Multi-Camera 3D Object Detection: https://arxiv.org/pdf/2208.10035v1.pdf
- Learning Visibility for Robust Dense Human Body Estimation: https://arxiv.org/pdf/2208.10652v1.pdf
- SwinIR: Image Restoration Using Swin Transformer: https://arxiv.org/pdf/2108.10257v1.pdf
- MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures: https://arxiv.org/pdf/2208.00277v2.pdf
- Z-BERT-A: a zero-shot pipeline for unknown intent detection: https://arxiv.org/pdf/2208.07084v2.pdf
- HighlightNet: Highlighting Low-Light Potential Features for Real-Time UAV Tracking: https://arxiv.org/pdf/2208.06818v1.pdf
- Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise: https://arxiv.org/pdf/2208.09392v1.pdf
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion: https://arxiv.org/pdf/2208.01618v1.pdf
- PeRFception: Perception using Radiance Fields: https://arxiv.org/pdf/2208.11537v1.pdf
- A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning: https://arxiv.org/pdf/2208.07860v1.pdf
- YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception: https://arxiv.org/pdf/2208.11434v1.pdf
- Musika! Fast Infinite Waveform Music Generation: https://arxiv.org/pdf/2208.08706v1.pdf
- YOLOV: Making Still Image Object Detectors Great at Video Object Detection: https://arxiv.org/pdf/2208.09686v1.pdf
- Pix4Point: Image Pretrained Transformers for 3D Point Cloud Understanding: https://arxiv.org/pdf/2208.12259v1.pdf
- Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments: https://arxiv.org/pdf/2208.11311v1.pdf
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale: https://arxiv.org/pdf/2208.07339v1.pdf
- TotalSegmentator: robust segmentation of 104 anatomical structures in CT images: https://arxiv.org/pdf/2208.05868v1.pdf
- A Library For Representing Python Programs As Graphs For Machine Learning: https://arxiv.org/pdf/2208.07461v1.pdf
- Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models: https://arxiv.org/pdf/2208.09399v1.pdf
- Refine and Represent: Region-to-Object Representation Learning: https://arxiv.org/pdf/2208.11821v1.pdf
- Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning: https://arxiv.org/pdf/2208.04202v1.pdf
- Representation Learning For The Automatic Indexing Of Sound Effects Libraries: https://arxiv.org/pdf/2208.09096v1.pdf
- Contrastive Audio-Language Learning For Music: https://arxiv.org/pdf/2208.12208v1.pdf
- Unbiased Multi-Modality Guidance for Image Inpainting: https://arxiv.org/pdf/2208.11844v1.pdf
- Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing: https://arxiv.org/pdf/2208.08092v1.pdf
- PyMIC: A deep learning toolkit for annotation-efficient medical image segmentation: https://arxiv.org/pdf/2208.09350v1.pdf
- Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data: https://arxiv.org/pdf/2107.10833v2.pdf
- DeepInteraction: 3D Object Detection via Modality Interaction: https://arxiv.org/pdf/2208.11112v2.pdf
- Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors: https://arxiv.org/pdf/2208.11356v1.pdf
- Federated Contrastive Learning and Masked Autoencoder for Dermatological Disease Diagnosis: https://arxiv.org/pdf/2208.11278v1.pdf
- Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling: https://arxiv.org/pdf/2208.12257v1.pdf
- Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer: https://arxiv.org/pdf/2208.11523v1.pdf
- I Know What You Do Not Know: Knowledge Graph Embedding via Co-distillation Learning: https://arxiv.org/pdf/2208.09828v1.pdf
- Learning Spatial-Frequency Transformer for Visual Object Tracking: https://arxiv.org/pdf/2208.08829v1.pdf
- Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer: https://arxiv.org/pdf/2208.09300v1.pdf
- Conviformers: Convolutionally Guided Vision Transformer: https://arxiv.org/pdf/2208.08900v1.pdf
- Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation: https://arxiv.org/pdf/2208.08315v3.pdf
- Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries: https://arxiv.org/pdf/2208.07638v1.pdf
- Domain-Specific Text Generation for Machine Translation: https://arxiv.org/pdf/2208.05909v1.pdf
- SSformer: A Lightweight Transformer for Semantic Segmentation: https://arxiv.org/pdf/2208.02034v1.pdf
- Transformers as Meta-Learners for Implicit Neural Representations: https://arxiv.org/pdf/2208.02801v2.pdf
- Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models: https://arxiv.org/pdf/2208.03306v1.pdf
- Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis: https://arxiv.org/pdf/2208.01843v1.pdf
- Reference-based Image Super-Resolution with Deformable Attention Transformer: https://arxiv.org/pdf/2207.11938v2.pdf
- Focused Decoding Enables 3D Anatomical Detection by Transformers: https://arxiv.org/pdf/2207.10774v2.pdf
- BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation: https://arxiv.org/pdf/2208.01159v4.pdf
- Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios: https://arxiv.org/pdf/2207.05501v4.pdf
- VAuLT: Augmenting the Vision-and-Language Transformer with the Propagation of Deep Language Representations: https://arxiv.org/pdf/2208.09021v1.pdf
- Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks: https://arxiv.org/pdf/2208.10442v1.pdf
- DPTDR: Deep Prompt Tuning for Dense Passage Retrieval: https://arxiv.org/pdf/2208.11503v1.pdf
- Language Supervised Training for Skeleton-based Action Recognition: https://arxiv.org/pdf/2208.05318v1.pdf
- Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss: https://arxiv.org/pdf/2208.02959v1.pdf
- HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative: https://arxiv.org/pdf/2207.13921v2.pdf
- PyABSA: Open Framework for Aspect-based Sentiment Analysis: https://arxiv.org/pdf/2208.01368v1.pdf
- Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification: https://arxiv.org/pdf/2208.06616v1.pdf
- COCOA: Cross Modality Contrastive Learning for Sensor Data: https://arxiv.org/pdf/2208.00467v2.pdf
- Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations: https://arxiv.org/pdf/2208.05413v1.pdf
- Mind the Gap in Distilling StyleGANs: https://arxiv.org/pdf/2208.08840v1.pdf
- Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer: https://arxiv.org/pdf/2208.05216v1.pdf
- FRA-RIR: Fast Random Approximation of the Image-source Method: https://arxiv.org/pdf/2208.04101v1.pdf
- Learnability Enhancement for Low-light Raw Denoising: Where Paired Real Data Meets Noise Modeling: https://arxiv.org/pdf/2207.06103v2.pdf
- Expanding Language-Image Pretrained Models for General Video Recognition: https://arxiv.org/pdf/2208.02816v1.pdf
- Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects: https://arxiv.org/pdf/2208.03792v1.pdf