Why Agentic Infrastructure Is the New Moat in AI: Systems Compound, Models Don’t

Introduction: The competitive advantage in AI is shifting from model performance to systems infrastructure. As model capabilities converge and access to powerful foundation models becomes commoditized, the true moat is the agentic infrastructure built around them. This stack encompasses orchestration frameworks, memory systems, secure tooling, observability mechanisms, and evaluation loops, and transforms static models into…

Beyond VC Dimension & Rademacher Complexity: Revisiting Generalization with Kolmogorov Lenses

The classical theory of generalization in statistical learning has largely revolved around complexity measures such as the Vapnik-Chervonenkis (VC) Dimension and Rademacher Complexity. While these tools have been instrumental in providing theoretical guarantees, they often rely on worst-case analyses and may not fully capture the intrinsic structure of real-world learning problems. In this paper, we…

Engineering Multi-Agent Systems: A Technical Playbook

Introduction: Over the past few years, Agentic AI has been generating significant excitement, and deservedly so. In tandem with Generative AI, it represents the new frontier in Artificial Intelligence. While agent-based systems have existed for decades, it is only now that their capabilities are capturing mainstream attention. Organizations, ranging from startups to tech giants, are…

Decoding the Hype of AI Chips

Introduction: Recent advancements in artificial intelligence have fueled considerable excitement around what many call “AI Chips”, specialized hardware tailored to optimize and accelerate AI workloads. Amidst the noise, many companies appear to be branding nearly every processor enhancement or hardware upgrade as an AI chip, blurring the lines between genuine AI advancements and general performance…

Reinforcement Learning: A Catalyst for Next-Gen Mathematical Optimization

Abstract: Mathematical optimization drives complex decision-making across a diverse range of problems – e.g., energy management, inventory planning, network design, pricing & revenue management, production planning & scheduling, supplier selection, and transportation planning. This field has significantly evolved over decades, from its formative years around World War II to the modern age. While conventional optimization…

AI Research & Innovation in 2024, Vol. 2

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
This paper combines State Space Models (SSMs) with the Mixture of Experts (MoE) approach and introduces the MoE-Mamba model, in which every other Mamba layer is replaced with a MoE feed-forward layer based on the Switch Transformer. MoE-Mamba is shown to not only outperform both…
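A minimal sketch of the interleaving pattern described above — alternating Mamba (SSM) blocks with Switch-style MoE feed-forward blocks that use top-1 routing. All names here are illustrative placeholders, not the paper's actual code:

```python
def build_moe_mamba_stack(num_layers):
    """Alternate sequence-mixing Mamba blocks with sparse MoE feed-forward blocks."""
    stack = []
    for i in range(num_layers):
        # Even positions: Mamba SSM block; odd positions: Switch-style MoE FF block.
        stack.append("mamba_ssm_block" if i % 2 == 0 else "switch_moe_ff_block")
    return stack

def switch_route(expert_scores):
    """Top-1 (Switch) routing: send each token to its highest-scoring expert."""
    return max(range(len(expert_scores)), key=lambda e: expert_scores[e])

print(build_moe_mamba_stack(4))
# ['mamba_ssm_block', 'switch_moe_ff_block', 'mamba_ssm_block', 'switch_moe_ff_block']
print(switch_route([0.1, 0.7, 0.2]))  # 1
```

The sparsity is the point: the MoE layer holds many expert feed-forward networks, but each token activates only one, so parameter count grows without a matching growth in per-token compute.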

AI Research & Innovation in 2024, Vol. 1

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
This paper highlights 32 techniques to mitigate hallucination in LLMs, including a well-defined taxonomy to categorize these methods.
AlphaGeometry: An Olympiad-level AI system for geometry
DeepMind introduced its AI system for solving Olympiad geometry problems. Trained exclusively on synthetic data, AlphaGeometry uses a…

The Surprising Success of TiDE in Long-Term Time-Series Forecasting

Deep Learning-based architectures have had a significant impact on computer vision, natural language processing, and other machine learning areas. However, the scenario is not so straightforward when it comes to Forecasting, an area where statistical and traditional machine learning models have generally outperformed other types of models. In recent years, Transformer architectures (e.g., Google’s Temporal…

Google’s Spotlight, Meta’s LLaMA, and other innovations

Google introduced Spotlight, a foundational model for mobile UI modeling, particularly for tasks like command grounding, screen summarization, tappability prediction, and widget captioning. Traditional mobile UI design often relies on view hierarchy information, but view hierarchies are sometimes unavailable or corrupted. Spotlight not only bypasses the need for view hierarchies,…

VALL-E, ChatGPT for Medical Advice, and other innovations

Microsoft introduced VALL-E, its neural codec language model for zero-shot Text-to-Speech Synthesis (TTS) that generates high-quality speech from only a 3-second acoustic prompt (i.e., a voice recording). Unlike conventional models that treat TTS as a continuous signal regression task, VALL-E approaches it as a conditional language modeling problem. Trained on 60K+ hours of audio from LibriLight, VALL-E was shown…

LangChain: A step towards building better LLM-based conversational applications

Large Language Models (LLMs) are state-of-the-art today and generally perform well on simple, low-interaction tasks such as single-turn conversations and command-and-response systems. However, their direct use is limited for applications with complex, high-interaction tasks such as multi-turn dialogue systems and enterprise-grade chatbots. Most real-world applications are complex, and are…

NeurIPS 2022: My Top Two ‘Practically-Relevant’ Papers from the Outstanding 13

NeurIPS 2022 declared 13 submissions as outstanding papers from its main track.
Is Out-of-distribution Detection Learnable?
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Elucidating the Design Space of Diffusion-Based Generative Models
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines
A…

Recommended AI Papers: August 2022

3D Vision with Transformers: A Survey: https://arxiv.org/pdf/2208.04309v1.pdf
Unifying Visual Perception by Dispersible Points Learning: https://arxiv.org/pdf/2208.08630v1.pdf
ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild: https://arxiv.org/pdf/2208.11547v1.pdf
ROLAND: Graph Learning Framework for Dynamic Graphs: https://arxiv.org/pdf/2208.07239v1.pdf
Investigating Efficiently Extending Transformers for Long Input Summarization: https://arxiv.org/pdf/2208.04347v1.pdf
Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion: https://arxiv.org/pdf/2207.14172v1.pdf
TransNorm:…

Recommended AI Papers: July 2022

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs: https://arxiv.org/pdf/2207.00257.pdf
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding: https://arxiv.org/pdf/2207.02971v1.pdf
More ConvNets in the 2020s: Scaling up Kernels Beyond 51 × 51 using Sparsity: https://arxiv.org/pdf/2207.03620v1.pdf
Softmax-free Linear Transformers: https://arxiv.org/pdf/2207.03341v1.pdf
Learning Quality-aware Dynamic Memory for Video Object Segmentation: https://arxiv.org/pdf/2207.07922v1.pdf
3D…

Recommended AI Papers: June 2022

Scaling Vision Transformers: https://arxiv.org/pdf/2106.04560.pdf
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation: https://arxiv.org/pdf/2107.01378.pdf
Risk-averse autonomous systems: A brief history and recent developments from the perspective of optimal control: https://arxiv.org/pdf/2109.08947.pdf
LightSeq2: Accelerated Training for Transformer-based Models on GPUs: https://arxiv.org/pdf/2110.05722.pdf
Conditionally Elicitable Dynamic Risk Measures For Deep Reinforcement Learning: https://arxiv.org/pdf/2206.14666.pdf
Cooperative Retriever and Ranker in Deep Recommenders:…

Recommended AI Papers: May 2022

Computational Storytelling And Emotions: A Survey: https://arxiv.org/pdf/2205.10967.pdf
Are Large Pre-Trained Language Models Leaking Your Personal Information?: https://arxiv.org/pdf/2205.12628.pdf
FreDo: Frequency Domain-based Long-Term Time Series Forecasting: https://arxiv.org/pdf/2205.12301.pdf
A Survey on Long-tailed Visual Recognition: https://arxiv.org/pdf/2205.13775.pdf
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors: https://arxiv.org/pdf/2205.12854.pdf
On the Robustness of Safe Reinforcement Learning under Observational Perturbations: https://arxiv.org/pdf/2205.14691.pdf
Nesterov’s…

Recommended AI Papers: April 2022

Multiview Transformers for Video Recognition: https://arxiv.org/pdf/2201.04288.pdf
ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer: https://arxiv.org/pdf/2202.07305.pdf
Privacy-preserving Anomaly Detection in Cloud Manufacturing via Federated Transformer: https://arxiv.org/pdf/2204.00843.pdf
A Tour of Visualization Techniques for Computer Vision Datasets: https://arxiv.org/pdf/2204.08601.pdf
Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings: https://arxiv.org/pdf/2204.04063.pdf
Data Distributional Properties Drive Emergent Few-Shot Learning in…

Recommended AI Papers: March 2022

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism: https://arxiv.org/pdf/2203.05804.pdf
Augmented Reality and Robotics: A Survey and Taxonomy for AR-enhanced Human-Robot Interaction and Robotic Interfaces: https://arxiv.org/pdf/2203.03254.pdf
A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization: https://arxiv.org/pdf/2203.16615.pdf
Monte Carlo Tree Search based Hybrid Optimization of Variational Quantum Circuits: https://arxiv.org/pdf/2203.16707.pdf
…

Recommended AI papers: Feb 16 – 28, 2022

Is Neuro-Symbolic AI Meeting its Promise in Natural Language Processing? A Structured Review: https://arxiv.org/pdf/2202.12205.pdf
NeuralFusion: Neural Volumetric Rendering under Human-object Interactions: https://arxiv.org/pdf/2202.12825.pdf
Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection: https://arxiv.org/pdf/2202.07586.pdf
Deep Recurrent Modelling of Granger Causality with Latent Confounding: https://arxiv.org/pdf/2202.11286.pdf
Generalizable Information Theoretic Causal Representation: https://arxiv.org/pdf/2202.08388.pdf
Artificial Intelligence for the…

Recommended AI papers: Feb 1 – 15, 2022

LaMDA: Language Models for Dialog Applications: https://arxiv.org/pdf/2201.08239v3.pdf
Data-Driven Offline Optimization For Architecting Hardware Accelerators: https://arxiv.org/pdf/2110.11346v3.pdf
Don’t Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis: https://arxiv.org/pdf/2202.07728v1.pdf
Block-NeRF: Scalable Large Scene Neural View Synthesis: https://arxiv.org/pdf/2202.05263v1.pdf
Maintaining fairness across distribution shift: do we have viable solutions for real-world applications?: https://arxiv.org/pdf/2202.01034v1.pdf
Transformers Can Do Bayesian Inference:…

Recommended AI papers: Jan 16 – 31, 2022

A Systematic Exploration Of Reservoir Computing For Forecasting Complex Spatiotemporal Dynamics: https://arxiv.org/pdf/2201.08910.pdf
FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting: https://arxiv.org/pdf/2201.12740.pdf
Quantifying Epistemic Uncertainty in Deep Learning: https://arxiv.org/pdf/2110.12122.pdf
What’s Wrong With Deep Learning In Tree Search For Combinatorial Optimization: https://arxiv.org/pdf/2201.10494.pdf
A Leap among Quantum Computing and Quantum Neural Networks: A Survey: https://arxiv.org/pdf/2107.03313.pdf
Causality And…

Recommended AI papers: Jan 1 – 15, 2022

Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning: https://arxiv.org/pdf/2201.05151.pdf
A unified software/hardware scalable architecture for brain-inspired computing based on self-organizing neural models: https://arxiv.org/pdf/2201.02262v1.pdf
MGAE: Masked Autoencoders for Self-Supervised Learning on Graphs: https://arxiv.org/pdf/2201.02534v1.pdf
Applications of Signature Methods to Market Anomaly Detection: https://arxiv.org/pdf/2201.02441v1.pdf
Does entity abstraction help generative Transformers reason?: https://arxiv.org/pdf/2201.01787v1.pdf
Debiased Learning…

Recommended AI papers: Dec 16 – 31, 2021

A Globally Convergent Distributed Jacobi Scheme for Block-Structured Non-convex Constrained Optimization Problems: https://arxiv.org/pdf/2112.09027.pdf
A Robust Optimization Approach to Deep Learning: https://arxiv.org/pdf/2112.09279.pdf
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation: https://arxiv.org/pdf/2112.09747.pdf
A Survey of Natural Language Generation: https://arxiv.org/pdf/2112.11739.pdf
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?: https://arxiv.org/pdf/2112.10740.pdf
A Survey on Gender Bias in Natural…

Recommended AI papers: Dec 1 – 15, 2021

Simulation Intelligence: Towards A New Generation Of Scientific Methods: https://arxiv.org/pdf/2112.03235v1.pdf
Information is Power: Intrinsic Control via Information Capture: https://arxiv.org/pdf/2112.03899v1.pdf
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts: https://arxiv.org/pdf/2112.06905v1.pdf
Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning: https://arxiv.org/pdf/2112.03763v1.pdf
Efficient Geometry-aware 3D Generative Adversarial Networks: https://arxiv.org/pdf/2112.07945v1.pdf
GAN-Supervised Dense Visual Alignment: https://arxiv.org/pdf/2112.05143v1.pdf
BEVT: BERT Pretraining of Video…

Recommended AI papers: Nov 16 – 30, 2021

Covariate Shift in High-Dimensional Random Feature Regression: https://arxiv.org/pdf/2111.08234v1.pdf
Improving Transferability of Representations via Augmentation-Aware Self-Supervision: https://arxiv.org/pdf/2111.09613v1.pdf
GFlowNet Foundations: https://arxiv.org/pdf/2111.09266v1.pdf
Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability: https://arxiv.org/pdf/2111.10752v1.pdf
Benchmarking Detection Transfer Learning with Vision Transformers: https://arxiv.org/pdf/2111.11429v1.pdf
Improved Knowledge Distillation via Adversarial Collaboration: https://arxiv.org/pdf/2111.14356v1.pdf
End-to-End Referring Video Object Segmentation with Multimodal Transformers: https://arxiv.org/pdf/2111.14821v1.pdf
…

Recommended AI papers: Nov 1 – 15, 2021

Gradients are Not All You Need: https://arxiv.org/pdf/2111.05803v1.pdf
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis: https://arxiv.org/pdf/2111.05011v1.pdf
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework: https://arxiv.org/pdf/2111.04130v1.pdf
A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis: https://arxiv.org/pdf/2110.15678v2.pdf
On Representation Knowledge Distillation for Graph Neural Networks: https://arxiv.org/pdf/2111.04964v1.pdf
Meta-Learning to Improve Pre-Training:…

Recommended AI papers: Oct 16 – 31, 2021

Shaking the foundations: delusions in sequence models for interaction and control: https://arxiv.org/pdf/2110.10819v1.pdf
Understanding Dimensional Collapse in Contrastive Self-supervised Learning: https://arxiv.org/pdf/2110.09348v1.pdf
SOFT: Softmax-free Transformer with Linear Complexity: https://arxiv.org/pdf/2110.11945v2.pdf
Understanding How Encoder-Decoder Architectures Attend: https://arxiv.org/pdf/2110.15253v1.pdf
Parameter Prediction for Unseen Deep Architectures: https://arxiv.org/pdf/2110.13100v1.pdf
From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence: https://arxiv.org/pdf/2110.15245v1.pdf
Implicit MLE: Backpropagating Through…

Recommended AI papers: Oct 1 – 15, 2021

Multitask prompted training enables zero-shot task generalization: https://arxiv.org/pdf/2110.08207v1.pdf
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries: https://arxiv.org/pdf/2110.06922v1.pdf
Object DGCNN: 3D Object Detection using Dynamic Graphs: https://arxiv.org/pdf/2110.06923v1.pdf
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models: https://arxiv.org/pdf/2110.07178v1.pdf
Graph Neural Networks with Learnable Structural and Positional Representations: https://arxiv.org/pdf/2110.07875.pdf
Human-Robot Collaboration and Machine Learning:…

Architecting AI Applications

It is common knowledge that efficient architecture design is a key aspect of building any product/solution, including AI applications. However, in reality, it is often observed that companies pay limited attention to developing robust end-to-end architectures before initiating AI application development. This leads to several problems like schedule/cost overruns, automation job failures, interoperability & scalability…

Building Next-Gen Artificial Intelligence Systems Through Multimodal Machine Learning

Human perception is multimodal. We make sense of objects and events through multiple modalities (sensory organs), and that is why we excel in our understanding of the world around us. Similarly, in many real-world problems, Artificial Intelligence systems become more efficient when they process inputs (signals) from multiple modalities and then generate the outputs (predictions)…

The Curious Representation Learning (CRL) Framework

Researchers from MIT & IBM recently introduced CRL, a new self-supervised framework that learns task-agnostic visual representations in embodied environments. This approach is able to construct representations not only from unlabeled datasets, but also from environments. The CRL framework jointly learns a reinforcement learning policy and a visual representation model. The policy tries to maximize the…

A New Legal Framework for AI

The European Union has just released its first legal framework for Artificial Intelligence. It covers a wide range of areas, including:
▪︎ Defining the fundamental notion of an AI system.
▪︎ Laying down the requirements for high-risk AI systems, and the obligations of their operators.
▪︎ Prohibiting certain AI practices, e.g., attempts to distort human behavior; real-time remote…

China’s Super-Scale Intelligence Model System

The Beijing Academy of Artificial Intelligence (BAAI) released China’s first super-scale intelligence model system: WuDao 1.0. This is a combination of four very large-scale NLP models.
WenYuan: China’s largest pre-training language model (supporting Chinese & English) for text categorization, sentiment analysis, reading comprehension, etc. It claims GPT-3-comparable performance on several important NLU tasks.
WenLan:…

The Current State of AutoML

Automated Machine Learning has come a long way since Google Brain introduced NAS in early 2017. Amazon’s AutoGluon, Google’s AutoMLZero, Salesforce’s TransmogrifAI, the AutoML features of Azure, H2O, Scikit-learn, Keras & others (TPOT, DataRobot, etc.) are witnessing increased adoption. Modern AutoML systems generally focus on hyper-parameter optimization (HPO), neural architecture search (NAS), model selection & compression, and to a certain extent, meta-learning. They also…
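As a toy illustration of the hyper-parameter optimization (HPO) step these systems automate, here is a random-search sketch in plain Python. The objective function and hyper-parameter names are invented stand-ins, not any particular library's API:

```python
import random

random.seed(0)  # deterministic for the example

def validation_loss(lr, depth):
    """Stand-in objective: pretend the best config is lr=0.1, depth=6."""
    return (lr - 0.1) ** 2 + (depth - 6) ** 2 * 0.01

def random_search(n_trials):
    """Sample configs at random and keep the one with the lowest loss."""
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {"lr": random.uniform(1e-4, 1.0), "depth": random.randint(2, 12)}
        loss = validation_loss(cfg["lr"], cfg["depth"])
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

best_cfg, best_loss = random_search(200)
print(best_cfg, best_loss)
```

Real AutoML systems replace the random sampler with smarter strategies (Bayesian optimization, successive halving, evolutionary search), but the loop structure — propose a configuration, evaluate it, keep the best — is the same.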

The GEM Benchmark for Natural Language Generation

Earlier this year, 55 researchers from 44 global institutions proposed GEM (Generation, Evaluation & Metrics), a new benchmark environment for Natural Language Generation. It evaluates models through an interactive result exploration system. This enables a much better understanding of model limitations & improvement opportunities, and does not misrepresent the complex interactions of individual measures. This…

Design Patterns for building AI & ML Applications

Design Patterns are reusable, formalized constructs that serve as templates to address common problems in designing efficient systems. These enable the development of high-performance, resilient & robust applications. Widely-used design patterns, especially from the object-oriented paradigm, include:
▪︎ Behavioral Patterns: Command, Mediator, Memento, Observer, Visitor, etc.
▪︎ Creational Patterns: Builder, Factory, Prototype, Singleton, etc.
▪︎ Structural Patterns: Adapter, Bridge, Composite, Decorator,…
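To make one of the behavioral patterns mentioned above concrete, here is a minimal Observer (publish/subscribe) sketch in Python; the class and event names are illustrative, not from any specific framework:

```python
class Publisher:
    """Subject: notifies all registered observers when an event occurs."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register an observer callback."""
        self._subscribers.append(callback)

    def publish(self, event):
        """Notify every registered observer of the event."""
        for notify in self._subscribers:
            notify(event)

# Usage: a monitoring component observes a (hypothetical) ML pipeline.
received = []
pipeline_events = Publisher()
pipeline_events.subscribe(received.append)
pipeline_events.publish("model_retrained")
print(received)  # ['model_retrained']
```

The value of the pattern is decoupling: the publisher knows nothing about its observers beyond the callback interface, so new monitors can be added without touching pipeline code.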

Trends & Innovations In Object Detection

Object detection is one of the most fundamental problems in Computer Vision, and has powered many of the significant advances in this field. It has applications in a wide range of areas such as advertising, driverless cars, healthcare, robotics, security, and others. This quasi-technical paper discusses the evolution, critical areas of research, and recent innovations…

The State of Play in Emotion AI

Emotion AI, also known as Affective Computing or Artificial Emotional Intelligence, is an interdisciplinary field that operates at the intersection of Behavioral Science, Cognitive Computing, Computer Science, Machine Learning, Neuroscience, Psychology, Signal Processing, and others. This is one of the rapidly evolving areas of AI research today. At the basic level, Emotion AI refers to…

Using AI To Counter Zero-Day Cyber Attacks: A Security Imperative During The COVID-19 Global Crisis

The COVID-19 pandemic is an earth-shattering, black swan event. Personal lives, societies and businesses are getting severely impacted. As governments, companies, institutions, and other organizations around the world shift their focus and resource allocation from their regular objectives to controlling this pandemic and its impact, it also increases the risk of serious cyber-attacks from rogue…

Knowledge Graphs in AI Development

A common grievance of most enterprises is that while data is abundant, there is not enough knowledge. Data is the symbolic representation of the observable properties of real-world entities and, on its own, yields limited practical value. Knowledge, on the other hand, is ‘meaningful data’ created through cognitive processing mechanisms. Generating actionable knowledge from raw…

Is Technical Debt Derailing your AI-driven Transformation Program?

An Asian publishing company embarked on a major transformation program to deploy AI-based analytics and business intelligence solutions across all their strategic business units. The program witnessed initial success during the proof-of-concept and early validation stages. This encouraged the company to initiate a phase-wise enterprise roll-out. However, the roll-out turned out to be a major…

Applied R&D And The Fourth Industrial Revolution

The Fourth Industrial Revolution (4IR) has begun. We are already witnessing massive disruptions in different forms and levels in most aspects of our businesses and lives. The determinants of business success are changing. Incremental and Non-R&D innovation will not be adequate for business leadership in this new industrial age. There is a compelling need to…