Generative AI technologies have garnered much attention over the last 24 to 30 months. Innovations like GPT-3 and GPT-3.5 (ChatGPT), DALL-E 2, and Stable Diffusion have occupied a large part of the public discourse. The developers of these technologies, and the ecosystems around them, are naturally creating a lot of hype. Others are adding to it because of the novelty of these technologies, the ‘gimmicky’ use cases associated with them, and, to a certain extent, FOMO (i.e., the fear of missing out on something perceived as important).
Amidst all this noise, the critical question is: What are the real possibilities, and what are the limitations of Generative AI today? This paper attempts to explain that.
Recent Advances in Generative AI
Generative machine learning is not new. Some methods, like auto-regressive models, energy-based & flow-based models, hidden Markov models (HMMs), and mixture models, have been around for many years. Others, like generative adversarial networks (GANs) and variational autoencoders (VAEs), are relatively new but have also been around for several years.
In recent years, Deep Generative Modeling has received significant attention, particularly with the advent of Large Language Models (LLMs). Google’s seminal 2017 paper on Attention (‘Attention Is All You Need’), and its subsequent release of BERT (Bidirectional Encoder Representations from Transformers) in 2018, paved the way for Transformers to revolutionize machine learning. This new paradigm witnessed swift adoption in multiple areas, including generative modeling. OpenAI’s GPT (Generative Pre-trained Transformer) series, particularly GPT-2 (released in early 2019) and GPT-3 (released in mid-2020), were among the earliest Transformer-based breakthroughs in generative modeling.
Other players followed suit with their own language models. DeepMind’s Gopher, Google’s GLaM (Generalist Language Model), PaLM (Pathways Language Model) & LaMDA (Language Models for Dialog Applications), Meta’s OPT (Open Pretrained Transformer), and Microsoft-NVIDIA’s Megatron-Turing NLG are some of the top LLMs. Very recently, GPT-3.5 (in the form of ChatGPT) has been making waves.
In the Computer Vision domain, generative AI has witnessed significant innovations in the last two years. In 2021, OpenAI released CLIP (Contrastive Language-Image Pre-training), a model that matches images with text and supports zero-shot image classification, and DALL-E for text-to-image generation. The company subsequently released its diffusion-based DALL-E 2 model in mid-2022. Around the same time, the Stability AI team released their latent diffusion-based Stable Diffusion model; Google released Parti (Pathways Autoregressive Text-to-Image) and the diffusion-based Imagen; and Midjourney released its namesake model. More text-to-image work has been announced in recent months, such as Google’s masked generative transformer-based Muse, Salesforce’s EDICT (Exact Diffusion Inversion via Coupled Transformations), and GLIGEN.
Generative AI has accelerated innovations in other problems and modalities as well. Examples include 3D object/shape generation (e.g., Google’s DreamFusion, and NVIDIA’s Get3D), algorithm generation (e.g., DeepMind’s AlphaTensor, which builds on AlphaZero), image-to-text (e.g., DeepMind’s Flamingo), speech/audio-to-text (e.g., OpenAI’s Whisper), audio, speech & music generation (e.g., Google’s AudioLM, Microsoft’s VALL-E, and OpenAI’s Jukebox), and text-to-video (e.g., Google’s Phenaki, and Meta’s Make-A-Video).
The Excessive Hype & Noise
AI companies, their developer/researcher communities, their financial backers, and their larger ecosystems (e.g., Data Science firms that may benefit from noise related to AI) are the primary drivers of this excessive hype. A few have even gone to the extent of claiming that AI is sentient. The reality is quite different.
Novel technologies do not necessarily translate into successful applications.
Meta’s Galactica model for scientific text processing is a telling example of how companies tend to position poorly architected, badly developed, or inadequately tested AI technologies as significant breakthroughs. Launched in November 2022, Galactica’s online demo was withdrawn within a week because it generated biased and inaccurate outputs, including made-up facts.
Interestingly, the research paper associated with Galactica made a lot of claims, such as:
- outperforming GPT-3 on technical knowledge probes (e.g., LaTeX equations)
- outperforming Chinchilla on mathematical MMLU and PaLM 540B on MATH
- outperforming BLOOM & OPT-175B on BIG-bench despite not being trained on a general corpus
- achieving state-of-the-art (SOTA) results on downstream tasks such as PubMedQA and MedMCQA dev
Such examples teach us the importance of rigorously assessing new technologies, and understanding their limitations, before driving their adoption. Not every claim of SOTA is valid. Moreover, successful AI technologies tend to spawn variants, and even copycats. For instance, BERT was followed by tens of variants, but only a few are practically useful. The rest are just noise.
Experts can be wrong too.
No technologist is always right, especially in the case of early-stage/evolving technologies. The same is true for AI pioneers and distinguished professionals, despite their high competence levels and best intentions. Not every expert perspective needs to be accepted as gospel truth. Moreover, a pattern is emerging where AI influencers are hyping up minor innovations and overstating their potential impact on businesses and lives. These professionals may have much to gain, directly or indirectly, from amplifying the noise.
Organizations sometimes fail to understand that hiring a bunch of AI experts does not guarantee the development of solid, usable AI technologies. It is not uncommon to see corporate AI researchers prioritizing the authorship of technical papers over conducting actual R&D. Designing a novel Deep Learning network with months of computationally expensive training and then publishing a top-quality research paper are hardly useful if the company cannot use that work, directly or indirectly, to serve its customers.
Large Language Models are transformative but have severe limitations.
Large Language Models (LLMs) are almost ubiquitous today in the NLP domain, solving a wide array of problems in content generation, dialogue management, machine translation, text summarization, question-answering, and other areas. LLMs are a major driving force behind many generative models, and are hugely transformative. At the same time, they suffer from critical limitations.
Bias & Toxicity Encoding: LLMs are known to exhibit different types of biases, such as gender bias, negative sentiments towards specific groups, or stereotypical associations. Such biases are not always easy to identify in language/text processing. As a result, it is often unclear whether bias corrections are conducted before models are released, and to what extent. Toxicity is another concern, and the problem is compounded by the fact that most toxicity-removal techniques available today are sub-optimal. For instance, research has highlighted how GPT-3 and other models may produce toxic outputs even from non-toxic prompts.
Explainability Issues: LLMs are usually black boxes with poor inherent explainability. This makes it difficult to determine whether the models generate outputs based on non-essential features and spurious patterns. While Attention-based visualization, feature attribution, or saliency techniques (e.g., occlusion-based or propagation-based) may be used, their ability to explain LLM decisions is generally limited.
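To make the occlusion-based idea concrete, here is a minimal sketch. It assumes a toy scoring function (`toy_score`, with hand-picked word weights) standing in for a real model; both the weights and the function names are hypothetical and only for illustration. The technique itself is simple: remove one token at a time and record how much the score changes.

```python
from typing import Callable, List, Tuple

# Hypothetical word weights standing in for a real model; purely illustrative.
TOY_WEIGHTS = {"excellent": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}


def toy_score(tokens: List[str]) -> float:
    """Toy 'model': sums hand-picked word weights (a stand-in for an LLM score)."""
    return sum(TOY_WEIGHTS.get(tok.lower(), 0.0) for tok in tokens)


def occlusion_attribution(
    tokens: List[str], score_fn: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Attribute the score to each token by removing it and measuring the change."""
    baseline = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]  # the input without token i
        attributions.append((tok, baseline - score_fn(occluded)))
    return attributions


example = "The plot was terrible but the acting was excellent".split()
attributions = occlusion_attribution(example, toy_score)
for token, delta in attributions:
    print(f"{token:>10s} {delta:+.1f}")
```

With a real LLM, each occlusion requires a separate forward pass, and tokens interact in ways that single-token removal does not capture. This is one reason such explanations tend to be limited.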
Superficial Learning & Stochastic Parrots: LLMs tend to focus more on learning non-robust features (e.g., position or style), and less on learning robust features (e.g., reasoning and semantic understanding). This is known as superficial or shortcut learning. Moreover, LLMs are often described as ‘Stochastic Parrots’ because they stitch together sequences from their training data according to probabilistic patterns, with little or no regard for coherence and meaning. Additionally, Prompt Engineering, an important operational aspect of many LLMs, has its own issues. For instance, it is widely observed that minor changes in prompts may lead to major changes in the model outputs.
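The ‘probabilistic sequences’ idea can be illustrated with a deliberately tiny sketch. It assumes a bigram (word-pair) model, which is a drastic simplification and not how an LLM is actually built; the corpus and the function names are made up for illustration. The point is only that such a generator can produce fluent-looking text while every step is copied from word pairs it has already seen.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_bigram_table(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that followed it in the training text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def generate(
    table: Dict[str, List[str]], start: str, max_words: int, rng: random.Random
) -> List[str]:
    """Sample a word sequence by repeatedly picking a previously seen next word."""
    output = [start]
    while len(output) < max_words and output[-1] in table:
        output.append(rng.choice(table[output[-1]]))
    return output


# Hypothetical training text; a real model would see billions of words.
corpus = "the cat sat on the mat and the dog sat on the rug"
table = build_bigram_table(corpus)
sample = generate(table, "the", 8, random.Random(0))
print(" ".join(sample))

# Every adjacent word pair in the output already occurs in the training text.
seen_pairs = set(zip(corpus.split(), corpus.split()[1:]))
assert all(pair in seen_pairs for pair in zip(sample, sample[1:]))
```

The output may read like a sentence, yet the generator has no notion of cats, dogs, or mats. Real LLMs condition on far longer contexts and learned representations, so the analogy is loose, but the concern about reproducing training data without understanding is the same.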
AI companies tend to hard-sell the fact that LLMs are trained on extremely large datasets, and are built with billions of parameters. However, these are not enough to guarantee the quality and diversity of the training data, or to satisfactorily explain the model decisions. Not every company prioritizes addressing these limitations, mainly due to the additional costs, increased time-to-market, and higher skills associated with such exercises.
Hazardous Results, Plagiarism, and Privacy Risks
Generative models, especially when backed by LLMs, may produce hazardous outputs, such as discriminatory results against certain groups, falsehoods (e.g., deep-fakes), distorted information, and inappropriate or toxic content. For instance, Stack Overflow has banned ChatGPT-generated responses, at least temporarily, due to a high rate of misleading and incorrect answers that seem otherwise plausible.
A recent paper highlights how diffusion models might replicate the training data directly or indirectly (e.g., as a collage of multiple images). This is likely due to the complex interactions of various factors, such as highly skewed image distributions and overfitting of certain subsets of the data. Such content replication may not always fall under acceptable and fair use policies. Image, music, text, and video generation are all susceptible to this risk of plagiarism.
The recent examples of Getty Images suing Stability AI (the company behind Stable Diffusion), and a group of artists suing Stability AI & Midjourney for alleged copyright violation, may just be the beginning of a new pattern. As AI & digital technologies evolve further, and global competition intensifies, such scenarios are likely to become more common. On the other hand, Shutterstock’s partnerships with OpenAI and with Meta set great examples for other players to follow.
Similarly, the ‘Stochastic Parrot’ nature of LLMs may lead to privacy violations and intellectual property infringement if the models are trained on copyrighted or private data, and output the same during inference. In fact, GitHub, Microsoft, and OpenAI are already facing a lawsuit for alleged copyright violations during their training of the Codex model.
Extracting the Signal: The Real Impact of Generative AI
Content generation (e.g., artwork & creative designs, blog & news writing, music & videos, personalized reports, and text summaries), and interaction-based applications (e.g., chat/dialogue, and question-answering) are two major areas where generative AI is already making a huge difference. For instance, the AI writing assistant Jasper leverages GPT-3 capabilities, coupled with human guidance, to produce blogs and website content for its users. The recently released ChatGPT has created a strong impression within a short time. DeepMind’s Sparrow might be available later this year, and some expect it to be better than ChatGPT.
Code generation is an area where Generative AI is already starting to make a mark. Having said that, there is still a long way to go. For instance, GitHub Copilot, built on top of OpenAI’s Codex, is a commercial product that automates code development, at least partly. However, issues related to code quality and correctness have been reported on several occasions. This makes it difficult for organizations to extensively adopt the product for building enterprise-grade code. Similarly, a Stanford University paper found that developers who used a Codex-based AI assistant wrote significantly less secure code than those who did not. More innovations are needed to address these gaps, which may take time.
Drug discovery/design is a critical area where generative AI can play a significant role. The NVIDIA-backed ProT-VAE (Protein Transformer Variational AutoEncoder) model is an excellent example. Developed on NVIDIA’s BioNeMo framework, which leverages Megatron under the hood, this model generates new protein designs for specific biological functions.
Knowledge Management and Search will also be disrupted in the near term. Generative AI models will be leveraged to scan and synthesize vast volumes of data that reside within companies (or on the public internet), and to generate high-quality artifacts (e.g., summarized documents) and natural (human-like) responses to user questions and searches. For instance, Microsoft’s recent investment in OpenAI opens up opportunities for GPT & other models to be integrated with Bing Search.
Closing Comments
Generative AI is highly disruptive. However, except for a handful of domains, most of the disruption will likely happen in the medium-to-long term, and not so much in the short term. Many innovations are still needed to optimize and mature the underlying technologies, and address critical limitations. Cautious optimism should drive enterprise adoption.
More clarity is also needed on issues like compliance, ethics, privacy, and intellectual property rights that are specific to generative AI. At this point, the fog of excessive hype makes it difficult for corporations and regulators to formulate effective policies. Artists, content creators, designers, and similar professionals will probably be the most impacted ones in the near term. At the same time, new professions will emerge due to tasks like curation (or review) of AI-generated content, and prompt engineering.
2023 is going to be interesting, and hopefully, we will witness robust versions of GPT-4, Sparrow, and other generative AI innovations this year.
Acknowledgment
- Toward General Design Principles for Generative AI Applications: Justin Weisz, Michael Muller, Jessica He, Stephanie Houde
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?: Emily Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell