VALL-E, ChatGPT for Medical Advice, and other innovations

Microsoft introduced VALL-E, a neural codec language model for zero-shot text-to-speech (TTS) synthesis that generates high-quality speech from only a 3-second acoustic prompt (i.e., a voice recording). Unlike conventional models that treat TTS as a continuous signal regression task, VALL-E approaches it as a conditional language modeling problem over discrete audio codec tokens. Trained on more than 60K hours of audio from LibriLight, VALL-E was shown to outperform an existing zero-shot TTS system on the LibriSpeech and VCTK datasets in speech naturalness and speaker similarity. The researchers also claim that VALL-E has strong in-context learning capabilities and can replicate the emotions, frequencies, and acoustic environments of the voice prompts.

Google took an important step toward using LLMs to solve practical challenges in healthcare. Building on its Text-to-Text Transfer Transformer (T5) and a modified version of the Needleman–Wunsch algorithm, it announced a new model that deciphers clinical notes, including abbreviations and shorthand. The model could make doctors' prescriptions and notes easier for patients and healthcare professionals to read, and could supply additional context on the topics those notes cover, thus improving health literacy. It was trained on public web data that was algorithmically rewritten to look as if a doctor had written it, using a method called Web-Scale Reverse Substitution (WSRS). While evaluating the model on actual clinical notes, the researchers ran into a major problem: it failed to expand certain abbreviations. They addressed this with a novel inference-chaining method called elicitive inference.
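For readers unfamiliar with Needleman–Wunsch, the sketch below shows the textbook version of the algorithm applied to word tokens, so that an abbreviated note can be lined up against its expanded form. It is not Google's modified variant, whose details are not covered here; the function name, the scoring values (match +1, mismatch -1, gap -1), and the sample sentences are all assumptions made for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Textbook global alignment of two token sequences.

    Returns (score, aligned_pairs). A None in a pair marks a gap.
    Scoring values are illustrative assumptions, not Google's settings.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two tokens
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned.append((a[i - 1], b[j - 1]))
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned.append((a[i - 1], None))
            i -= 1
        else:
            aligned.append((None, b[j - 1]))
            j -= 1
    aligned.reverse()
    return score[n][m], aligned


# Hypothetical example: an abbreviated note versus its expanded form.
short_note = "the pt is stable".split()
long_note = "the patient is stable".split()
total, pairs = needleman_wunsch(short_note, long_note)
for left, right in pairs:
    print(left, "->", right)
```

In this toy example the only mismatched pair is ("pt", "patient"), which is the kind of position where an abbreviation and its expansion would be compared.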

Experiments with ChatGPT continued. For instance, NYU researchers explored the feasibility of using ChatGPT for medical advice by putting its responses through a Turing test. They concluded that ChatGPT could possibly be used for low-risk health questions. This is still early-stage research, and much more work is needed to confirm the finding, but it is an important step toward using a major AI innovation to improve human lives, particularly in low-income and developing nations. In another experiment, a Wharton School professor concluded that ChatGPT would score between a B and a B- on the final examination of a regular MBA program. Microsoft announced BioGPT for biomedical text generation and mining.

Anthropic released a closed beta of its Claude LLM, which it positioned as a rival to ChatGPT. DeepMind announced DreamerV3, a reinforcement learning algorithm demonstrated on Minecraft, which it claims can be applied across a wide range of domains with fixed hyperparameters (i.e., without the need for additional tuning). Meta announced a new approach to 3D reconstruction called Multiview Compressive Coding.

Academia has not been far behind. For instance, researchers at Berkeley released InstructPix2Pix, a text-prompt-based image-editing application. Researchers at Columbia and Wisconsin-Madison (in partnership with Microsoft) introduced GLIGEN (Grounded-Language-to-Image Generation), a novel approach to enhance the functionality of pre-trained text-to-image diffusion models. HyperReel, a novel method for rendering high-fidelity videos from multiple angles, was introduced by researchers from Carnegie Mellon, Meta, and others.
