AI Innovator Series
Distributed Training on AWS and Amazon SageMaker
July 27, 2022
Emily Webber, Amazon Web Services

Wish you could train your own large transformer models, but lack the GPUs and distribution experience to get started? Come learn how AWS is making the next generation of large models more accessible than ever with our fully managed machine learning service, Amazon SageMaker. We’ll cover distribution basics such as data parallelism, then build on them with multi-GPU sharding techniques such as model and tensor parallelism. You’ll then learn how to optimize for AWS with tools like EFA, internode primitives with SageMaker data parallel, containers, hyperparameters, and instances.
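As background for readers new to the SageMaker SDK, the sketch below shows one common way to enable SageMaker's distributed data parallel library on a multi-GPU training job. It is a minimal illustration, not material from the talk; the entry script name, S3 path, instance choice, and hyperparameters are placeholders.

```python
# Minimal sketch: a data-parallel SageMaker training job via the Python SDK.
# Script name, S3 path, and hyperparameters are placeholders, not from the talk.
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",              # your training script (placeholder name)
    role=sagemaker.get_execution_role(), # IAM role (works inside SageMaker notebooks)
    framework_version="1.11.0",
    py_version="py38",
    instance_count=2,                    # scale out across multiple GPU nodes
    instance_type="ml.p4d.24xlarge",     # EFA-enabled multi-GPU instance
    # Enable the SageMaker distributed data parallel library (SMDDP)
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"epochs": 3, "per_device_batch_size": 16},
)

estimator.fit({"train": "s3://my-bucket/train"})  # placeholder S3 path
```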
Emily Webber is a principal machine learning specialist solutions architect at Amazon Web Services. She has assisted hundreds of customers on their journey to machine learning in the cloud, notably JPMC, United Airlines, Aurora, Autodesk, and others, and specializes in distributed training for large vision and language models. She has mentored 60+ machine learning solution architects, suggested 150+ feature designs for SageMaker and AWS, and supports the Amazon SageMaker product and engineering teams on machine learning best practices for customers. Emily is widely known in the AWS community for a 16-video YouTube series featuring Amazon SageMaker with 160K+ views, presentations at re:Invent, Twitch demos for SageMaker Fridays, and a keynote at O’Reilly AI London 2019 on a novel reinforcement learning approach she developed for public policy. She is currently authoring a 2023 book on distributed training.
Role of Machine Learning in Privacy and Security
July 20, 2022
Primal Wijesekera, ICSI / University of California, Berkeley

Current smartphone operating systems regulate application permissions by prompting users on an ask-on-first-use basis. Prior research has shown that this method is ineffective because it fails to account for context: the circumstances under which an application first requests access to data may be vastly different from the circumstances under which it subsequently requests access. We performed a longitudinal 131-person field study to analyze the contextuality behind user privacy decisions to regulate access to sensitive resources. We built a classifier to make privacy decisions on the user's behalf by detecting when context has changed and, when necessary, inferring privacy preferences based on the user's past decisions and behavior. Our goal is to automatically grant appropriate resource requests without further user intervention, deny inappropriate requests, and only prompt the user when the system is uncertain of the user's preferences. We show that our approach can accurately predict users' privacy decisions 96.8% of the time, which is a four-fold reduction in error rate compared to current systems.
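For intuition only, the toy sketch below shows the general shape of a classifier that predicts a user's allow/deny decision from contextual features and past behavior. It is not the authors' implementation; the feature set, data, and model choice are hypothetical.

```python
# Illustrative sketch only: a toy contextual permission classifier in the spirit of
# the talk (not the authors' system). Features, data, and model are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Each row: (requesting app, permission requested, app visibility at request time)
past_requests = [
    ("maps",       "location", "foreground"),
    ("flashlight", "location", "background"),
    ("messenger",  "contacts", "foreground"),
    ("game",       "contacts", "background"),
]
past_decisions = np.array([1, 0, 1, 0])   # 1 = user allowed, 0 = user denied

encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(past_requests)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, past_decisions)

# A new request arrives; predict the user's likely decision and its confidence.
new_request = [("flashlight", "contacts", "background")]
proba = clf.predict_proba(encoder.transform(new_request))[0]
print("P(allow) =", proba[1])   # prompt the user only when the model is uncertain
```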
Primal Wijesekera is a staff research scientist in the Usable Security and Privacy Research group at ICSI, a University of California, Berkeley affiliated research institute. His research focuses on exposing current privacy vulnerabilities and providing systematic solutions to meet consumers' privacy expectations. He has extensive experience in mobile app analysis for privacy violations and implementing privacy protections for Android. He has published in top-tier security and usable security and privacy venues. He received his Ph.D. from the University of British Columbia, although he carried out his Ph.D. research at U.C. Berkeley. He also has a master’s from UBC in distributed systems and a BSc in computer science from the University of Colombo, Sri Lanka.
Towards Open World Visual Understanding with Neuro Symbolic Reasoning
July 13, 2022
Sathyanarayanan (Sathya) Aakur, Oklahoma State University

Deep learning models for multimodal understanding have taken great strides in tasks such as action recognition and action localization. However, there appears to be an implicit closed-world assumption in these approaches, i.e., they assume that all observed data is composed of a static, known set of objects (nouns), actions (verbs), and activities (noun+verb combinations) that are in 1:1 correspondence with the vocabulary from the training data. One must therefore account for every eventuality when training these systems to ensure their performance in real-world environments. In this talk, I will discuss recent efforts to build open world visual understanding models that leverage the general-purpose knowledge embedded in large-scale knowledge bases to provide supervision, using a neuro-symbolic framework based on Grenander’s Pattern Theory formalism.
Sathyanarayanan (Sathya) Aakur has been an assistant professor in the Department of Computer Science at Oklahoma State University since 2019. He received his Ph.D. in computer science and engineering from the University of South Florida, Tampa, in 2019. His research interests include multimodal event understanding, commonsense reasoning for open world visual understanding, and deep learning applications for genomics. He is the 2022 recipient of the National Science Foundation CAREER award.
Large-Scale Language Models: Challenges and Opportunities
July 6, 2022 | Watch on YouTube
Chandra Bhagavatula, Allen Institute for AI

Large language models have taken natural language processing (NLP) by storm. In spite of impressive performance on a wide range of NLP tasks, they often fail in surprising ways, especially when commonsense knowledge and reasoning are required. In this talk, I will present our ongoing work on modeling commonsense knowledge in language. First, I'll talk about how machines can author a large-scale, high-quality resource of generic knowledge, e.g., "birds can fly" or "bicycles have two wheels." Then, I'll present an experimental framework for commonsense morality, towards enabling neural language models to reason that "helping a friend" is good, but "helping a friend spread fake news" is not. I'll conclude by presenting some avenues of future work.
Chandra Bhagavatula is a Senior Research Scientist on the Mosaic team at the Allen Institute for AI (AI2). His research focuses on a wide range of problems involving commonsense in AI, e.g., modeling commonsense knowledge, reasoning, building benchmark datasets, and evaluation. He is a co-recipient of the AAAI Outstanding Paper Award in 2020. He received his Ph.D. from Northwestern University in 2016.
Modeling Multi-Platform Information Diffusion in Social Media: Data-Driven Observations
June 29, 2022 | Watch on YouTube
Adriana Iamnitchi, Maastricht University

Accurately modeling information diffusion within and across social media platforms has many practical applications, such as estimating the size of the audience exposed to a particular narrative or testing intervention techniques for addressing misinformation. However, it turns out that real data reveal phenomena that pose significant challenges to modeling: events in the physical world affect conversations on different social media platforms in varying ways; coordinated influence campaigns may swing discussions in unexpected directions; and a platform’s algorithms direct who sees which message. This talk will discuss challenges in modeling social media activity in various contexts, from political crises to coordinated disinformation campaigns.
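As a point of reference (not material from the talk), the sketch below simulates a textbook independent-cascade diffusion model on a synthetic graph, the kind of simple baseline that the real-world phenomena above complicate. Graph size, seed set, and transmission probability are illustrative assumptions.

```python
# Background sketch (not from the talk): a minimal independent-cascade diffusion
# simulation on a synthetic graph. All parameters are illustrative.
import random
import networkx as nx

random.seed(0)
G = nx.erdos_renyi_graph(n=1000, p=0.01, seed=0)   # synthetic follower graph
activation_prob = 0.05                              # per-edge transmission probability

seeds = random.sample(list(G.nodes), 5)             # accounts posting the narrative first
active, frontier = set(seeds), set(seeds)

while frontier:
    next_frontier = set()
    for node in frontier:
        for neighbor in G.neighbors(node):
            if neighbor not in active and random.random() < activation_prob:
                next_frontier.add(neighbor)         # neighbor "reshares" the content
    active |= next_frontier
    frontier = next_frontier

print(f"Estimated audience exposed: {len(active)} of {G.number_of_nodes()} accounts")
```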
Adriana Iamnitchi is a professor and chair of Computational Social Sciences in the Institute of Data Science at Maastricht University. Until 2021 she was a professor in the Department of Computer Science and Engineering at the University of South Florida. Her research lies at the intersection of distributed systems and social computing, with a current focus on empirical analysis of phenomena in online social environments and designing solutions for modeling them. The National Science Foundation, the Office of Naval Research, and DARPA have funded her work. She holds a Ph.D. in computer science from the University of Chicago, is an ACM Distinguished Member and IEEE Senior Member, and is a recipient of the National Science Foundation CAREER award.
Stop Chasing Transformational, Just Solve Problems: Opportunities for Machine Learning in the Atmospheric Sciences
June 15, 2022 | Watch on YouTube
Joseph Hardin, ClimateAI

Machine learning has raised exciting new possibilities across the entire range of atmospheric science problems. It has the potential to improve traditionally intractable problems and drastically speed up improvements in forecasting, allowing us a greater understanding of everything from microphysical processes to climate systems. Accordingly, funding agencies, including DOE, have started dedicating large amounts of money to machine learning (ML) research in these fields, demanding “transformational” results. However, the story of ML itself is not one of chasing transformational results, but rather of iterative improvements toward the next challenge that cumulatively produced transformational impact, though not always in predictable directions. In this talk, I’ll argue that planning for transformational results is the wrong approach, and that we should instead target low-hanging fruit by building cross-domain expertise and collaboration. I will support this argument by covering where current ML-based advances in the atmospheric sciences are coming from and how they connect to applications for the public good, and by laying out some of the most attainable near-term wins for ML. Finally, I will discuss some of the bigger bottlenecks and how both science-domain and ML experts can collaborate on these issues.
Joseph Hardin is a senior data scientist and engineering manager at ClimateAI, focusing on ML research into seasonal-to-subseasonal forecasting. He received his Ph.D. in electrical engineering from Colorado State University, focusing on inverse problems in weather radar. He spent the following six years at Pacific Northwest National Laboratory as a computational scientist and radar engineer, working on topics ranging from ARM radar deployments to large-scale data processing and retrievals to ML applications for observations and modeling, and contributed to cross-directorate ML organization teams. He joined ClimateAI last year as a senior data scientist and engineering manager to lead a team applying ML to longer-term forecasting with direct applications to customer-centric problems.
Training Large Language Models: Challenges and Opportunities
June 1, 2022
Mostofa Patwary, NVIDIA

Large language models trained on massive datasets can achieve state-of-the-art accuracies in various natural language processing applications including summarization, automatic dialogue generation, translation, semantic search, and code autocompletion. However, training such models is challenging, as these models no longer fit in the memory of the largest GPU and can require very long training times. Therefore, numerous innovations and breakthroughs are required in datasets, algorithms, software, and hardware together to make training these models a reality. In this talk, I will share our experience training the Megatron-Turing Natural Language Generation model (MT-NLG), one of the most powerful monolithic transformer language models trained to date, with 530 billion parameters. I will also showcase several applications of MT-NLG and discuss future research and the numerous opportunities this model presents.
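To see why a 530-billion-parameter model cannot fit on a single GPU, a rough back-of-the-envelope estimate (my own, not from the talk) of the training state for standard mixed-precision Adam is sketched below; it ignores activations, buffers, and framework overhead.

```python
# Back-of-the-envelope estimate (not from the talk) of training-time state for a
# 530B-parameter model with mixed-precision Adam, ignoring activations and buffers.
params = 530e9

fp16_weights  = params * 2   # bytes: fp16 copy used in forward/backward
fp16_grads    = params * 2   # bytes: fp16 gradients
fp32_master   = params * 4   # bytes: fp32 master weights
adam_momentum = params * 4   # bytes: fp32 first moment
adam_variance = params * 4   # bytes: fp32 second moment

total_bytes = fp16_weights + fp16_grads + fp32_master + adam_momentum + adam_variance
print(f"~{total_bytes / 2**40:.1f} TiB of optimizer/weight state")   # roughly 7.7 TiB
print(f"vs. a single 80 GB GPU: ~{80e9 / 2**40:.2f} TiB")            # ~0.07 TiB
```

The roughly two-orders-of-magnitude gap is what forces the combination of data, tensor, and pipeline parallelism across many nodes.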
Mostofa Patwary is a principal research scientist on the Applied Deep Learning Research team at NVIDIA. Mostofa’s research interests span the areas of natural language processing, scalable deep learning, high-performance computing, and algorithm engineering. Prior to joining NVIDIA, Mostofa worked on scaling large language models and the predictability of scaling deep learning applications at Baidu’s Silicon Valley AI Lab. Mostofa also made significant contributions to developing large-scale code for several core machine learning kernels capable of running on supercomputers while he worked in the Parallel Computing Lab at Intel Research.
On the Theory of Foundation Models and their Application to Privacy-preserving NLP
May 4, 2022 | Watch on YouTube
Tatsu Hashimoto, Stanford University

Large, pre-trained foundation models have transformed NLP with their impressive empirical performance. Despite the clear success of these models, their inductive biases and limits remain poorly understood. In this talk, we shed light on why masked language models recover linguistic structures such as syntax, and demonstrate that these same models can break a 'curse of dimensionality' for differential privacy, leading to a simple and practical approach to privacy-preserving NLP.
Tatsu Hashimoto is an assistant professor in the Computer Science Department at Stanford University. His research uses tools from statistics to make machine learning systems more robust and reliable, especially in challenging tasks involving natural language. The goal of his research is to use robustness and worst-case performance as a lens to understand and make progress on several fundamental challenges in machine learning and natural language processing. A few topics of recent interest are: long-tail behavior (how can we ensure that a machine learning system won't fail catastrophically in the wild under changing conditions?); understanding (a system that knows how to answer questions or generate text should also do so robustly out of domain); and fairness (machine learning systems that rely on unreliable correlations can produce spurious and harmful predictions).
Pushing NLP to the Edge
April 20, 2022 | Watch on YouTube
Alexander “Sasha” Rush, Cornell Tech

Models for natural language processing (NLP) are growing simultaneously along many different axes. The rush to scale has led to models that are much bigger, slower, and more energy-hungry, and yet increasingly in demand for more tasks. This talk is about the modeling challenges of running NLP on the edge. How can NLP researchers contribute methods that allow more users to run NLP models? In practice, this often means considering each of these axes and their interactions. I discuss recent work on methods for scaling down NLP models targeting edge use cases, and on collaborating with hardware researchers to test them in practice.
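One standard ingredient in scaling models down for the edge is post-training quantization; the sketch below shows PyTorch dynamic quantization applied to a stand-in feed-forward block. This illustrates the general idea only and is not necessarily among the specific methods covered in the talk.

```python
# One common scaling-down technique (not necessarily the talk's methods): post-training
# dynamic quantization of linear layers with PyTorch. The model here is a stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(          # stand-in for an NLP model's feed-forward layers
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # int8 weights; activations quantized on the fly
)

x = torch.randn(1, 768)
with torch.no_grad():
    out_fp32 = model(x)
    out_int8 = quantized(x)

# Int8 weights cut memory roughly 4x for these layers at a small accuracy cost.
print("max abs difference:", (out_fp32 - out_int8).abs().max().item())
```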
Alexander "Sasha" Rush is an associate professor at Cornell Tech. His group's research is in the intersection of natural language processing, deep learning, and structured prediction with applications in text generation and efficient inference. He contributes to several open-source projects in NLP and works part time at Hugging Face. He is on the board of ICLR and developed the MiniConf tool used to run ML/NLP virtual conferences. His work has received paper and demo awards at major NLP, visualization, and hardware conferences, an NSF Career Award, and a Sloan Fellowship.
Creating Trustworthy AI/ML for Weather and Climate
April 13, 2022 | Watch on YouTube
Amy McGovern, University of Oklahoma

Artificial intelligence and machine learning (ML) have been shown to be useful across a variety of applications within the atmospheric sciences, and their use continues to grow rapidly. In this talk, I will discuss how we, the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography, assist in creating trustworthy AI for a variety of weather phenomena. I will go in depth on two specific weather applications: autonomous detection of frontal boundaries and convective hazards (e.g., hail and tornadoes). Beyond discussing the ML tools being developed, I will discuss the concept of trust in AI from its foundations to how we have worked with operational forecasters to enhance trustworthy AI within meteorology.
Amy McGovern is a Lloyd G. and Joyce Austin Presidential Professor in the School of Computer Science and School of Meteorology at the University of Oklahoma. She is also the PI and Director of the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography. Her research focuses on developing trustworthy AI for a variety of high-impact weather hazards including tornadoes, hail, fire, and more.
Implementing Symbols and Rules with Neural Networks
April 6, 2022 | Watch on YouTube
Ellie Pavlick, Brown University

Many aspects of human language and reasoning are well explained in terms of symbols and rules. However, state-of-the-art computational models are based on large neural networks which lack explicit symbolic representations of the type frequently used in cognitive theories. One response has been the development of neuro-symbolic models which introduce explicit representations of symbols into neural network architectures or loss functions. In terms of Marr's levels of analysis, such approaches achieve symbolic reasoning at the computational level ("what the system does and why") by introducing symbols and rules at the implementation and algorithmic levels. In this talk, I will consider an alternative: can neural networks (without any explicit symbolic components) nonetheless implement symbolic reasoning at the computational level? I will describe several diagnostic tests of "symbolic" and "rule-governed" behavior and use these tests to analyze neural models of visual and language processing. Our results show that on many counts, neural models appear to encode symbol-like concepts (e.g., conceptual representations that are abstract, systematic, and modular), but not perfectly so. Analysis of the failure cases reveals that future work is needed on methodological tools for analyzing neural networks, as well as refinement of models of hybrid neuro-symbolic reasoning in humans, to determine whether neural networks' deviations from the symbolic paradigm are a feature or a bug.
Ellie Pavlick is an assistant professor of Computer Science at Brown University, where she leads the Language Understanding and Representation Lab, and a research scientist at Google. Her research focuses on building computational models of language that are inspired by and/or informative of language processing in humans. Currently, her lab is investigating the inner workings of neural networks to "reverse engineer" the conceptual structures and reasoning strategies that these models use, as well as exploring the role of grounded (non-linguistic) signals for word and concept learning. Ellie's work is supported by DARPA, IARPA, NSF, and Google.
Advances in ML for Cyber-Safety
March 30, 2022 | Watch on YouTube
Srijan Kumar, Georgia Institute of Technology

Ensuring the safety, integrity, and well-being of users, communities, and platforms on the web and social media is a critical yet challenging task. In this talk, I will describe the cybersafety-focused machine learning methods, leveraging behavior modeling, graph analytics, and deep learning, that my group has developed to efficiently detect malicious users and bad content online. While developing models that are highly accurate is important, it is also crucial to ensure that the systems are trustworthy. Thus, I will describe my group's work on quantifying the reliability of cybersafety models that are widely used in practice, such as those at Facebook, against smart adversaries.
Srijan Kumar is an Assistant Professor in the College of Computing at the Georgia Institute of Technology. He develops data science, machine learning, and AI solutions for the pressing challenges pertaining to the safety, integrity, and well-being of users, platforms, and communities in the cyber domain. He has pioneered the development of user models and network science tools to enhance the well-being and safety of users. His methods have been used in production at Flipkart (India’s largest e-commerce platform) and taught in graduate-level courses worldwide. He was named to the Forbes 30 Under 30 in Science list in 2022 and has received several awards, including the Facebook Faculty Award, Adobe Faculty Award, ACM SIGKDD Doctoral Dissertation Award runner-up 2018, Larry S. Davis Doctoral Dissertation Award 2018, and 'best of' awards from WWW and ICDM. His research has been the subject of a documentary and covered in the popular press, including CNN, The Wall Street Journal, Wired, and New York Magazine. He completed his postdoctoral training at Stanford University, received a Ph.D. in Computer Science from the University of Maryland, College Park, and a B.Tech. from the Indian Institute of Technology, Kharagpur.
Integrating Theory and Subject Matter Expertise into Computational Methods for Social Systems
March 16, 2022 | Watch on YouTube
Asmeret Naugle, Sandia National Laboratories

To effectively solve national security problems, we need to understand potential causes, consequences, and solutions, most of which involve people and social systems. Computational social science methods are useful, but often require data that may not be available for key national security problems. In these cases, we can turn to theory and subject matter expertise on the focus application, and potentially combine that knowledge with computational methods to strengthen both approaches. This talk will discuss recent and current Sandia projects that combine theory and expert knowledge with a variety of computational methods, including social simulation, data analysis, and machine learning. These projects address topics including group dynamics, geopolitics, disinformation, and influence.
Asmeret Naugle is a computational social scientist in Sandia National Laboratories’ Machine Intelligence Department. She has expertise and experience in social systems modeling, expert elicitation, and assessment and validation of various modeling techniques. She leads research projects that combine theory and modeling to study national security and social science topics. Her recent and current research includes work on group dynamics, strategic communications, and detecting influence campaigns.
Toward Robust, Knowledge-Rich NLP
March 2, 2022 | Watch on YouTube
Hanna Hajishirzi, University of Washington

Enormous amounts of ever-changing knowledge are available online in diverse textual styles and formats. Recent advances in deep learning algorithms and large-scale datasets are spurring progress in many natural language processing (NLP) tasks, including question answering. Nevertheless, these models cannot scale up when task-annotated training data are scarce. This talk presents my lab's work toward building general-purpose models in NLP and systematically evaluating them. First, I present a general model for two question answering tasks, in English and in multiple languages, that is robust to small domain shifts. Then, I show a meta-training approach that can solve a variety of NLP tasks using only a few examples and introduce a benchmark to evaluate cross-task generalization. Finally, I discuss neuro-symbolic approaches that address more complex tasks by eliciting knowledge from structured data and language models.
Hanna Hajishirzi is an Assistant Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a Senior Research Manager at the Allen Institute for AI. Her research spans different areas in NLP and AI, focusing on developing general-purpose machine learning algorithms that can solve many NLP tasks. Applications for these algorithms include question answering, representation learning, green AI, knowledge extraction, and conversational dialogue. Her honors include the NSF CAREER Award, a Sloan Fellowship, an Allen Distinguished Investigator Award, an Intel Rising Star Award, best paper and honorable mention awards, and several industry research faculty awards. Hanna received her PhD from the University of Illinois and spent a year as a postdoc at Disney Research and CMU.
Efficient DNN Training at Scale: from Algorithms to Hardware
February 16, 2022
Gennady Pekhimenko, University of Toronto

The recent popularity of deep neural networks (DNNs) has generated a lot of research interest in performing DNN-related computation efficiently. However, the focus of systems research is usually quite narrow, limited to inference (i.e., how to efficiently execute already-trained models) and to image classification networks as the primary benchmark for evaluation. In this talk, we will demonstrate a holistic approach to DNN training acceleration and scalability, starting from the algorithm and moving to software and hardware optimizations and to specialized development and optimization tools.
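For readers unfamiliar with training-focused tooling, the sketch below profiles a few training iterations with torch.profiler to see which operators dominate step time. It is a generic illustration of profiling-driven optimization, not the specific tools discussed in the talk; the model and data are toy placeholders.

```python
# Illustrative sketch of profiling a training step with torch.profiler; a generic
# example, not the talk's specific tools. Model and data are toy placeholders.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

x = torch.randn(64, 1024)
y = torch.randint(0, 10, (64,))

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    model, x, y = model.cuda(), x.cuda(), y.cuda()

with profile(activities=activities, record_shapes=True) as prof:
    for _ in range(10):                      # profile a few training iterations
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Show which operators dominate step time; a starting point for optimization.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```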
Gennady Pekhimenko is an assistant professor at the University of Toronto in the Computer Science and (by courtesy) Electrical & Computer Engineering departments, where he leads the EcoSystem group. Gennady is also a faculty member at the Vector Institute and a CIFAR AI Chair. Before joining the University of Toronto, he spent a year at Microsoft Research in the Systems Research group. He got his PhD from the Computer Science Department at Carnegie Mellon University. Gennady is a recipient of the Amazon Machine Learning Research Award, the Facebook Faculty Research Award, and the Connaught New Researcher Award, as well as NVIDIA Graduate, Microsoft Research, Qualcomm Innovation, and NSERC CGS-D fellowships. His research interests are in the areas of systems, computer architecture, compilers, and systems for machine learning.
Virtual Teams in Gig Economy: An End-to-End Data Science Approach
February 9, 2022
Wei Ai, University of Maryland

The gig economy provides workers with the benefits of autonomy and flexibility, but it does so at the expense of work identity and co-worker bonds. Among the many reasons why gig workers leave their platforms, one unexplored aspect is organizational identity.
In a series of studies, we develop team formation and inter-team contests at a ride-sharing platform. We employ an end-to-end data science approach, combining methodologies from randomized field experiments, recommender systems, and counterfactual machine learning. Together, our results show that platform designers can leverage team identity and team contests to increase revenue and worker engagement in the gig economy.
Wei Ai is an Assistant Professor in the College of Information Studies (iSchool) and the Institute for Advanced Computer Studies (UMIACS) at the University of Maryland. His research interest lies in data science for social good, where the advances of machine learning and data analysis algorithms translate into measurable impacts on society. He combines machine learning, causal inference, and field experiments in his research, and has rich experience in collaborating with industrial partners. He earned his Ph.D. from the School of Information at the University of Michigan. His research has been published in top journals and conferences, including PNAS, ACM TOIS, WWW, and ICWSM.
Towards Language Generation We Can Trust
February 2, 2022 | Watch on YouTube
Yulia Tsvetkov, University of Washington

Modern language generation models produce highly fluent but often unreliable outputs. This motivated a surge of metrics attempting to measure the factual consistency of generated texts, and a surge of approaches to controlling various attributes of the text that models generate. However, existing metrics treat factuality as a binary concept and fail to provide deeper insights on the kinds of inconsistencies made by different systems. Similarly, most approaches to controllable text generation are focused on coarse-grained categorical attributes (typically only one attribute).
To address these concerns, we propose to focus on understanding finer-grained aspects of factuality and controlling for finer-grained aspects of the generated texts. In the first part of the talk, I will present a benchmark for evaluating the factual consistency of generated summaries against a nuanced typology of factual errors. In the second part of the talk, I will present an algorithm for controllable inference from pretrained models, which aims at rewriting model outputs with multiple sentence-level fine-grained constraints. Together, these approaches make strides towards more reliable applications of conditional language generation, such as summarization and machine translation.
Yulia Tsvetkov is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Her research group works on NLP for social good, multilingual NLP, and language generation. The projects are motivated by a unified goal: to extend the capabilities of human language technology beyond individual populations and across language boundaries, thereby enabling NLP for diverse and disadvantaged users, the users that need it most. Prior to joining UW, Yulia was an assistant professor at Carnegie Mellon University and a postdoc at Stanford. Yulia is a recipient of the Okawa Research Award, an Amazon Machine Learning Research Award, a Google Faculty Research Award, and multiple NSF awards.
MEWS: An AI-centered Misinformation Early Warning System
December 15, 2021 | Watch on YouTube
Tim Weninger, University of Notre Dame

One of the most challenging aspects of online mis/disinformation is the overwhelming volume of content that is published on social media platforms every day. There is growing concern that online mis/disinformation campaigns may negatively impact the processes which sustain democratic government. The ability to rapidly identify disinformation campaigns as they are propagating through social media networks is thus an important aspect of limiting the negative effects of these campaigns.
For more than two years, a specialized team of computer scientists, sociologists, and peace studies scholars has been working to develop MEWS (Misinformation Early Warning System), which collects social media image and video data in real time, analyzes those media for alterations, manipulations, and coordinated image and text narratives, and alerts stakeholders when coordinated social media influence campaigns are ongoing.
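Detecting reused or subtly altered images is often approached with perceptual hashing; the sketch below shows that common building block using the imagehash library. It is illustrative only, not a description of the MEWS pipeline, and the file paths and threshold are placeholders.

```python
# Illustrative only: flagging near-duplicate or altered images with perceptual hashing
# (a common building block, not the MEWS pipeline). Paths and threshold are placeholders.
from PIL import Image
import imagehash

reference = imagehash.phash(Image.open("original_post.jpg"))
candidate = imagehash.phash(Image.open("reshared_post.jpg"))

distance = reference - candidate   # Hamming distance between 64-bit perceptual hashes
if distance == 0:
    print("Exact or near-exact copy")
elif distance <= 10:               # threshold is application-dependent
    print(f"Likely altered or re-encoded copy (distance={distance})")
else:
    print(f"Probably a different image (distance={distance})")
```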
Tim Weninger is the Frank M. Friedmann Collegiate Associate Professor of Engineering at the University of Notre Dame. His research in social media and artificial intelligence looks to better understand how humans create and consume information, especially in online social systems.
Event and Relation Extraction with Less Supervision
December 1, 2021
Nanyun (Violet) Peng, University of California, Los Angeles

Events are central to human interaction with the world, yet automatic event and relation extraction is challenging because it requires understanding the interactions between event triggers and arguments, which usually span multiple sentences. The complex event argument structures also make it challenging to collect abundant human annotations to train models. In this talk, I will discuss how we design deep structured models that compile the problem structures into constraints and combine them with deep neural networks for event extraction. I will also introduce our recent work on leveraging generative language models for zero-shot and low-shot event extraction in English and cross-lingual settings.
Nanyun (Violet) Peng is an assistant professor of computer science at the University of California, Los Angeles. Prior to that, she spent three years at the University of Southern California's Information Sciences Institute as an Assistant Research Professor. She received her PhD in computer science from Johns Hopkins University, Center for Language and Speech Processing. Her research focuses on robustness and generalizability of NLP models, with applications to natural language generation and low-resource information extraction. Her research has been supported by multiple DARPA/IARPA/NIH and industrial research awards.
WebQA: Multihop and Multimodal QA
November 3, 2021 | Watch on YouTube
Yonatan Bisk, Carnegie Mellon University

Web search is fundamentally multimodal and multihop. Often, even before asking a question, we choose to go directly to image search to find our answers. Further, we rarely find an answer from a single source; instead, we aggregate information and reason through implications. Despite the frequency of this everyday occurrence, at present there is no unified question answering benchmark that requires a single model to answer long-form natural language questions from text and open-ended visual sources, akin to a human's experience. We propose to bridge this gap between the natural language and computer vision communities with WebQA. We show that our multihop text queries are difficult for a large-scale transformer model, and that existing multimodal transformers and visual representations do not perform well on open-domain visual queries. Our challenge for the community is to create a unified multimodal reasoning model that seamlessly transitions and reasons regardless of the source modality.
Yonatan Bisk is an assistant professor of computer science in Carnegie Mellon's Language Technologies Institute. His group works on grounded and embodied natural language processing, placing perception and interaction as central to how language is learned and understood. His was a winding path: after receiving his PhD from the University of Illinois at Urbana-Champaign, where he worked on unsupervised Bayesian models of syntax, he spent several years as a postdoc and/or visitor at the University of Southern California's Information Sciences Institute (grounding), the University of Washington (commonsense), and Microsoft Research (vision + language).