Keynote Speakers
Piotr Indyk
Professor of Electrical Engineering and Computer Science at MIT
Title: Graph-based algorithms for similarity search: challenges and opportunities
Abstract:
Over the last few years, graph-based approaches to nearest neighbor search have gained renewed interest. Algorithms such as HNSW, NSG, and DiskANN have become popular tools in practice. These algorithms are highly versatile and come equipped with efficient implementations. At the same time, the theoretical guarantees of these algorithms are relatively limited. For instance, it has been observed (Indyk, Xu '23) that there exist simple low-dimensional datasets for which most of these algorithms exhibit query times that scale linearly with the dataset size. In this talk, I will discuss some of the challenges and opportunities presented by this class of algorithms.
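As a concrete reference point, below is a minimal Python sketch of the greedy beam-search primitive that HNSW, NSG, and DiskANN all build on; the graph representation, function, and parameter names here are illustrative assumptions, not any particular library's API. The linear query times mentioned above correspond to datasets on which this loop can be forced to visit a constant fraction of the nodes before terminating.

```python
# Minimal sketch of greedy beam search over a proximity graph, the core
# primitive shared by HNSW, NSG, and DiskANN. Illustrative only: the graph
# layout and names below are assumptions, not a specific library's API.
import heapq
import numpy as np

def greedy_search(graph, data, query, entry_point, beam_width=8):
    """graph: dict mapping node id -> list of neighbor ids
    data:  (n, d) array of database vectors
    query: (d,) query vector
    Returns up to beam_width (distance, id) pairs, closest first."""
    dist = lambda i: float(np.linalg.norm(data[i] - query))
    visited = {entry_point}
    candidates = [(dist(entry_point), entry_point)]    # min-heap of nodes to expand
    results = [(-dist(entry_point), entry_point)]      # max-heap (negated) of best found
    while candidates:
        d_c, c = heapq.heappop(candidates)
        # Stop once the closest unexpanded candidate is worse than the
        # worst retained result and the result set is full.
        if len(results) >= beam_width and d_c > -results[0][0]:
            break
        for nb in graph[c]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < beam_width or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > beam_width:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-d, i) for d, i in results)
```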
Bio:
Piotr Indyk is the Thomas D. and Virginia W. Cabot Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, where he has been on the faculty since 2000. He graduated from the University of Warsaw in 1995 and received his Ph.D. in Computer Science from Stanford University in 2001. He received the Packard Fellowship in 2003 and the Simons Investigator Award in 2013. He is also a co-winner of the 2012 Paris Kanellakis Theory and Practice Award for his work on Locality-Sensitive Hashing. Piotr Indyk is a fellow of the Association for Computing Machinery and a member of the American Academy of Arts and Sciences and the National Academy of Sciences.
Bradley C. Love
Professor of Cognitive and Decision Sciences at University College London
Title: Embeddings of and for the mind
Abstract:
A variety of domains, including images, text, consumer choice, and brain activity, can be captured in embedding spaces. In this talk, I will consider how to collect very large semantic embeddings from human similarity judgments. We can use these embeddings to evaluate how well deep learning models align with humans. One conclusion is that models that tend to be better from an engineering standpoint are worse as models of humans. Should unconstrained embeddings or embeddings constrained to be non-negative be preferred? I'll suggest there is a tradeoff in which non-negative embeddings, which tend to be more interpretable, are worse for encoding information but better for decoding. In the second part of the talk, I will consider whether agents can learn in an unsupervised manner by aligning embeddings across modalities (e.g., images and words) in a way that respects parallel similarity relations across domains. I'll present evidence that adults spontaneously align modalities during learning and that children rapidly learn new words through this alignment process.
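One common way to operationalize "respecting parallel similarity relations" between two embedding spaces is a second-order (representational) similarity comparison: compute each space's pairwise similarity matrix and correlate the two. The Python sketch below illustrates that generic idea; it is not the specific method used in the talk, and the function name is made up for illustration.

```python
# Generic second-order similarity comparison between two embedding spaces
# (e.g., images vs. words). Illustrative sketch, not the talk's method.
import numpy as np
from scipy.stats import spearmanr

def similarity_alignment(emb_a, emb_b):
    """emb_a: (n_items, d_a), emb_b: (n_items, d_b) embeddings of the same
    n items in two different spaces. Returns a rank correlation in [-1, 1]."""
    def pairwise_cosine(e):
        e = e / np.linalg.norm(e, axis=1, keepdims=True)
        return e @ e.T
    sim_a = pairwise_cosine(emb_a)
    sim_b = pairwise_cosine(emb_b)
    iu = np.triu_indices(len(emb_a), k=1)      # unique item pairs only
    rho, _ = spearmanr(sim_a[iu], sim_b[iu])   # correlate the two similarity profiles
    return rho
```

A high correlation means the two spaces order item pairs similarly, which is the kind of signal an unsupervised learner could exploit to align modalities without labels.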
Bio:
Bradley C. Love is a Professor of Cognitive and Decision Sciences in Experimental Psychology at University College London (UCL). He is also a distinguished fellow at The Alan Turing Institute for data science and AI, as well as at the European Laboratory for Learning and Intelligent Systems (ELLIS). His research lab is dedicated to advancing the understanding of human learning and decision-making by integrating behavioral, computational, and neuroscience perspectives. Currently, his team is pioneering efforts in large-scale modeling of brain and behavior using deep learning techniques. Additionally, they are developing BrainGPT, an innovative tool designed to assist neuroscience researchers by leveraging large language models.
Sanjiv Kumar
Google Fellow and VP at Google Research
Title: Towards massive scale similarity search
Abstract:
Beyond traditional search and recommendation tasks over massive datasets, the recent revolution in deep learning and large language models (LLMs) is driving huge interest in efficient search in large databases. Fast search is proving essential for new applications such as retrieval-augmented generation (RAG), in addition to enabling efficient variants of LLMs. Most of these applications need fast techniques for Maximum Inner Product Search (MIPS). In this talk, I will describe the design of new techniques that effectively combine data partitioning with compression to achieve state-of-the-art MIPS performance. We have open-sourced the resulting system (ScaNN), which is used extensively by the external community. I will conclude with open questions in the area of similarity search and a discussion of other emerging alternatives such as generative retrieval based on LLMs.
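As a concrete illustration of the partitioning-plus-compression design, the snippet below follows the usage pattern from ScaNN's published Python example: a tree stage partitions the database, quantized asymmetric-hashing scoring compresses the comparisons, and a reordering stage rescores the top candidates exactly. All data and parameter values here are placeholders rather than recommendations; consult the ScaNN repository for the documented API.

```python
# Sketch of ScaNN usage for MIPS, modeled on the project's example code.
# Data and parameter values are illustrative placeholders.
import numpy as np
import scann

db = np.random.rand(100_000, 128).astype(np.float32)       # database vectors
queries = np.random.rand(10, 128).astype(np.float32)       # query vectors

searcher = (
    scann.scann_ops_pybind.builder(db, 10, "dot_product")  # top-10 by inner product
    .tree(num_leaves=1000, num_leaves_to_search=100,       # data partitioning
          training_sample_size=25_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)   # compressed (quantized) scoring
    .reorder(100)                                          # exact rescoring of top 100
    .build()
)
neighbors, distances = searcher.search_batched(queries)
```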
Bio:
Sanjiv Kumar is a Google Fellow and VP at Google Research, where he leads a team working on the theory and applications of large ML foundation models and generative AI. His recent research interests include rethinking existing modeling and compute paradigms in LLMs, with a focus on developing alternative techniques that allow fast training and inference. He also leads the development of massive-scale similarity search techniques, which are widely adopted at Google and in the open-source community. He has published more than 125 papers and holds 60+ patents in the areas of ML and computer vision. His work on the convergence of Adam received the Best Paper Award at ICLR 2018. He is an action editor of JMLR and holds a PhD from the School of Computer Science at Carnegie Mellon University. More information can be found at: http://www.sanjivk.com.