Machine Learning Street Talk (MLST)

By Machine Learning Street Talk
Welcome! We at MLST are inspired by scientists, and each week we have a hard-hitting discussion with the leading thinkers in the AI space. Street Talk is ridiculously technical, and we believe strongly in diversity of thought in AI, covering all the main ideas in the field and avoiding hype where possible.

MLST is run by Dr. Tim Scarfe and Dr. Keith Duggar, with regular appearances from Dr. Yannic Kilcher.
Where to listen: Apple Podcasts, Google Podcasts, Overcast, Pocket Casts, RadioPublic, Spotify
#77 - Vitaliy Chiley (Cerebras)
Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how DL workloads, including sparse workloads, can run faster on Cerebras hardware. [00:00:00] Housekeeping [00:01:08] Preamble [00:01:50] Vitaliy Chiley Introduction [00:03:11] Cerebras architecture [00:08:12] Memory management and FLOP utilisation [00:18:01] Centralised vs decentralised compute architecture [00:21:12] Sparsity [00:23:47] Does Sparse NN imply Heterogeneous compute? [00:29:21] Cost of distributed memory stores? [00:31:01] Activation vs weight sparsity [00:37:52] What constitutes a dead weight to be pruned? [00:39:02] Is it still a saving if we have to choose between weight and activation sparsity? [00:41:02] Cerebras is a cool place to work [00:44:05] What is sparsity? Why do we need to start dense? [00:46:36] Evolutionary algorithms on Cerebras? [00:47:57] How can we start sparse? Google RIGL [00:51:44] Inductive priors, why do we need them if we can start sparse? [00:56:02] Why anthropomorphise inductive priors? [01:02:13] Could Cerebras run a cyclic computational graph? [01:03:16] Are NNs locality sensitive hashing tables? References: Rigging the Lottery: Making All Tickets Winners [RIGL]; [D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet; A Spline Theory of Deep Learning [Balestriero]
June 16, 2022
#76 - LUKAS BIEWALD (Weights and Biases CEO)
Check out Weights and Biases here! Lukas Biewald is an entrepreneur living in San Francisco. He was the founder and CEO of Figure Eight, an internet company that collects training data for machine learning. In 2018, he founded Weights and Biases, a company that creates developer tools for machine learning. Recently, WandB received a cash injection of 15 million dollars in its second funding round. Lukas has a bachelor's in mathematics and a master's in computer science from Stanford University. He was a research student under the tutelage of the legendary Daphne Koller. Lukas Biewald [00:00:00] Preamble [00:01:27] Intro to Lukas [00:02:46] How did Lukas build 2 successful startups? [00:05:49] Rebalancing games with ML [00:08:14] Elevator pitch for WandB [00:10:38] Science vs Engineering divide in ML DevOps [00:14:11] Too much focus on the minutiae? [00:18:03] Vertical information sharing in large enterprises (metrics) [00:20:37] Centralised vs Decentralised topology [00:24:02] Generalisation vs specialisation [00:28:59] Enhancing explainability [00:33:14] Should we try and understand "the machine" or is testing / behaviourism enough? [00:36:55] WandB roadmap [00:39:06] WandB / ML Ops competitor space? [00:44:10] How is WandB differentiated over Sagemaker / AzureML [00:46:02] WandB Sponsorship of ML YT channels [00:48:43] Alternatives to deep learning? [00:53:47] How to build a business like WandB. Panel: Dr. Tim Scarfe and Dr. Keith Duggar. Note: we were not paid by Weights and Biases to conduct this interview.
June 09, 2022
#75 - Emergence [Special Edition] with Dr. DANIELE GRATTAROLA
An emergent behaviour or emergent property can appear when a number of simple entities operate in an environment, forming more complex behaviours as a collective. If emergence happens over disparate size scales, the reason is usually a causal relation across different scales. Weak emergence describes new properties arising in systems as a result of low-level interactions; these might be interactions between components of the system, or between the components and their environment. In our epic introduction we focus a lot on the concepts of self-organisation, complex systems, cellular automata and strong vs weak emergence. In the main show we discuss this in more detail with Dr. Daniele Grattarola and cover his recent NeurIPS paper on learning graph cellular automata. YT version: Patreon: Discord: Featuring: Dr. Daniele Grattarola, Dr. Tim Scarfe, Dr. Keith Duggar, Prof. David Chalmers, Prof. Ken Stanley, Prof. Julian Togelius, Dr. Joscha Bach, David Ha, Dr. Pei Wang. [00:00:00] Special Edition Intro: Emergence and Cellular Automata [00:49:02] Intro to Daniele and CAs [00:57:23] Numerical analysis link with CA (PDEs) [00:59:50] The representational dichotomy of discrete and continuous at different scales [01:05:21] Universal computation in CAs [01:10:27] Computational irreducibility [01:16:33] Is the universe discrete? [01:20:49] Emergence but with the same computational principle [01:23:10] How do you formalise the emergent phenomenon? [01:25:44] Growing cellular automata [01:33:53] Open-ended and unbounded computation is required for this kind of behaviour [01:37:31] Graph cellular automata [01:43:40] Connection to protein folding [01:46:24] Are CAs the best tool for the job? [01:49:37] Where to go to find more information
April 29, 2022
#74 Dr. ANDREW LAMPINEN - Symbolic behaviour in AI [UNPLUGGED]
Please note that in this interview Dr. Lampinen was expressing his personal opinions, which do not necessarily represent those of DeepMind. Patreon: Discord: YT version: Dr. Andrew Lampinen is a Senior Research Scientist at DeepMind, and he thinks that symbols are subjective in the relativistic sense. Dr. Lampinen completed his PhD in Cognitive Psychology at Stanford University. His background is in mathematics, physics, and machine learning. Andrew has said that his research interests are in cognitive flexibility and generalization, and how these abilities are enabled by factors like language, memory, and embodiment. Andrew and his coauthors have just released a paper called Symbolic Behaviour in Artificial Intelligence. Andrew leads in the paper by saying the human ability to use symbols has yet to be replicated in machines. He thinks that one of the key areas to bridge the gap is considering how symbol meaning is established. He strongly believes it is the symbol users themselves who agree upon the symbol meaning, and that the use of symbols entails behaviours which coalesce agreements about their meaning. In plain English: symbols are defined by behaviours rather than by their content. [00:00:00] Intro to Andrew and Symbolic Behaviour paper [00:07:01] Semantics underpins the unreasonable effectiveness of symbols [00:12:56] The Depth of Subjectivity [00:21:03] Walid Saba - universal cognitive templates [00:27:47] Insufficiently Darwinian [00:30:52] Discovered vs invented [00:34:19] Does language have primacy [00:35:59] Research directions [00:39:43] Comparison to BenG OpenCog and human compatible AI [00:42:53] Aligning AI with our culture [00:47:55] Do we need to model the worst aspects of human behaviour? [00:50:57] Fairness [00:54:24] Memorisation in LLMs [01:00:38] Wason selection task [01:03:45] Would an Andrew hashtable robot be intelligent?
Dr. Andrew Lampinen. References: Symbolic Behaviour in Artificial Intelligence; Imitating Interactive Intelligence; Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Yasaman Razeghi]; Big bench dataset; Teaching Autoregressive Language Models Complex Tasks By Demonstration [Recchia]; Wason selection task; Gary Lupyan
April 14, 2022
#73 - YASAMAN RAZEGHI & Prof. SAMEER SINGH - NLP benchmarks
Patreon: Discord: YT version: This week we speak with Yasaman Razeghi and Prof. Sameer Singh from UC Irvine. Yasaman recently published a paper called Impact of Pretraining Term Frequencies on Few-Shot Reasoning, where she demonstrated comprehensively that large language models only perform well on reasoning tasks because they memorise the dataset. For the first time she showed that accuracy is linearly correlated with the occurrence rate in the training corpus, something which OpenAI should have done in the first place! We also speak with Sameer, who has been a pioneering force in machine learning interpretability for many years now; he created LIME with Marco Ribeiro and also had his hands all over the famous CheckList paper and many others. We also get into the metric obsession in the NLP world and whether metrics are one of the principal reasons why we are failing to make any progress in NLU. [00:00:00] Impact of Pretraining Term Frequencies on Few-Shot Reasoning [00:14:59] Metrics [00:18:55] Definition of reasoning [00:25:12] Metrics (again) [00:28:52] On true believers [00:33:04] Sameer's work on model explainability / LIME [00:36:58] Computational irreducibility [00:41:07] ML DevOps and Checklist [00:45:58] Future of ML devops [00:49:34] Thinking about the future
Prof. Sameer Singh, Yasaman Razeghi. References: Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Razeghi et al. with Singh]; Beyond Accuracy: Behavioral Testing of NLP Models with CheckList [Ribeiro et al. with Singh]; "Why Should I Trust You?" Explaining the Predictions of Any Classifier (LIME) [Ribeiro et al. with Singh]; Tim interviewing LIME creator Marco Ribeiro in 2019; Tim's video on LIME/SHAP on his other channel; Our interview with Christoph Molnar, author of the Interpretable Machine Learning book (@ChristophMolnar); Machine Teaching: A New Paradigm for Building Machine Learning Systems [Simard]; Whimsical notes on machine teaching; Gopher paper (DeepMind); EleutherAI; A Theory of Universal Artificial Intelligence based on Algorithmic Complexity [Hutter]
April 07, 2022
#72 Prof. KEN STANLEY 2.0 - On Art and Subjectivity [UNPLUGGED]
YT version: Patreon: Discord: Prof. Ken Stanley argued in his book that our world has become saturated with objectives. The process of setting an objective, attempting to achieve it, and measuring progress along the way has become the primary route to achievement in our culture. He's not saying that objectives are bad per se, especially if they're modest, but he thinks that when goals are ambitious, the search space becomes deceptive. Is the key to artificial intelligence really related to intelligence? Does taking a job with a higher salary really bring you closer to being a millionaire? The problem is that the stepping stones which lead to ambitious objectives tend to be pretty strange; they don't resemble the final end state at all. Vacuum tubes led to computers, for example, and YouTube started as a dating website. What fascinated us about this conversation with Ken is that we got a much deeper understanding of his philosophy. He led by saying that he thinks it's worth questioning whether artificial intelligence is even a science or not. Ken thinks that the secret to future progress is for us to embrace more subjectivity. [00:00:00] Tim Intro [00:12:54] Intro [00:17:08] Seeing ideas everywhere - AI and art are highly connected [00:28:40] Creativity in Mathematics [00:30:14] Where is the intelligence in art? [00:38:49] Is AI disappointingly simple to mechanise? [00:42:48] Slightly conscious [00:46:27] Do we have subjective experience? [00:50:23] Fear of the unknown [00:51:48] Free Will [00:54:22] Chalmers [00:55:08] What's happening now in open-endedness [00:58:31] Generalisation [01:06:34] Representation primitives and what it means to understand [01:12:37] Appeal to definitions, knowledge itself blocks discovery. Make sure you buy Kenneth's book! Why Greatness Cannot Be Planned: The Myth of the Objective [Stanley, Lehman]; Abandoning Objectives: Evolution through the Search for Novelty Alone [Lehman, Stanley]; Twitter
March 29, 2022
#71 - ZAK JOST (Graph Neural Networks + Geometric DL) [UNPLUGGED]
Special discount link for Zak's GNN course - Patreon: Discord: YT version: (there are lots of helper graphics there, recommended if possible) Want to sponsor MLST? Let us know on LinkedIn / Twitter. [00:00:00] Preamble [00:03:12] Geometric deep learning [00:10:04] Message passing [00:20:42] Top down vs bottom up [00:24:59] All NN architectures are different forms of information diffusion processes (squashing and smoothing problem) [00:29:51] Graph rewiring [00:31:38] Back to information diffusion [00:42:43] Transformers vs GNNs [00:47:10] Equivariant subgraph aggregation networks + WL test [00:55:36] Do equivariant layers aggregate too? [00:57:49] Zak's GNN course. Exhaustive list of references on the YT show URL
March 25, 2022
#70 - LETITIA PARCALABESCU - Symbolics, Linguistics [UNPLUGGED]
Today we are having a discussion with Letitia Parcalabescu from the AI Coffee Break YouTube channel! We discuss linguistics, symbolic AI and our respective YouTube channels. Make sure you subscribe to her channel! In the first 15 minutes Tim dissects the recent article from Gary Marcus, "Deep learning has hit a wall". Patreon: Discord: YT: [00:00:00] Comments on Gary Marcus Article / Symbolic AI [00:14:57] Greetings [00:17:40] Introduction [00:18:48] A shared journey towards computation [00:22:10] A linguistics outsider [00:24:11] Is computational linguistics AI? [00:28:23] swinging pendulums of dogma and resource allocation [00:31:16] the road less travelled [00:34:35] pitching grants with multimodality ... and then the truth [00:40:50] some aspects of language are statistically learnable [00:44:58] ... and some aspects of language are dimensionally cursed [00:48:24] it's good to have both approaches to machine intelligence [00:51:14] the world runs on symbols [00:54:28] there is much more to learn from biology [00:59:26] Letitia's creation process [01:02:23] don't overfit content, instead publish and iterate [01:07:48] merging the big picture arrow from the small direction arrows [01:11:02] use passion to drive through failure to success [01:12:56] stay positive [01:16:02] closing remarks
March 19, 2022
#69 DR. THOMAS LUX - Interpolation of Sparse High-Dimensional Data
Today we are speaking with Dr. Thomas Lux, a research scientist at Meta in Silicon Valley. In some sense, all of supervised machine learning can be framed through the lens of geometry. All training data exists as points in Euclidean space, and we want to predict the value of a function at all those points. Neural networks appear to be the modus operandi these days for many domains of prediction. In that light, we might ask ourselves: what makes neural networks better than classical techniques like k-nearest neighbours, from a geometric perspective? Our guest today has done research on exactly that problem, trying to define error bounds for approximations in terms of directions, distances, and derivatives. The insights from Thomas's work point at why neural networks are so good at problems which everything else fails at, like image recognition. The key is in their ability to ignore parts of the input space, do nonlinear dimension reduction, and concentrate their approximation power on important parts of the function. [00:00:00] Intro to Show [00:04:11] Intro to Thomas (Main show kick off) [00:04:56] Interpolation of Sparse High-Dimensional Data [00:12:19] Where does one place the basis functions to partition the space, the perennial question [00:16:20] The sampling phenomenon -- where did all those dimensions come from? [00:17:40] The placement of the MLP basis functions, they are not where you think they are [00:23:15] NNs only extrapolate when given explicit priors to do so, CNNs in the translation domain [00:25:31] Transformers extrapolate in the permutation domain [00:28:26] NN priors work by creating space junk everywhere [00:36:44] Are vector spaces the way to go? On discrete problems [00:40:23] Activation functions [00:45:57] What can we prove about NNs? Gradients without backprop. References: Interpolation of Sparse High-Dimensional Data [Lux]; A Spline Theory of Deep Learning [Balestriero]; Gradients without Backpropagation '22
March 12, 2022
#68 DR. WALID SABA 2.0 - Natural Language Understanding [UNPLUGGED]
Patreon: Discord: YT version: Dr. Walid Saba is an old-school polymath. He has a background in cognitive psychology, linguistics, philosophy, computer science and logic, and he is now a Senior Scientist at Sorcero. Walid is perhaps the most outspoken critic of BERTOLOGY, which is to say trying to solve the problem of natural language understanding with the application of large statistical language models. Walid thinks this approach is doomed to failure because it's analogous to memorising infinity with a large hashtable. Walid thinks that the various appeals to infinity by some deep learning researchers are risible. [00:00:00] MLST Housekeeping [00:08:03] Dr. Walid Saba Intro [00:11:56] AI Cannot Ignore Symbolic Logic, and Here's Why [00:23:39] Main show - Proposition: Statistical learning doesn't work [01:04:44] Discovering a sorting algorithm bottom-up is hard [01:17:36] The axioms of nature (universal cognitive templates) [01:31:06] MLPs are locality sensitive hashing tables. References: The Missing Text Phenomenon, Again: the case of Compound Nominals; A Spline Theory of Deep Networks; The Defeat of the Winograd Schema Challenge; Impact of Pretraining Term Frequencies on Few-Shot Reasoning; AI Cannot Ignore Symbolic Logic, and Here's Why; Learnability can be undecidable; Scaling Language Models: Methods, Analysis & Insights from Training Gopher; DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning; On the Measure of Intelligence [Chollet]; A Formal Theory of Commonsense Psychology: How People Think People Think; Continuum hypothesis; Gödel numbering + completeness theorems; Concepts: Where Cognitive Science Went Wrong [Jerry A. Fodor]
March 07, 2022
#67 Prof. KARL FRISTON 2.0
We engage in a bit of epistemic foraging with Prof. Karl Friston! In this show, we discuss the free energy principle in detail, as well as emergence, cognition, consciousness and Karl's burden of knowledge! YT: Patreon: Discord: [00:00:00] Introduction to FEP/Friston [00:06:53] Cheers to Epistemic Foraging! [00:09:17] The Burden of Knowledge Across Disciplines [00:12:55] On-show introduction to Friston [00:14:23] Simple does NOT mean Easy [00:21:25] Searching for a Mathematics of Cognition [00:26:44] The Low Road and The High Road to the Principle [00:28:27] What's changed for the FEP in the last year [00:39:36] FEP as stochastic systems with a pullback attractor [00:44:03] An attracting set at multiple time scales and time infinity [00:53:56] What about fuzzy Markov boundaries? [00:59:17] Is reality densely or sparsely coupled? [01:07:00] Is a Strong and Weak Emergence distinction useful? [01:13:25] a Philosopher, a Zombie, and a Sentient Consciousness walk into a bar ... [01:24:28] Can we recreate consciousness in silico? Will it have qualia? [01:28:29] Subjectivity and building hypotheses [01:34:17] Subject specific realizations to minimize free energy [01:37:21] Free will in a deterministic Universe. The free energy principle made simpler but not too simple
March 02, 2022
#66 ALEXANDER MATTICK - [Unplugged / Community Edition]
We have a chat with Alexander Mattick, aka ZickZack, from Yannic's Discord community. Alex is one of the leading voices in that community and has an impressive technical depth. Don't forget MLST has now started its own Discord server too, come and join us! We are going to run regular events; our first big event is on Wednesday the 9th, 1700-1900 UK time. Patreon: Discord: YT version: [00:00:00] Introduction to Alex [00:02:16] Spline theory of NNs [00:05:19] Do NNs abstract? [00:08:27] Tim's exposition of spline theory of NNs [00:11:11] Semantics in NNs [00:13:37] Continuous vs discrete [00:19:00] Open-ended Search [00:22:54] Inductive logic programming [00:25:00] Control to gain knowledge and knowledge to gain control [00:30:22] Being a generalist with a breadth of knowledge and knowledge transfer [00:36:29] Causality [00:43:14] Discrete program synthesis + theorem solvers
February 28, 2022
#65 Prof. PEDRO DOMINGOS [Unplugged]
Note: there are no politics discussed in this show and please do not interpret this show as any kind of a political statement from us.  We have decided not to discuss politics on MLST anymore due to its divisive nature.  Patreon: Discord: [00:00:00] Intro [00:01:36] What we all need to understand about machine learning [00:06:05] The Master Algorithm Target Audience [00:09:50] Deeply Connected Algorithms seen from Divergent Frames of Reference [00:12:49] There is a Master Algorithm; and it's mine! [00:14:59] The Tribe of Evolution [00:17:17] Biological Inspirations and Predictive Coding [00:22:09] Shoe-Horning Gradient Descent [00:27:12] Sparsity at Training Time vs Prediction Time [00:30:00] World Models and Predictive Coding [00:33:24] The Cartoons of System 1 and System 2 [00:40:37] AlphaGo Searching vs Learning [00:45:56] Discriminative Models evolve into Generative Models [00:50:36] Generative Models, Predictive Coding, GFlowNets [00:55:50] Sympathy for a Thousand Brains [00:59:05] A Spectrum of Tribes [01:04:29] Causal Structure and Modelling [01:09:39] Entropy and The Duality of Past vs Future, Knowledge vs Control [01:16:14] A Discrete Universe? [01:19:49] And yet continuous models work so well [01:23:31] Finding a Discretised Theory of Everything
February 26, 2022
#64 Prof. Gary Marcus 3.0
Patreon: Discord: YT: We have a chat with Prof. Gary Marcus about everything which is currently top of mind for him, including consciousness. [00:00:00] Gary intro [00:01:25] Slightly conscious [00:24:59] Abstract, compositional models [00:32:46] Spline theory of NNs [00:36:17] Self driving cars / algebraic reasoning [00:39:43] Extrapolation [00:44:15] Scaling laws [00:49:50] Maximum likelihood estimation. References: Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets; Deep Double Descent: Where Bigger Models and More Data Hurt; Bayesian Deep Learning and a Probabilistic Perspective of Generalization
February 24, 2022
#063 - Prof. YOSHUA BENGIO - GFlowNets, Consciousness & Causality
We are now sponsored by Weights and Biases! Please visit our sponsor link: Patreon: For Yoshua Bengio, GFlowNets are the most exciting thing on the horizon of Machine Learning today. He believes they can solve previously intractable problems and hold the key to unlocking machine abstract reasoning itself. This discussion explores the promise of GFlowNets and the personal journey Prof. Bengio travelled to reach them. Panel: Dr. Tim Scarfe, Dr. Keith Duggar, Dr. Yannic Kilcher. Our special thanks to Alexander Mattick (Zickzack). References: Yoshua Bengio @ MILA; GFlowNet Foundations; Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation; Interpolation Consistency Training for Semi-Supervised Learning; Towards Causal Representation Learning; Causal inference using invariant prediction: identification and confidence intervals
February 22, 2022
#062 - Dr. Guy Emerson - Linguistics, Distributional Semantics
Dr. Guy Emerson is a computational linguist who obtained his Ph.D from Cambridge University, where he is now a research fellow and lecturer. On the panel we also have myself, Dr. Tim Scarfe, as well as Dr. Keith Duggar and the veritable Dr. Walid Saba. We dive into distributional semantics, probability theory, fuzzy logic, grounding, vagueness and the grammar/cognition connection. The aim of distributional semantics is to design computational techniques that can automatically learn the meanings of words from a body of text. The twin challenges are: how do we represent meaning, and how do we learn these representations? We want to learn the meanings of words from a corpus by exploiting the fact that the context of a word tells us something about its meaning. This is known as the distributional hypothesis. In his Ph.D thesis, Dr. Emerson presented a distributional model which can learn truth-conditional semantics grounded by objects in the real world. Hope you enjoy the show! Patreon:
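The distributional hypothesis described above can be sketched in a few lines: build co-occurrence count vectors from a toy corpus and compare words by the contexts they share. This is a hypothetical illustration of the general idea only, not Dr. Emerson's truth-conditional model; the corpus and window size are invented for the example.

```python
from collections import Counter, defaultdict

# Toy corpus invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Count, for each word, which words appear within a +/-2 token window.
window = 2
cooc = defaultdict(Counter)
for sentence in corpus:
    toks = sentence.split()
    for i, w in enumerate(toks):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if j != i:
                cooc[w][toks[j]] += 1

def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b))

# "cat" and "dog" occur in similar contexts (sat, on, the), so their
# count vectors align strongly.
print(round(cosine(cooc["cat"], cooc["dog"]), 2))  # → 0.98
```

Real distributional models replace raw counts with PMI weighting or learned embeddings, but the principle, meaning from context, is the same.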
February 03, 2022
061: Interpolation, Extrapolation and Linearisation (Prof. Yann LeCun, Dr. Randall Balestriero)
We are now sponsored by Weights and Biases! Please visit our sponsor link: Patreon: Yann LeCun thinks that it's specious to say neural network models are interpolating, because in high dimensions everything is extrapolation. Recently, Dr. Randall Balestriero, Dr. Jerome Pesenti and Prof. Yann LeCun released their paper Learning in High Dimension Always Amounts to Extrapolation. This discussion has completely changed how we think about neural networks and their behaviour. [00:00:00] Pre-intro [00:11:58] Intro Part 1: On linearisation in NNs [00:28:17] Intro Part 2: On interpolation in NNs [00:47:45] Intro Part 3: On the curse [00:48:19] LeCun [01:40:51] Randall B YouTube version:
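The paper's definition of interpolation (a test point falling inside the convex hull of the training set) can be probed with a cheap necessary condition: the hull is contained in the axis-aligned bounding box of the data, so any point outside the box is certainly outside the hull. A minimal NumPy sketch of our own (the dimensions and sample sizes are arbitrary choices, not the paper's experiments):

```python
import numpy as np

def fraction_inside_box(d, n_train=100, n_test=500, seed=0):
    """Fraction of uniform test points that land inside the axis-aligned
    bounding box of a uniform training set. The convex hull lies inside
    this box, so this is an upper bound on the interpolation rate."""
    rng = np.random.default_rng(seed)
    inside = 0
    for _ in range(n_test):
        X = rng.uniform(size=(n_train, d))   # training set
        p = rng.uniform(size=d)              # query point
        if np.all(p >= X.min(axis=0)) and np.all(p <= X.max(axis=0)):
            inside += 1
    return inside / n_test

# In 2-D almost every query interpolates; in 200-D almost none do,
# even though per-coordinate coverage is identical.
print(fraction_inside_box(d=2), fraction_inside_box(d=200))
```

Per coordinate, a query falls inside the training range with probability (n-1)/(n+1), about 0.98 for n = 100; raised to the power of 200 dimensions that is under 2%, which is one intuition for why "everything is extrapolation" in high dimension.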
January 04, 2022
#60 Geometric Deep Learning Blueprint (Special Edition)
Patreon: The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact tractable given enough computational horsepower. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, and second, learning by local gradient-descent type methods, typically implemented as backpropagation. While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not uniform and have strong repeating patterns as a result of the low-dimensionality and structure of the physical world. Geometric Deep Learning unifies a broad class of ML problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks, but also provide a principled way to construct new types of problem-specific inductive biases. This week we spoke with Prof. Michael Bronstein (Head of Graph ML at Twitter), Dr. Petar Veličković (Senior Research Scientist at DeepMind), Dr. Taco Cohen, and Prof. Joan Bruna about their new proto-book Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. See the table of contents for this (long) show at
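The symmetry-and-invariance principle described above is easy to check numerically. Below is a DeepSets-style permutation-equivariant layer: every element gets the same linear map, plus a term computed from a permutation-invariant mean pool. This is our own toy sketch (the weights and sizes are arbitrary), not code from the proto-book:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3                      # 5 set elements, 3 features each
W1 = rng.normal(size=(d, d))     # per-element transform
W2 = rng.normal(size=(d, d))     # transform of the pooled summary

def equivariant_layer(X):
    # Per-element map plus a shared, permutation-invariant pooled term.
    return X @ W1 + X.mean(axis=0, keepdims=True) @ W2

X = rng.normal(size=(n, d))
perm = rng.permutation(n)

# Permuting the input rows permutes the output rows in exactly the same way.
assert np.allclose(equivariant_layer(X[perm]), equivariant_layer(X)[perm])
print("permutation equivariant")
```

Graph neural networks generalise this picture: message passing respects graph structure the way this layer respects permutations, and CNNs respect translations, each a different symmetry group under the same blueprint.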
September 19, 2021
#59 - Jeff Hawkins (Thousand Brains Theory)
Patreon: The ultimate goal of neuroscience is to learn how the human brain gives rise to human intelligence and what it means to be intelligent. Understanding how the brain works is considered one of humanity's greatest challenges. Jeff Hawkins thinks that the reality we perceive is a kind of simulation, a hallucination, a confabulation. He thinks that our brains build a model of reality based on thousands of information streams originating from the sensors in our body. Critically, Hawkins doesn't think there is just one model, but rather thousands. Jeff has just released his new book, A Thousand Brains: A New Theory of Intelligence. It's an inspiring and well-written book, and I hope after watching this show you will be inspired to read it too. Panel: Dr. Keith Duggar, Connor Leahy
September 03, 2021
#58 Dr. Ben Goertzel - Artificial General Intelligence
The field of Artificial Intelligence was founded in the mid 1950s with the aim of constructing "thinking machines", that is to say, computer systems with human-like general intelligence. Think of humanoid robots that not only look but act and think with intelligence equal to, and ultimately greater than, that of human beings. But in the intervening years, the field has drifted far from its ambitious old-fashioned roots. Dr. Ben Goertzel is an artificial intelligence researcher, CEO and founder of SingularityNET, a project combining artificial intelligence and blockchain to democratize access to artificial intelligence. Ben seeks to fulfil the original ambitions of the field. Ben graduated with a PhD in Mathematics from Temple University in 1990. Ben's approach to AGI over many decades now has been inspired by many disciplines, in particular by human cognitive psychology and computer science. To date, Ben's work has been mostly theoretically driven. Ben thinks that most of the deep learning approaches to AGI today try to model the brain. They may have a loose analogy to human neuroscience, but they have not tried to derive the details of an AGI architecture from an overall conception of what a mind is. Ben thinks that what matters for creating human-level (or greater) intelligence is having the right information processing architecture, not the underlying mechanics via which the architecture is implemented. Ben thinks that there is a certain set of key cognitive processes and interactions that AGI systems must implement explicitly, such as working and long-term memory, deliberative and reactive processing, and perception. Biological systems tend to be messy, complex and integrative; searching for a single "algorithm of general intelligence" is an inappropriate attempt to project the aesthetics of physics or theoretical computer science into a qualitatively different domain. TOC is on the YT show description. Panel: Dr. Tim Scarfe, Dr. Yannic Kilcher, Dr. Keith Duggar. References: Artificial General Intelligence: Concept, State of the Art, and Future Prospects; The General Theory of General Intelligence: A Pragmatic Patternist Perspective
August 11, 2021
#57 - Prof. Melanie Mitchell - Why AI is harder than we think
Since its beginning in the 1950s, the field of artificial intelligence has vacillated between periods of optimistic predictions and massive investment, and periods of disappointment, loss of confidence, and reduced funding. Even with today's seemingly fast pace of AI breakthroughs, the development of long-promised technologies such as self-driving cars, housekeeping robots, and conversational companions has turned out to be much harder than many people expected. Professor Melanie Mitchell thinks one reason for these repeating cycles is our limited understanding of the nature and complexity of intelligence itself. YT vid: Main show kick off [00:26:51] Panel: Dr. Tim Scarfe, Dr. Keith Duggar, Letitia Parcalabescu
July 25, 2021
#56 - Dr. Walid Saba, Gadi Singer, Prof. J. Mark Bishop (Panel discussion)
It has been over three decades since the statistical revolution took AI by storm, and over two decades since deep learning (DL) helped usher in the latest resurgence of artificial intelligence (AI). However, the disappointing progress in conversational agents, NLU, and self-driving cars has made it clear that progress has not lived up to the promise of these empirical and data-driven methods. DARPA has suggested that it is time for a third wave in AI, one that would be characterized by hybrid models: models that combine knowledge-based approaches with data-driven machine learning techniques. Joining us on this panel discussion are polymath and linguist Walid Saba (Co-founder, ONTOLOGIK.AI), Gadi Singer (VP & Director, Cognitive Computing Research, Intel Labs) and J. Mark Bishop (Professor of Cognitive Computing (Emeritus), Goldsmiths, University of London, and Scientific Adviser to FACT360). Moderated by Dr. Keith Duggar and Dr. Tim Scarfe. #machinelearning #artificialintelligence
July 08, 2021
#55 Self-Supervised Vision Models (Dr. Ishan Misra - FAIR).
Dr. Ishan Misra is a Research Scientist at Facebook AI Research, where he works on Computer Vision and Machine Learning. His main research interest is reducing the need for human supervision, and indeed human knowledge, in visual learning systems. He finished his PhD at the Robotics Institute at Carnegie Mellon. He has done stints at Microsoft Research, INRIA and Yale. His bachelor's is in computer science, where he achieved the highest GPA in his cohort. Ishan is fast becoming a prolific scientist, already with more than 3000 citations under his belt, and co-authoring with Yann LeCun, the godfather of deep learning. Today though we will be focusing on an exciting cluster of recent papers around unsupervised representation learning for computer vision released from FAIR. These are: DINO: Emerging Properties in Self-Supervised Vision Transformers; BARLOW TWINS: Self-Supervised Learning via Redundancy Reduction; and PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples. All of these papers are hot off the press, having been officially released only in the last month or so. Many of you will remember PIRL: Self-Supervised Learning of Pretext-Invariant Representations, of which Ishan was the primary author in 2019. References: Shuffle and Learn; DepthContrast; DINO; Barlow Twins; SwAV; PIRL; AVID (best paper candidate at CVPR'21, just announced over the weekend); Alexei (Alyosha) Efros; Exemplar networks; The Bitter Lesson - Rich Sutton; Machine Teaching: A New Paradigm for Building Machine Learning Systems; POET
June 21, 2021
#54 Gary Marcus and Luis Lamb - Neurosymbolic models
Professor Gary Marcus is a scientist, best-selling author, and entrepreneur. He is Founder and CEO of Robust.AI, and was Founder and CEO of Geometric Intelligence, a machine learning company acquired by Uber in 2016. Gary said in his recent next decade paper that — without us, or other creatures like us, the world would continue to exist, but it would not be described, distilled, or understood. Human lives are filled with abstraction and causal description. This is so powerful. Francois Chollet the other week said that intelligence is literally sensitivity to abstract analogies, and that is all there is to it. It's almost as if one of the most important features of intelligence is to be able to abstract knowledge; this drives the generalisation which will allow you to mine previous experience to make sense of many future novel situations. Also joining us today is Professor Luis Lamb — Secretary of Innovation for Science and Technology of the State of Rio Grande do Sul, Brazil. His research interests are Machine Learning and Reasoning, Neuro-Symbolic Computing, Logic in Computation and Artificial Intelligence, Cognitive and Neural Computation and also AI Ethics and Social Computing. Luis released his new paper Neurosymbolic AI: The Third Wave at the end of last year. It beautifully articulated the key ingredients needed in the next generation of AI systems, integrating type 1 and type 2 approaches to AI, and it summarises all of the achievements of the last 20 years of research. We cover a lot of ground in today's show: the limitations of deep learning, Rich Sutton's bitter lesson and "reward is enough", and the semantic foundation required for us to build robust AI.
June 04, 2021
#53 Quantum Natural Language Processing - Prof. Bob Coecke (Oxford)
Bob Coecke is a celebrated physicist; he has been a Physics and Quantum professor at Oxford University for the last 20 years. He is particularly interested in Structure, which is to say Logic, Order, and Category Theory. He is well known for work involving compositional distributional models of natural language meaning, and he is also fascinated with understanding how our brains work. Bob was recently appointed as the Chief Scientist at Cambridge Quantum Computing. Bob thinks that interactions between systems in Quantum Mechanics carry naturally over to how word meanings interact in natural language. Bob argues that this interaction embodies the phenomenon of quantum teleportation. Bob invented ZX-calculus, a graphical calculus for revealing the compositional structure inside quantum circuits - to show entanglement states and protocols in a visually succinct but logically complete way. Von Neumann himself didn't even like his own original symbolic formalism of quantum theory, despite it being widely used! We hope you enjoy this fascinating conversation which might give you a lot of insight into natural language processing.  Tim Intro [00:00:00] The topological brain (Post-record button skit) [00:13:22] Show kick off [00:19:31] Bob introduction [00:22:37] Changing culture in universities [00:24:51] Machine Learning is like electricity [00:31:50] NLP -- what is Bob's Quantum conception? [00:34:50] The missing text problem [00:52:59] Can statistical induction be trusted? [00:59:49] On pragmatism and hybrid systems [01:04:42] Parlour tricks, parsing and information flows [01:07:43] How much human input is required with Bob's method? [01:11:29] Reality, meaning, structure and language [01:14:42] Replacing complexity with quantum entanglement, emergent complexity [01:17:45] Loading quantum data requires machine learning [01:19:49] QC is happy math coincidence for NLP [01:22:30] The Theory of English (ToE) [01:28:23] ... or can we learn the ToE?
[01:29:56] How did diagrammatic quantum calculus come about? [01:31:04] The state of quantum computing today [01:37:49] NLP on QC might be doable even in the NISQ era [01:40:48] Hype and private investment are driving progress [01:48:34] Crypto discussion (moved to post-show) [01:50:38] Kilcher is in a startup (moved to post-show) [01:53:40] Debrief [01:55:26]
May 19, 2021
#52 - Unadversarial Examples (Hadi Salman, MIT)
Performing reliably on unseen or shifting data distributions is a difficult challenge for modern vision systems; even slight corruptions or transformations of images are enough to slash the accuracy of state-of-the-art classifiers. When an adversary is allowed to modify an input image directly, models can be manipulated into predicting anything, even when there is no perceptible change; this is known as an adversarial example. The ideal definition of an adversarial example is when humans consistently say two pictures are the same but a machine disagrees. Hadi Salman, a Ph.D. student at MIT (ex-Uber and Microsoft Research), started thinking about how adversarial robustness could be leveraged beyond security. He realised that the phenomenon of adversarial examples could actually be turned upside down to lead to more robust models instead of breaking them. Hadi utilized the brittleness of neural networks to design unadversarial examples, or robust objects, which are objects designed specifically to be robustly recognized by neural networks.  Introduction [00:00:00] DR KILCHER'S PHD HAT [00:11:18] Main Introduction [00:11:38] Hadi's Introduction [00:14:43] More robust models == transfer better [00:46:41] Features not bugs paper [00:49:13] Manifolds [00:55:51] Robustness and Transferability [00:58:00] Do non-robust features generalize worse than robust? [00:59:52] The unreasonable predicament of entangled features [01:01:57] We can only find adversarial examples in the vicinity [01:09:30] Certifiability of models for robustness [01:13:55] Carlini is coming for you! And we are screwed [01:23:21] Distribution shift and corruptions are a bigger problem than adversarial examples [01:25:34] All roads lead to generalization [01:26:47] Unadversarial examples [01:27:26]
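To make the gradient-based attack idea concrete, here is a minimal sketch of the fast gradient sign method (FGSM) on a toy logistic classifier. The weights, inputs and epsilon below are illustrative assumptions for the sketch, not anything from Hadi's work, which concerns deep vision models:

```python
import math

def predict(w, b, x):
    """Logistic classifier: P(class 1 | x) for weights w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_perturb(w, x, y, eps):
    """Fast Gradient Sign Method for a logistic model.

    The gradient of the cross-entropy loss w.r.t. the input is
    (p - y) * w, so sign(grad) is -sign(w) when the true label is
    y=1 and +sign(w) when y=0. Each feature therefore moves by
    exactly eps in the loss-increasing direction."""
    sign = 1.0 if y == 1 else -1.0
    return [xi - sign * eps * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

w, b = [2.0, -3.0, 1.0], 0.0
x = [0.5, -0.2, 0.1]
x_adv = fgsm_perturb(w, x, 1, 0.3)  # confident class-1 input, now misclassified
```

Although each feature changes by at most eps, the perturbation aligns with the weight vector, so the logit shifts by eps times the L1 norm of w, enough to flip an otherwise-confident prediction.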
May 01, 2021
#51 Francois Chollet - Intelligence and Generalisation
In today's show we are joined by Francois Chollet. I have been inspired by Francois ever since I read his Deep Learning with Python book and started using the Keras library, which he invented many, many years ago. Francois has a clarity of thought that I've never seen in any other human being! He has extremely interesting views on intelligence as generalisation, abstraction and an information conversion ratio. He wrote On the Measure of Intelligence at the end of 2019 and it had a huge impact on my thinking. He thinks that NNs can only model continuous problems, which have a smooth learnable manifold, and that many "type 2" problems which involve reasoning and/or planning are not suitable for NNs. He thinks that many problems have type 1 and type 2 enmeshed together. He thinks that the future of AI must include program synthesis to allow us to generalise broadly from a few examples, though the search could be guided by neural networks because the search space is interpolative to some extent. Tim's Whimsical notes;
April 16, 2021
#50 Christian Szegedy - Formal Reasoning, Program Synthesis
Dr. Christian Szegedy from Google Research is a deep learning heavyweight. He invented adversarial examples, one of the first object detection algorithms, and the Inception architecture, and co-invented BatchNorm. He thinks that if you had bet on computers and software in 1990 you would have been as right as if you bet on AI now. But he thinks that we have been programming computers the same way since the 1950s and there has been a huge stagnation ever since. Mathematics is the process of taking a fuzzy thought and formalising it. But could we automate that? Could we create a system which would act like a superhuman mathematician, but which you could talk to in natural language? This is what Christian calls autoformalisation. Christian thinks that automating many of the things we do in mathematics is the first step towards software synthesis and building human-level AGI. Mathematical ability is the litmus test for general reasoning ability. Christian has a fascinating take on transformers too. With Yannic Lightspeed Kilcher and Dr. Mathew Salvaris. Whimsical Canvas with Tim's Notes: YouTube version (with detailed table of contents)
April 04, 2021
#49 - Meta-Gradients in RL - Dr. Tom Zahavy (DeepMind)
The race is on; we are on a collective mission to understand and create artificial general intelligence. Dr. Tom Zahavy, a Research Scientist at DeepMind, thinks that reinforcement learning is the most general learning framework that we have today, and in his opinion it could lead to artificial general intelligence. He thinks there are no tasks which could not be solved by simply maximising a reward. Back in 2012, when Tom was an undergraduate, before the deep learning revolution, he attended an online lecture on how CNNs automatically discover representations. This was an epiphany for Tom. He decided in that very moment that he was going to become an ML researcher. Tom's view is that the ability to recognise patterns and discover structure is the most important aspect of intelligence. This has been his quest ever since. He is particularly focused on using diversity preservation and meta-gradients to discover this structure. In this discussion we dive deep into meta-gradients in reinforcement learning. Video version and TOC @
March 23, 2021
#48 Machine Learning Security - Andy Smith
First episode in a series we are doing on ML DevOps. Starting with the thing which nobody seems to be talking about enough, security! We chat with cyber security expert Andy Smith about threat modelling and trust boundaries for an ML DevOps system.  Intro [00:00:00] ML DevOps - a security perspective [00:00:50] Threat Modelling [00:03:03] Adversarial examples? [00:11:27] Nobody understands the whole stack [00:13:53] On the size of the state space, the element of unpredictability [00:18:32] Threat modelling in more detail [00:21:17] Trust boundaries for an ML DevOps system [00:25:45] Andy has a YouTube channel on cyber security! Check it out @ Video version:
March 16, 2021
#047 Interpretable Machine Learning - Christoph Molnar
Christoph Molnar is one of the main people to know in the space of interpretable ML. In 2018 he released the first version of his incredible online book, Interpretable Machine Learning. Interpretability is often a deciding factor when a machine learning (ML) model is used in a product, a decision process, or in research. Interpretability methods can be used to discover knowledge, to debug or justify the model and its predictions, to control and improve the model, to reason about potential bias in models, and to increase the social acceptance of models. But interpretability methods can also be quite esoteric, add an additional layer of complexity and potential pitfalls, and require expert knowledge to understand. Is it even possible to understand complex models, or even humans for that matter, in any meaningful way?  Introduction to IML [00:00:00] Show Kickoff [00:13:28] What makes a good explanation? [00:15:51] Quantification of how good an explanation is [00:19:59] Knowledge of the pitfalls of IML [00:22:14] Are linear models even interpretable? [00:24:26] Complex Math models to explain Complex Math models? [00:27:04] Saliency maps are glorified edge detectors [00:28:35] Challenge on IML -- feature dependence [00:36:46] Don't leap to using a complex model! Surrogate models can be too dumb [00:40:52] On airplane pilots. Seeking to understand vs testing [00:44:09] IML could help us make better models or lead a better life [00:51:53] Lack of statistical rigor and quantification of uncertainty [00:55:35] On Causality [01:01:09] Broadening out the discussion to the process or institutional level [01:08:53] No focus on fairness / ethics? [01:11:44] Is it possible to condition ML model training on IML metrics? [01:15:27] Where is IML going? Some of the esoterica of the IML methods [01:18:35] You can't compress information without common knowledge, the latter becomes the bottleneck [01:23:25] IML methods used non-interactively?
Making IML an engineering discipline [01:31:10] Tim Postscript -- on the lack of effective corporate operating models for IML, security, engineering and ethics [01:36:34] Explanation in Artificial Intelligence: Insights from the Social Sciences (Tim Miller 2018) Seven Myths in Machine Learning Research (Chang 19)  Myth 7: Saliency maps are robust ways to interpret neural networks Sanity Checks for Saliency Maps (Adebayo 2020) Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Christoph Molnar: Please show your appreciation and buy Christoph's book here; Panel:  Connor Tann Dr. Tim Scarfe  Dr. Keith Duggar Video version:
March 14, 2021
#046 The Great ML Stagnation (Mark Saroufim and Dr. Mathew Salvaris)
Academics think of themselves as trailblazers, explorers — seekers of the truth. Any fundamental discovery involves a significant degree of risk. If an idea is guaranteed to work then it moves from the realm of research to engineering. Unfortunately, this also means that most research careers will invariably be failures, at least if failure is measured via "objective" metrics like citations. Today we discuss the recent article from Mark Saroufim called Machine Learning: The Great Stagnation. We discuss the rise of gentleman scientists, fake rigor, incentives in ML, SOTA-chasing, "graduate student descent", the distribution of talent in ML, and how to learn effectively. With special guest interviewer Mat Salvaris.  Machine learning: the great stagnation [00:00:00] Main show kick off [00:16:30] Great stagnation article / Bad incentive systems in academia [00:18:24] OpenAI is a media business [00:19:48] Incentive structures in academia [00:22:13] SOTA chasing [00:24:47] F You Money [00:28:53] Research grants and gentlemen scientists [00:29:13] Following your own gradient of interest and making a contribution [00:33:27] Marketing yourself to be successful [00:37:07] Tech companies create the bad incentives [00:42:20] GPT3 was sota chasing but it seemed really... "good"? Scaling laws? [00:51:09] Dota / game AI [00:58:39] Hard to go it alone? [01:02:08] Reaching out to people [01:09:21] Willingness to be wrong [01:13:14] Distribution of talent / tech interviews [01:18:30] What should you read online and how to learn? Sharing your stuff online and finding your niche [01:25:52] Mark Saroufim: Dr. Mathew Salvaris:
March 06, 2021
#045 Microsoft's Platform for Reinforcement Learning (Bonsai)
Microsoft has an interesting strategy with their new “autonomous systems” technology also known as Project Bonsai. They want to create an interface to abstract away the complexity and esoterica of deep reinforcement learning. They want to fuse together expert knowledge and artificial intelligence all on one platform, so that complex problems can be decomposed into simpler ones. They want to take machine learning Ph.Ds out of the equation and make autonomous systems engineering look more like a traditional software engineering process. It is an ambitious undertaking, but interesting. Reinforcement learning is extremely difficult (as I cover in the video), and if you don’t have a team of RL Ph.Ds with tech industry experience, you shouldn’t even consider doing it yourself. This is our take on it! There are 3 chapters in this video; Chapter 1: Tim's intro and take on RL being hard, intro to Bonsai and machine teaching  Chapter 2: Interview with Scott Stanfield [recorded Jan 2020] 00:56:41 Chapter 3: Traditional street talk episode [recorded Dec 2020] 01:38:13 This is *not* an official communication from Microsoft, all personal opinions. There is no MS-confidential information in this video.  With: Scott Stanfield Megan Bloemsma Gurdeep Pall (he has not validated anything we have said in this video or been involved in the creation of it) Panel:  Dr. Keith Duggar Dr. Tim Scarfe Yannic Kilcher
February 28, 2021
#044 - Data-efficient Image Transformers (Hugo Touvron)
Today we are going to talk about the Data-efficient Image Transformers (DeiT) paper, of which Hugo is the primary author. One of the recipes of success for vision models since the DL revolution began has been the availability of large training sets. CNNs have been optimized for almost a decade now, including through extensive architecture search, which is prone to overfitting. Motivated by the success of transformer-based models in Natural Language Processing, there has been increasing interest in applying these approaches to vision models. Hugo and his collaborators used a different training strategy and a new distillation token to get a massive increase in sample efficiency with image transformers.  00:00:00 Introduction 00:06:33 Data augmentation is all you need 00:09:53 Now the image patches are the convolutions though? 00:12:16 Where are those inductive biases hiding? 00:15:46 Distillation token 00:21:01 Why different resolutions on training 00:24:14 How data efficient can we get? 00:26:47 Out of domain generalisation 00:28:22 Why are transformers data efficient at all? Learning invariances 00:32:04 Is data augmentation cheating? 00:33:25 Distillation strategies - matching the intermediate teacher representation as well as output 00:35:49 Do ML models learn the same thing for a problem? 00:39:01 How is it like at Facebook AI? 00:41:17 How long is the PhD programme? 00:42:03 Other interests outside of transformers? 00:43:18 Transformers for Vision and Language 00:47:40 Could we improve transformers models? (Hybrid models) 00:49:03 Biggest challenges in AI? 00:50:52 How far can we go with data driven approach?
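As background on what "distillation" means here, this is a minimal sketch of a Hinton-style soft-label distillation objective of the kind DeiT builds on. The logits, temperature and mixing weight below are illustrative assumptions; DeiT's actual contribution routes the teacher signal through a dedicated distillation token rather than this plain loss:

```python
import math

def softmax(logits, temp=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temp) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temp=3.0):
    """Mix hard-label cross-entropy with a temperature-scaled KL term
    pulling the student's distribution toward the teacher's."""
    ce = -math.log(softmax(student_logits)[label])
    q_teacher = softmax(teacher_logits, temp)
    q_student = softmax(student_logits, temp)
    kl = sum(qt * math.log(qt / qs) for qt, qs in zip(q_teacher, q_student))
    # temp**2 rescales KL gradients to match the cross-entropy term
    return (1 - alpha) * ce + alpha * (temp ** 2) * kl

teacher = [3.0, 0.0, 0.0]
loss = distillation_loss([3.0, 0.0, 0.0], teacher, label=0)  # agreeing student, low loss
```

A student that matches both the label and the teacher pays only a small cross-entropy cost; disagreeing with the teacher is penalised even when the teacher's soft probabilities carry information the one-hot label does not.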
February 25, 2021
#043 Prof J. Mark Bishop - Artificial Intelligence Is Stupid and Causal Reasoning won't fix it.
Professor Mark Bishop does not think that computers can be conscious or have phenomenological states of consciousness, unless we are willing to accept panpsychism, which is the idea that mentality is fundamental and ubiquitous in the natural world; or, put simply, that your goldfish, and everything else for that matter, has a mind. Panpsychism postulates that distinctions between intelligences are largely arbitrary. Mark's work in the 'philosophy of AI' led to an influential critique of computational approaches to Artificial Intelligence through a thorough examination of John Searle's 'Chinese Room Argument'. Mark just published a paper called Artificial Intelligence is Stupid and Causal Reasoning Won't Fix It. He makes it clear in this paper that, in his opinion, computers will never be able to compute everything, understand anything, or feel anything.  00:00:00 Tim Intro 00:15:04 Intro 00:18:49 Introduction to Mark's ideas 00:25:49 Some problems are not computable 00:29:57 The Dancing with Pixies fallacy 00:32:36 The observer-relative problem, and it's all in the mapping 00:43:03 Conscious experience 00:53:30 Intelligence without representation, consciousness is something that we do 01:02:36 Consciousness helps us to act autonomously 01:05:13 The Chinese Room argument 01:14:58 Simulation argument and computation doesn't have phenomenal consciousness 01:17:44 Language informs our colour perception 01:23:11 We have our own distinct ontologies 01:27:12 Kurt Gödel, Turing and Penrose and the implications of their work
February 19, 2021
#042 - Pedro Domingos - Ethics and Cancel Culture
Today we have Professor Pedro Domingos and we are going to talk about activism in machine learning, cancel culture, AI ethics and kernels. In Pedro's book The Master Algorithm, he segmented the AI community into 5 distinct tribes with 5 unique identities (and before you ask, no, the irony of an anti-identitarian doing so was not lost on us!). Pedro recently published an article in Quillette called Beating Back Cancel Culture: A Case Study from the Field of Artificial Intelligence. Domingos has railed against political activism in the machine learning community and cancel culture. Recently Pedro was involved in a controversy in which he asserted that the NeurIPS broader impact statements are an ideological filter mechanism. Important Disclaimer: All views expressed are personal opinions. 00:00:00 Caveating 00:04:08 Main intro 00:07:44 Cancel culture is a cultural and intellectual weakness 00:12:26 Is cancel culture a post-modern religion? 00:24:46 Should we have gateways and gatekeepers? 00:29:30 Does everything require broader impact statements? 00:33:55 We are stifling diversity (of thought) not promoting it. 00:39:09 What is fair and how to do fair? 00:45:11 Models can introduce biases by compressing away minority data 00:48:36 Accurate but unequal soap dispensers 00:53:55 Agendas are not even self-consistent 00:56:42 Is vs Ought: all variables should be used for Is 01:00:38 Fighting back cancellation with cancellation? 01:10:01 Intent and degree matter in right vs wrong. 01:11:08 Limiting principles matter 01:15:10 Gradient descent and kernels 01:20:16 Training journey matters more than destination 01:24:36 Can training paths teach us about symmetry? 01:28:37 What is the most promising path to AGI? 01:31:29 Intelligence will lose its mystery
February 11, 2021
#041 - Biologically Plausible Neural Networks - Dr. Simon Stringer
Dr. Simon Stringer obtained his Ph.D. in mathematical state space control theory and has been a Senior Research Fellow at Oxford University for over 27 years. Simon is the director of the Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, which is based within the Oxford University Department of Experimental Psychology. His department covers vision, spatial processing, motor function, language and consciousness -- in particular, how the primate visual system learns to make sense of complex natural scenes. Dr. Stringer's laboratory houses a team of theoreticians who are developing computer models of a range of different aspects of brain function. Simon's lab is investigating the neural and synaptic dynamics that underpin brain function. An important matter here is the feature-binding problem, which concerns how the visual system represents the hierarchical relationships between features: the visual system must represent hierarchical binding relations across the entire visual field at every spatial scale and level in the hierarchy of visual primitives. We discuss the emergence of self-organised behaviour, complex information processing, invariant sensory representations and hierarchical feature binding, which emerge when you build biologically plausible neural networks with temporal spiking dynamics.  00:00:09 Tim Intro 00:09:31 Show kickoff 00:14:37 Hierarchical feature binding and timing of action potentials 00:30:16 Hebb to Spike-timing-dependent plasticity (STDP) 00:35:27 Encoding of shape primitives 00:38:50 Is imagination working in the same place in the brain 00:41:12 Compare to supervised CNNs 00:45:59 Speech recognition, motor system, learning mazes 00:49:28 How practical are these spiking NNs 00:50:19 Why simulate the human brain 00:52:46 How much computational power do you gain from differential timings 00:55:08 Adversarial inputs 00:59:41 Generative / causal component needed?
01:01:46 Modalities of processing i.e. language 01:03:42 Understanding 01:04:37 Human hardware 01:06:19 Roadmap of NNs? 01:10:36 Interpretability methods for these new models 01:13:03 Won't GPT just scale and do this anyway? 01:15:51 What about trace learning and transformation learning 01:18:50 Categories of invariance 01:19:47 Biological plausibility
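For a flavour of the spike-timing dynamics discussed, here is a sketch of the canonical pair-based STDP rule: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise. The parameter values are textbook-style illustrative defaults, not those of Simon's models:

```python
import math

def stdp_delta_w(t_pre, t_post, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Pair-based STDP weight change for one pre/post spike pair.

    If the presynaptic spike arrives before the postsynaptic spike
    (dt > 0) the synapse is potentiated; if it arrives after, the
    synapse is depressed. Both effects decay exponentially with the
    spike-time difference (times in ms, tau is the decay constant)."""
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)

dw = stdp_delta_w(0.0, 10.0)  # pre fires 10 ms before post: potentiation
```

This is the sense in which precise spike timing, not just firing rate, carries learning signal in such models: the same spike pair flips from strengthening to weakening if its order reverses.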
February 03, 2021
#040 - Adversarial Examples (Dr. Nicholas Carlini, Dr. Wieland Brendel, Florian Tramèr)
Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. There's good reason to believe neural networks look at very different features than we would have expected. As articulated in the 2019 "features not bugs" paper, adversarial examples can be directly attributed to the presence of non-robust features: features derived from patterns in the data distribution that are highly predictive, yet brittle and incomprehensible to humans. Adversarial examples don't just affect deep learning models. A cottage industry has sprung up around threat modelling in AI and ML systems and their dependencies. Joining us this evening are some of the leading researchers in adversarial examples: Florian Tramèr - a fifth-year PhD student in Computer Science at Stanford University; Dr. Wieland Brendel - Machine Learning Researcher at the University of Tübingen & Co-Founder; Dr. Nicholas Carlini - Research Scientist at Google Brain, working in that exciting space between machine learning and computer security. We really hope you enjoy the conversation, remember to subscribe!  Yannic Intro [00:00:00] Tim Intro [00:04:07] Threat Taxonomy [00:09:00] Main show intro [00:11:30] What's wrong with neural networks? [00:14:52] The role of memorization [00:19:51] Anthropomorphization of models [00:22:42] What's the harm really though / focusing on actual ML security risks [00:27:03] Shortcut learning / OOD generalization [00:36:18] Human generalization [00:40:11] An existential problem in DL: getting the models to learn what we want? [00:41:39] Defenses to adversarial examples [00:47:15] What if we had all the data and the labels? Still problems? [00:54:28] Defenses are easily broken [01:00:24] Self deception in academia [01:06:46] ML Security [01:28:15]
January 31, 2021
#039 - Lena Voita - NLP
Lena Voita is a Ph.D. student at the University of Edinburgh and the University of Amsterdam. Previously, she was a research scientist at Yandex Research and worked closely with the Yandex Translate team. She still teaches NLP at the Yandex School of Data Analysis. She has created an exciting new NLP course on her website which you folks need to check out! She has one of the most well-presented blogs we have ever seen, where she discusses her research in an easily digestible manner. Lena has been investigating many fascinating topics in machine learning and NLP. Today we are going to talk about three of her papers and the corresponding blog articles: Source and Target Contributions to NMT Predictions -- where she talks about the influential dichotomy between the source and the prefix of neural translation models; Information-Theoretic Probing with MDL -- where Lena proposes a technique for evaluating a model using the minimum description length or Kolmogorov complexity of labels given representations, rather than something basic like accuracy; Evolution of Representations in the Transformer -- where Lena investigates the evolution of representations of individual tokens in Transformers trained with different training objectives (MT, LM, MLM). Panel: Dr.
Tim Scarfe, Yannic Kilcher, Sayak Paul 00:00:00 Kenneth Stanley / Greatness Cannot Be Planned housekeeping 00:21:09 Kilcher intro 00:28:54 Hello Lena 00:29:21 Tim - Lena's NMT paper 00:35:26 Tim - Minimum Description Length / Probe paper 00:40:12 Tim - Evolution of representations 00:46:40 Lena's NLP course 00:49:18 The peppermint tea situation 00:49:28 Main Show Kick Off 00:50:22 Hallucination vs exposure bias 00:53:04 Lena's focus on explaining the models, not SOTA chasing 00:56:34 Probes paper and NLP interpretability 01:02:18 Why standard probing doesn't work 01:12:12 Evolution of representations paper 01:23:53 BERTScore and BERT Rediscovers the Classical NLP Pipeline paper 01:25:10 Is the shifting encoding context because of BERT bidirectionality 01:26:43 Objective defines which information we lose on input 01:27:59 How influential is the dataset? 01:29:42 Where is the community going wrong? 01:31:55 Thoughts on GOFAI/Understanding in NLP? 01:36:38 Lena's NLP course 01:47:40 How to foster better learning / understanding 01:52:17 Lena's toolset and languages 01:54:12 Mathematics is all you need 01:56:03 Programming languages
January 23, 2021
#038 - Professor Kenneth Stanley - Why Greatness Cannot Be Planned
Professor Kenneth Stanley is currently a research science manager at OpenAI in San Francisco. We've been dreaming about getting Kenneth on the show since the very beginning of Machine Learning Street Talk. Some of you might recall that our first ever show was on the Enhanced POET paper; of course Kenneth had his hands all over it. He's been cited over 16,000 times, and his most popular paper, with over 3K citations, was the NEAT algorithm. His interests are neuroevolution, open-endedness, NNs, artificial life, and AI. He invented the concept of novelty search with no clearly defined objective. His key idea is that there is a tyranny of objectives prevailing in every aspect of our lives, society and indeed our algorithms. Crucially, these objectives produce convergent behaviour and thinking and distract us from discovering stepping stones which will lead to greatness. He thinks that this monotonic objective obsession, this idea that we need to continue to improve benchmarks every year, is dangerous. He wrote about this in detail in his recent book "Why Greatness Cannot Be Planned", which will be the main topic of discussion in the show. We also cover his ideas on open-endedness in machine learning.
00:00:00 Intro to Kenneth 00:01:16 Show structure disclaimer 00:04:16 Passionate discussion 00:06:26 Why greatness can't be planned and the tyranny of objectives 00:14:40 Chinese finger trap 00:16:28 Perverse incentives and feedback loops 00:18:17 Deception 00:23:29 Maze example 00:24:44 How can we define curiosity or interestingness 00:26:59 Open-endedness 00:33:01 ICML 2019 and Yannic, POET, first MLST 00:36:17 Evolutionary algorithms++ 00:43:18 POET, the first MLST 00:45:39 A lesson to GOFAI people 00:48:46 Machine Learning -- the great stagnation 00:54:34 Actual scientific successes are usually luck, and against the odds -- BioNTech 00:56:21 Picbreeder and NEAT 01:10:47 How Tim applies these ideas to his life and why he runs MLST 01:14:58 Keith skit about UCF 01:15:13 Main show kick off 01:18:02 Why does Kenneth value serendipitous exploration so much 01:24:10 Scientific support for Kenneth's ideas in normal life 01:27:12 We should drop objectives to achieve them. An oxymoron? 01:33:13 Isn't this just resource allocation between exploration and exploitation? 01:39:06 Are objectives merely a matter of degree? 01:42:38 How do we allocate funds for treasure hunting in society 01:47:34 A keen nose for what is interesting, and voting can be dangerous 01:53:00 Committees are the antithesis of innovation 01:56:21 Does Kenneth apply these ideas to his real life? 01:59:48 Divergence vs interestingness vs novelty vs complexity 02:08:13 Picbreeder 02:12:39 Isn't everything novel in some sense? 02:16:35 Imagine if there was no selection pressure? 02:18:31 Is innovation == environment exploitation? 02:20:37 Is it possible to take shortcuts if you already knew what the innovations were? 02:21:11 Go-Explore -- does the algorithm encode the stepping stones? 02:24:41 What does it mean for things to be interestingly different?
02:26:11 behavioral characterization / diversity measure to your broad interests  02:30:54 Shaping objectives  02:32:49 Why do all ambitious objectives have deception? Picbreeder analogy  02:35:59 Exploration vs Exploitation, Science vs Engineering  02:43:18 Schools of thought in ML and could search lead to AGI  02:45:49 Official ending 
January 20, 2021
#037 - Tour De Bayesian with Connor Tann
Connor Tann is a physicist and senior data scientist working for a multinational energy company, where he co-founded and leads a data science team. He holds a first-class degree in experimental and theoretical physics from Cambridge University, with a master's in particle astrophysics. He specializes in the application of machine learning models and Bayesian methods. Today we explore the history, practical utility, and unique capabilities of Bayesian methods. We also discuss the computational difficulties inherent in Bayesian methods, along with modern methods for approximate solutions such as Markov Chain Monte Carlo. Finally, we discuss how Bayesian optimization in the context of AutoML may one day put data scientists like Connor out of work. Panel: Dr. Keith Duggar, Alex Stenlake, Dr. Tim Scarfe 00:00:00 Duggar's philosophical ramblings on Bayesianism 00:05:10 Introduction 00:07:30 Small datasets and prior scientific knowledge 00:10:37 Bayesian methods are probability theory 00:14:00 Bayesian methods demand hard computations 00:15:46 Uncertainty can matter more than estimators 00:19:29 Updating or combining knowledge is a key feature 00:25:39 Frequency or Reasonable Expectation as the primary concept 00:30:02 Gambling and coin flips 00:37:32 Rev. Thomas Bayes's pool table 00:40:37 Ignorance priors are beautiful yet hard 00:43:49 Connections between common distributions 00:49:13 A curious Universe, Benford's Law 00:55:17 Choosing priors, a tale of two factories 01:02:19 Integration, the computational Achilles heel 01:35:25 Bayesian social context in the ML community 01:10:24 Frequentist methods as a first approximation 01:13:13 Driven to Bayesian methods by small sample size 01:18:46 Bayesian optimization with AutoML, a job killer? 01:25:28 Different approaches to hyper-parameter optimization 01:30:18 Advice for aspiring Bayesians 01:33:59 Who would Connor interview next? Connor Tann:
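The coin-flip discussion can be made concrete with the textbook conjugate Beta-Binomial update, a minimal sketch of Bayesian updating (the prior and the flip counts below are illustrative, not from the episode):

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate Beta-Binomial update: a Beta(alpha, beta) prior over a
    coin's bias becomes Beta(alpha + heads, beta + tails) after the flips."""
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Posterior mean estimate of the coin's bias."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Posterior variance: quantified uncertainty, which shrinks with data."""
    n = alpha + beta
    return alpha * beta / (n * n * (n + 1))

# Start with a uniform Beta(1, 1) prior, observe 7 heads and 3 tails.
a, b = update_beta(1.0, 1.0, 7, 3)
estimate = beta_mean(a, b)  # 8/12 = 2/3
```

This illustrates two points from the conversation: prior knowledge enters explicitly (the initial alpha, beta), and the posterior carries its own uncertainty rather than a single point estimate.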
January 11, 2021
#036 - Max Welling: Quantum, Manifolds & Symmetries in ML
Today we had a fantastic conversation with Professor Max Welling, VP of Technology, Qualcomm Technologies Netherlands B.V. Max is a strong believer in the power of data and computation and its relevance to artificial intelligence. There is a fundamental blank-slate paradigm in machine learning; experience and data alone currently rule the roost. Max wants to build a house of domain knowledge on top of that blank slate. Max thinks there are no predictions without assumptions, no generalization without inductive bias. The bias-variance tradeoff tells us that we need to use additional human knowledge when data is insufficient. Max Welling has pioneered many of the most sophisticated inductive priors in DL models developed in recent years, allowing us to use deep learning with non-Euclidean data, i.e. on graphs/topology (a field we now call "geometric deep learning"), or allowing network architectures to recognise new symmetries in the data, for example gauge or SE(3) equivariance. Max has also brought many other concepts from his physics playbook into ML, for example quantum and even Bayesian approaches.  This is not an episode to miss; it might be our best yet!  Panel: Dr. Tim Scarfe, Yannic Kilcher, Alex Stenlake 00:00:00 Show introduction  00:04:37 Protein folding from DeepMind -- did it use the SE(3) transformer?  00:09:58 How has machine learning progressed  00:19:57 Quantum Deformed Neural Networks paper  00:22:54 Probabilistic Numeric Convolutional Neural Networks paper 00:27:04 Ilia Karmanov from Qualcomm interview mini segment 00:32:04 Main Show Intro  00:35:21 How is Max known in the community?  
00:36:35 How Max nurtures talent, freedom and relationship is key  00:40:30 Selecting research directions and guidance  00:43:42 Priors vs experience (bias/variance trade-off)  00:48:47 Generative models and GPT-3  00:51:57 Bias/variance trade-off -- when do priors hurt us  00:54:48 Capsule networks  01:03:09 Which old ideas would we revive  01:04:36 Hardware lottery paper  01:07:50 Greatness can't be planned (Kenneth Stanley reference)  01:09:10 A new sort of peer review and originality  01:11:57 Quantum Computing  01:14:25 Quantum deformed neural networks paper  01:21:57 Probabilistic numeric convolutional neural networks  01:26:35 Matrix exponential  01:28:44 Other ideas from physics i.e. chaos, holography, renormalisation  01:34:25 Reddit  01:37:19 Open review system in ML  01:41:43 Outro 
January 03, 2021
#035 Christmas Community Edition!
Welcome to the Christmas special community edition of MLST! We discuss some recent and interesting papers from Pedro Domingos (are NNs kernel machines?), DeepMind (can NNs out-reason symbolic machines?), Anna Rogers - When BERT Plays The Lottery, All Tickets Are Winning, and Prof. Mark Bishop (even causal methods won't deliver understanding). We also cover our favourite bits from the recent Montreal AI event run by Prof. Gary Marcus (including Rich Sutton, Danny Kahneman and Christof Koch). We respond to reader mail on capsule networks. Then we do a deep dive into type theory and lambda calculus with community member Alex Mattick. In the final hour we discuss inductive priors and label information density with another one of our Discord community members.   Panel: Dr. Tim Scarfe, Yannic Kilcher, Alex Stenlake, Dr. Keith Duggar Enjoy the show and don't forget to subscribe! 00:00:00 Welcome to Christmas Special!  00:00:44 SoTa meme  00:01:30 Happy Christmas!  00:03:11 Paper -- DeepMind - Outperforming neuro-symbolic models with NNs (Ding et al) 00:08:57 What does it mean to understand?  00:17:37 Paper - Prof. 
Mark Bishop - Artificial Intelligence is stupid and causal reasoning won't fix it 00:25:39 Paper -- Pedro Domingos - Every Model Learned by Gradient Descent Is Approximately a Kernel Machine 00:31:07 Paper - Bengio - Inductive Biases for Deep Learning of Higher-Level Cognition 00:32:54 Anna Rogers - When BERT Plays The Lottery, All Tickets Are Winning 00:37:16 Montreal AI event - Gary Marcus on reasoning  00:40:37 Montreal AI event -- Rich Sutton on a universal theory of AI 00:49:45 Montreal AI event -- Danny Kahneman, System 1 vs 2 and generative models a la the free energy principle 01:02:57 Montreal AI event -- Christof Koch - Neuroscience is hard 01:10:55 Marcus Carr -- reader letter on capsule networks 01:13:21 Alex response to Marcus Carr  01:22:06 Type theory segment --  with Alex Mattick from Discord 01:24:45 Type theory segment -- What is Type Theory  01:28:12 Type theory segment -- Difference between functional and OOP languages  01:29:03 Type theory segment -- Lambda calculus  01:30:46 Type theory segment -- Closures  01:35:05 Type theory segment -- Term rewriting (confluence and termination)  01:42:02 Type theory segment -- eta term rewriting system - Lambda Calculus   01:54:44 Type theory segment -- Types / semantics  02:06:26 Type theory segment -- Calculus of constructions  02:09:27 Type theory segment -- Homotopy type theory  02:11:02 Type theory segment -- Deep learning link  02:17:27 Jan from Discord segment -- Chrome MRU skit  02:18:56 Jan from Discord segment -- Inductive priors (with XMaster96/Jan from Discord)  02:37:59 Jan from Discord segment -- Label information density (with XMaster96/Jan from Discord)  02:55:13 Outro
December 27, 2020
#034 Eray Özkural- AGI, Simulations & Safety
Dr. Eray Özkural is an AGI researcher from Turkey; he is the founder of Celestial Intellect Cybernetics. Eray is extremely critical of Max Tegmark, Nick Bostrom and MIRI founder Eliezer Yudkowsky and their views on AI safety. Eray thinks that these views represent a form of neo-Luddism, that they are capturing valuable research budgets with doomsday fear-mongering, and that their proponents effectively want to prevent AI from being developed by those they don't agree with. Eray is also sceptical of the intelligence explosion hypothesis and the argument from simulation. Panel -- Dr. Keith Duggar, Dr. Tim Scarfe, Yannic Kilcher 00:00:00 Show teaser intro with added nuggets and commentary 00:48:39 Main Show Introduction  00:53:14 Doomsaying to Control   00:56:39 Fear the Basilisk!   01:08:00 Intelligence Explosion Ethics   01:09:45 Fear the Autonomous Drone! ... or spam   01:11:25 Infinity Point Hypothesis   01:15:26 Meat Level Intelligence  01:21:25 Defining Intelligence ... Yet Again   01:27:34 We'll make brains and then shoot them  01:31:00 The Universe likes deep learning  01:33:16 NNs are glorified hash tables  01:38:44 Radical behaviorists   01:41:29 Omega Architecture, possible AGI?   01:53:33 Simulation hypothesis  02:09:44 No one cometh unto Simulation, but by Jesus Christ   02:16:47 Agendas, Motivations, and Mind Projections   02:23:38 A computable Universe of Bulk Automata  02:30:31 Self-Organized Post-Show Coda  02:31:29 Investigating Intelligent Agency is Science  02:36:56 Goodbye and cheers!
December 20, 2020
#033 Prof. Karl Friston - The Free Energy Principle
This week Dr. Tim Scarfe, Dr. Keith Duggar and Connor Leahy chat with Prof. Karl Friston. Professor Friston is a British neuroscientist at University College London and an authority on brain imaging. In 2016 he was ranked the most influential neuroscientist on Semantic Scholar.  His main contribution to theoretical neurobiology is the variational free energy principle, also known as active inference in the Bayesian brain. The FEP is a formal statement that the existential imperative for any system which survives in a changing world can be cast as an inference problem. The Bayesian brain hypothesis states that the brain is confronted with ambiguous sensory evidence, which it interprets by making inferences about the hidden states which caused the sensory data. So is the brain an inference engine? The key concept separating Friston's idea from traditional stochastic reinforcement learning methods, and even Bayesian reinforcement learning, is moving away from goal-directed optimisation. Remember to subscribe! Enjoy the show! 00:00:00 Show teaser intro  00:16:24 Main formalism for FEP  00:28:29 Path Integral  00:30:52 How did we feel talking to Friston?  00:34:06 Skit - on cultures  00:36:02 Friston joins  00:36:33 Main show introduction  00:40:51 Is prediction all it takes for intelligence?  00:48:21 balancing accuracy with flexibility  00:57:36 belief-free vs belief-based; beliefs are crucial   01:04:53 Fuzzy Markov Blankets and Wandering Sets   01:12:37 The Free Energy Principle conforms to itself   01:14:50 useful false beliefs  01:19:14 complexity minimization is the heart of free energy   01:23:25 An Alpha to tip the scales? Absolutely not! Absolutely yes!   01:28:47 FEP applied to brain anatomy   01:36:28 Are there multiple non-FEP forms in the brain?  01:43:11 a positive connection to backpropagation   01:47:12 The FEP does not explain the origin of FEP systems   01:49:32 Post-show banter #machinelearning
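As a toy numerical illustration of the variational free energy at the heart of the FEP (our sketch, not anything from the episode; the distributions are made up): for a discrete hidden state s and a fixed observation o, the free energy F(q) = E_q[log q(s) - log p(o, s)] upper-bounds the surprise -log p(o), and is minimized exactly when q equals the posterior p(s|o).

```python
import numpy as np

# A two-state generative model and one fixed observation o.
p_s = np.array([0.5, 0.5])           # prior over hidden states
p_o_given_s = np.array([0.9, 0.2])   # likelihood of the observed o per state
joint = p_s * p_o_given_s            # p(o, s) for the observed o
evidence = joint.sum()               # p(o), the model evidence
posterior = joint / evidence         # exact p(s|o) by Bayes' rule

def free_energy(q):
    # F(q) = E_q[log q(s) - log p(o, s)]  (a KL divergence plus -log p(o))
    return float((q * (np.log(q) - np.log(joint))).sum())

F_exact = free_energy(posterior)            # attains the bound -log p(o)
F_other = free_energy(np.array([0.5, 0.5])) # any other q gives a larger F
```

Minimizing F over q therefore performs (approximate) Bayesian inference, which is the sense in which "surviving" systems can be read as inference engines.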
December 13, 2020
#032- Simon Kornblith / GoogleAI - SimCLR and Paper Haul!
This week Dr. Tim Scarfe, Sayak Paul and Yannic Kilcher speak with Dr. Simon Kornblith from Google Brain (Ph.D. from MIT). Simon is trying to understand how neural nets do what they do. Simon was the second author on the seminal Google AI SimCLR paper. We also cover "Do Wide and Deep Networks Learn the Same Things?", "What's in a Loss Function for Image Classification?", and "Big Self-Supervised Models are Strong Semi-Supervised Learners". Simon used to be a neuroscientist and also gives us the story of his unique journey into ML. 00:00:00 Show Teaser / or "short version" 00:18:34 Show intro 00:22:11 Relationship between neuroscience and machine learning 00:29:28 Similarity analysis and evolution of representations in Neural Networks 00:39:55 Expressivity of NNs 00:42:33 What's in a loss function for image classification 00:46:52 Loss function implications for transfer learning 00:50:44 SimCLR paper  01:00:19 Contrast SimCLR to BYOL 01:01:43 Data augmentation 01:06:35 Universality of image representations 01:09:25 Universality of augmentations 01:23:04 GPT-3 01:25:09 GANs for data augmentation?? 01:26:50 Julia language @skornblith Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth What's in a Loss Function for Image Classification? A Simple Framework for Contrastive Learning of Visual Representations Big Self-Supervised Models are Strong Semi-Supervised Learners
December 06, 2020
#031 WE GOT ACCESS TO GPT-3! (With Gary Marcus, Walid Saba and Connor Leahy)
In this special edition, Dr. Tim Scarfe, Yannic Kilcher and Keith Duggar speak with Gary Marcus and Connor Leahy about GPT-3. We have all had a significant amount of time to experiment with GPT-3, and we show you demos of it in use and discuss the considerations. Note that this podcast version is significantly truncated; watch the YouTube version for the TOC and experiments with GPT-3.
November 28, 2020
#030 Multi-Armed Bandits and Pure-Exploration (Wouter M. Koolen)
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic Kilcher discuss multi-armed bandits and pure exploration with Dr. Wouter M. Koolen, Senior Researcher, Machine Learning group, Centrum Wiskunde & Informatica. Wouter specialises in machine learning theory, game theory, information theory, statistics and optimisation. Wouter is currently interested in pure exploration in multi-armed bandit models, game tree search, and accelerated learning in sequential decision problems. His research has been cited 1000 times, and he has been published 14 times in NeurIPS, the number-one ML conference, as well as in lots of other exciting venues. Today we are going to talk about two of the most studied settings in control, decision theory, and learning in unknown environments: the multi-armed bandit (MAB) and reinforcement learning (RL) approaches. When can an agent stop learning and start exploiting using the knowledge it obtained? Which strategy leads to minimal learning time? 00:00:00 What are multi-armed bandits/show trailer 00:12:55 Show introduction 00:15:50 Bandits  00:18:58 Taxonomy of decision framework approaches  00:25:46 Exploration vs Exploitation  00:31:43 the sharp divide between modes  00:34:12 bandit measures of success  00:36:44 connections to reinforcement learning  00:44:00 when to apply pure exploration in games  00:45:54 bandit lower bounds, a pure exploration renaissance  00:50:21 pure exploration compiler dreams  00:51:56 what would the PX-compiler DSL look like  00:57:13 the long arms of the bandit  01:00:21 causal models behind the curtain of arms  01:02:43 adversarial bandits, arms trying to beat you  01:05:12 bandits as an optimization problem  01:11:39 asymptotic optimality vs practical performance  01:15:38 pitfalls hiding under asymptotic cover  01:18:50 adding features to bandits  01:27:24 moderate confidence regimes   01:30:33 algorithm choice is highly sensitive to bounds  01:46:09 Postscript: Keith's interesting piece on quantum 
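The exploration-vs-exploitation trade-off discussed in this episode can be sketched with a toy epsilon-greedy agent on a Bernoulli bandit. This is a simple illustrative baseline strategy, not one advocated in the episode, and the arms and payoffs below are made up.

```python
import numpy as np

# Three-armed Bernoulli bandit: explore with probability eps,
# otherwise exploit the arm with the best empirical mean so far.
rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hidden payoff of each arm
counts = np.zeros(3)                     # pulls per arm
values = np.zeros(3)                     # running empirical mean per arm
eps = 0.1

for step in range(5000):
    if rng.uniform() < eps:
        arm = int(rng.integers(3))        # explore: pick a random arm
    else:
        arm = int(np.argmax(values))      # exploit: pick the best-looking arm
    reward = float(rng.uniform() < true_means[arm])
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

best_arm = int(np.argmax(counts))         # the arm the agent settled on
```

Pure-exploration (best-arm identification) strategies, the episode's focus, instead ask how few pulls are needed before the agent can confidently stop and commit to an arm.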
#machinelearning
November 20, 2020
#029 GPT-3, Prompt Engineering, Trading, AI Alignment, Intelligence
This week Dr. Tim Scarfe, Dr. Keith Duggar, Yannic Kilcher and Connor Leahy cover a broad range of topics, ranging from academia, GPT-3 and whether prompt engineering could be the next in-demand skill, markets and economics including trading and whether you can predict the stock market, AI alignment, utilitarian philosophy, randomness and intelligence and even whether the universe is infinite!  00:00:00 Show Introduction  00:12:49 Academia and doing a Ph.D.  00:15:49 From academia to Wall Street  00:17:08 Quants -- smoke and mirrors? Tail risk  00:19:46 Previous results don't indicate future success in markets  00:23:23 Making money from social media signals?  00:24:41 Predicting the stock market  00:27:20 Things which are and are not predictable  00:31:40 Tim postscript comment on predicting markets  00:32:37 Connor's take on markets  00:35:16 As markets become more efficient...  00:36:38 Snake oil in ML  00:39:20 GPT-3, we have changed our minds  00:52:34 Prompt engineering, a new form of software development?  01:06:07 GPT-3 and prompt engineering  01:12:33 Emergent intelligence with increasingly weird abstractions  01:27:29 Wireheading and the economy  01:28:54 Free markets, dragon story and price vs value  01:33:59 Utilitarian philosophy and what does good look like?  01:41:39 Randomness and intelligence  01:44:55 Different schools of thought in ML  01:46:09 Is the universe infinite?  Thanks a lot to Connor Leahy for being a guest on today's show. You can join his EleutherAI community Discord here:
November 08, 2020
NLP is not NLU and GPT-3 - Walid Saba
#machinelearning This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic Kilcher speak with veteran NLU expert Dr. Walid Saba.  Walid is an old-school AI expert. He is a polymath: a neuroscientist, psychologist, linguist, philosopher, statistician, and logician. He thinks the missing-information problem and the lack of a typed ontology are the key issues with NLU, not sample efficiency or generalisation. He is a big critic of the deep learning movement and of BERTology. We also cover GPT-3 in some detail in today's session, covering Luciano Floridi's recent article "GPT‑3: Its Nature, Scope, Limits, and Consequences" and a commentary on the incredible power of GPT-3 to perform tasks with just a few examples, including the Yann LeCun commentary on Facebook and Hacker News.  Time stamps on the YouTube version 0:00:00 Walid intro  00:05:03 Knowledge acquisition bottleneck  00:06:11 Language is ambiguous  00:07:41 Language is not learned  00:08:32 Language is a formal language  00:08:55 Learning from data doesn't work   00:14:01 Intelligence  00:15:07 Lack of domain knowledge these days  00:16:37 Yannic Kilcher thuglife comment  00:17:57 Deep learning assault  00:20:07 The way we evaluate language models is flawed  00:20:47 Humans do type checking  00:23:02 Ontologic  00:25:48 Comments on GPT-3  00:30:54 Yann LeCun and Reddit  00:33:57 Minds and machines - Luciano  00:35:55 Main show introduction  00:39:02 Walid introduces himself  00:40:20 science advances one funeral at a time  00:44:58 Deep learning obsession syndrome and inception  00:46:14 BERTology / empirical methods are not NLU  00:49:55 Pattern recognition vs domain reasoning, is the knowledge in the data  00:56:04 Natural language understanding is about decoding and not compression, it's not learnable.  
01:01:46 Intelligence is about not needing infinite amounts of time  01:04:23 We need an explicit ontological structure to understand anything  01:06:40 Ontological concepts  01:09:38 Word embeddings  01:12:20 There is power in structure  01:15:16 Language models are not trained on pronoun disambiguation and resolving scopes  01:17:33 The information is not in the data  01:19:03 Can we generate these rules on the fly? Rules or data?  01:20:39 The missing data problem is key  01:21:19 Problem with empirical methods and LeCun reference  01:22:45 Comparison with meatspace (brains)  01:28:16 The knowledge graph game, is knowledge constructed or discovered  01:29:41 How small can this ontology of the world be?  01:33:08 Walid's taxonomy of understanding  01:38:49 The trend seems to be, fewer rules is better, not the other way around?  01:40:30 Testing the latest NLP models with entailment  01:42:25 Problems with the way we evaluate NLP  01:44:10 Winograd Schema challenge  01:45:56 All you need to know now is how to build neural networks, lack of rigour in ML research  01:50:47 Is everything learnable  01:53:02 How should we elevate language systems?  01:54:04 10 big problems in language (missing information)  01:55:59 Multiple inheritance is wrong  01:58:19 Language is ambiguous  02:01:14 How big would our world ontology need to be?  02:05:49 How to learn more about NLU  02:09:10 AlphaGo  Walid's blog: LinkedIn:
November 04, 2020
AI Alignment & AGI Fire Alarm - Connor Leahy
This week Dr. Tim Scarfe, Alex Stenlake and Yannic Kilcher speak with AGI and AI alignment specialist Connor Leahy, a machine learning engineer from Aleph Alpha and founder of EleutherAI. Connor believes that AI alignment is philosophy with a deadline and that we are on the precipice; the stakes are astronomical. AI is important, and it will go wrong by default. Connor thinks that the singularity or intelligence explosion is near. Connor says that AGI is like climate change but worse: even harder problems, an even shorter deadline and even worse consequences for the future. These problems are hard, and nobody knows what to do about them. 00:00:00 Introduction to AI alignment and AGI fire alarm  00:15:16 Main Show Intro  00:18:38 Different schools of thought on AI safety  00:24:03 What is intelligence?  00:25:48 AI Alignment  00:27:39 Humans don't have a coherent utility function  00:28:13 Newcomb's paradox and advanced decision problems  00:34:01 Incentives and behavioural economics  00:37:19 Prisoner's dilemma  00:40:24 Ayn Rand and game theory in politics and business  00:44:04 Instrumental convergence and orthogonality thesis  00:46:14 Utility functions and the Stop button problem  00:55:24 AI corrigibility - self-alignment  00:56:16 Decision theory and stability / wireheading / robust delegation  00:59:30 Stop button problem  01:00:40 Making the world a better place  01:03:43 Is intelligence a search problem?  01:04:39 Mesa optimisation / humans are misaligned AI  01:06:04 Inner vs outer alignment / faulty reward functions  01:07:31 Large corporations are intelligent and have no stop function  01:10:21 Dutch booking / what is rationality / decision theory  01:16:32 Understanding very powerful AIs  01:18:03 Kolmogorov complexity  01:19:52 GPT-3 - is it intelligent, are humans even intelligent?  
01:28:40 Scaling hypothesis  01:29:30 Connor thought DL was dead in 2017  01:37:54 Why is GPT-3 as intelligent as a human  01:44:43 Jeff Hawkins on intelligence as compression and the great lookup table  01:50:28 AI ethics related to AI alignment?  01:53:26 Interpretability  01:56:27 Regulation  01:57:54 Intelligence explosion  Discord: EleutherAI: Twitter: LinkedIn:
November 01, 2020
Kaggle, ML Community / Engineering (Sanyam Bhutani)
Join Dr. Tim Scarfe, Sayak Paul, Yannic Kilcher, and Alex Stenlake as they have a conversation with Mr. Chai Time Data Science, Sanyam Bhutani! 00:00:00 Introduction  00:03:42 Show kick off  00:06:34 How did Sanyam get started in ML  00:07:46 Being a content creator  00:09:01 Can you be self-taught without a formal education in ML?  00:22:54 Kaggle  00:33:41 H2O product / job  00:40:58 Interpretability / bias / engineering skills  00:43:22 Get that first job in DS  00:46:29 AWS ML Ops architecture / ML engineering  01:14:19 Patterns  01:18:09 Testability  01:20:54 Adversarial examples  Sanyam's blog -- Chai Time Data Science --
October 28, 2020
Sara Hooker - The Hardware Lottery, Sparsity and Fairness
Dr. Tim Scarfe, Yannic Kilcher and Sayak Paul chat with Sara Hooker from the Google Brain team! We discuss her recent hardware lottery paper, pruning/sparsity, bias mitigation and interpretability.  The hardware lottery -- what causes inertia or friction in the marketplace of ideas? Is there a meritocracy of ideas, or do the previous decisions we have made enslave us? Sara Hooker calls this a lottery because she feels that machine learning progress is entirely beholden to the hardware and software landscape. Ideas succeed if they are compatible with the hardware and software of the time, and with the existing inventions. The machine learning community is exceptional because the pace of innovation is fast and we operate largely in the open; this is largely because we don't build anything physical, which is expensive and slow and carries a high cost of being scooped. We get stuck in basins of attraction based on our technology decisions, and it's expensive to jump outside of these basins. So is this story unique to hardware and AI algorithms, or is it really just the story of all innovation? Every great innovation must wait for the right stepping stone to be in place before it can really happen. We are excited to bring you Sara Hooker to give her take.  YouTube version (including TOC): Show notes; Sara Hooker page;
October 20, 2020
The Social Dilemma Part 3 - Dr. Rebecca Roache
This week Dr. Tim Scarfe, Yannic Kilcher, and Keith Duggar have a conversation with Dr. Rebecca Roache in the last of our 3-part series on the Social Dilemma Netflix film. Rebecca is a senior lecturer in philosophy at Royal Holloway, University of London, and has written extensively about the future of friendship.  People claim that friendships are not what they used to be. People are always staring at their phones, even when in public. Social media has turned us into narcissists who are always managing our own PR rather than being present with each other. Anxiety about the negative effects of technology is as old as the written word. Is technology bad for friendships? Can you have friends through screens? Does social media cause polarization? And is that a bad thing? Does it promote quantity over quality? Rebecca thinks that social media and echo chambers are less ominous for friendship on closer inspection.  00:00:32 Teaser clip from Rebecca and her new manuscript on friendship 00:02:52 Introduction  00:04:56 Memorisation vs reasoning / is technology enhancing friendships  00:09:29 World of Warcraft / gaming communities / echo chambers / polarisation  00:12:34 Horizontal vs vertical social attributes  00:17:18 Exclusion of others' opinions  00:20:36 The power to silence others / truth verification  00:23:58 Misinformation  00:27:28 Norms / memes / political terms and co-opting / bullying  00:31:57 Redefinition of political terms i.e. 
racism  00:36:13 Virtue signalling  00:38:57 How many friends can you have / spread thin / Dunbar's 150  00:42:54 Is it morally objectionable to believe or contemplate objectionable ideas, punishment  00:50:52 Is speaking the same thing as acting   00:52:24 Punishment - deterrence vs retribution / historical  00:53:59 Yannic: contemplating is a form of speaking  00:57:32 silencing/blocking is intellectual laziness - what ideas are we allowed to talk about  01:04:53 Corporate AI ethics frameworks  01:09:14 Autonomous vehicles  01:10:51 the eternal Facebook world / online vs offline friendships  01:14:05 How do we get the best out of our online friendships 
October 11, 2020
The Social Dilemma - Part 2
This week on Machine Learning Street Talk, Dr. Tim Scarfe, Dr. Keith Duggar, Alex Stenlake and Yannic Kilcher have a conversation with Abhishek Gupta, founder and principal researcher at the Montreal AI Ethics Institute. We cover several topics from the Social Dilemma film and AI ethics in general.  00:00:00 Introduction 00:03:57 Overcome our weaknesses 00:14:30 threat landscape blind spots   00:18:35 differential reality vs universal shaping   00:24:21 shared reality incentives and tools   00:32:01 transparency and knowledge to avoid pathology   00:40:09 federated informed autonomy     00:49:48 diversity is a metric, inclusion is a strategy   00:59:58 locally aligned pockets can stabilize global diversity  01:10:58 making inclusion easier with tools  01:23:35 enabling community feedback   01:26:16 open source the algorithms   01:33:02 the N+1 cost of inclusion   01:38:08 broader impact statement
October 06, 2020
The Social Dilemma - Part 1
In this first part of our three-part series on the Social Dilemma Netflix film, Dr. Tim Scarfe, Yannic "Lightspeed" Kilcher and Zak Jost gang up with cybersecurity expert Andy Smith. We give you our take on the film. We are super excited to get your feedback on this one! Hope you enjoy.    00:00:00 Introduction 00:06:11 Moral hypocrisy   00:12:38 Road to hell is paved with good intentions, attention economy 00:15:04 They know everything about you 00:18:02 Addiction 00:21:22 Differential realities 00:26:12 Self-determination and monetisation 00:29:08 AI: overwhelm human strengths, undermine human vulnerabilities 00:31:51 Conspiracy theory / fake news 00:34:23 Overton window / polarisation 00:39:12 Short attention span / convergent behaviour 00:41:26 Is social media good for you 00:45:17 Your attention time is linear, the things you can pay attention to are a volume, anonymity  00:51:32 Andy question on security: social engineering 00:56:32 Is it a security risk having your information in social media 00:58:02 Retrospective judgement 01:03:06 Free speech and censorship  01:06:06 Technology accelerator
October 03, 2020
Capsule Networks and Education Targets
In today's episode, Dr. Keith Duggar, Alex Stenlake and Dr. Tim Scarfe chat about the education chapter in Kenneth Stanley's "Why Greatness Cannot Be Planned" book, and we relate it to our Algoshambles conversation a few weeks ago. We debate whether objectives in education are a good thing, and whether they cause perverse incentives and stifle creativity and innovation. Next up we dissect capsule networks from the top down! We finish off talking about fast algorithms and quantum computing. 00:00:00 Introduction 00:01:13 Greatness cannot be planned / education  00:12:03 Perverse incentives 00:19:25 Treasure hunting  00:30:28 Capsule Networks 00:46:08 Capsules As Compositional Networks 00:52:45 Capsule Routing 00:57:10 Loss and Warps 01:09:55 Fast Algorithms and Quantum Computing
September 29, 2020
Programming Languages, Software Engineering and Machine Learning
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic "Lightspeed" Kilcher have a conversation with Microsoft Senior Software Engineer Sachin Kundu. We speak about programming languages, including which are our favourites, and functional programming vs OOP. Next we speak about software engineering and the intersection of software engineering and machine learning. We also talk about applications of ML, and finally what makes an exceptional software engineer and tech lead. Sachin is an expert in this field, so we hope you enjoy the conversation! Spoiler alert: how many of you have read The Mythical Man-Month by Frederick P. Brooks?!   00:00:00 Introduction 00:06:37 Programming Languages 00:53:41 Applications of ML 01:55:59 What makes an exceptional SE and tech lead 01:22:08 Outro 
September 25, 2020
Computation, Bayesian Model Selection, Interactive Articles
This week Dr. Keith Duggar, Alex Stenlake and Dr. Tim Scarfe discuss the theory of computation, intelligence, Bayesian model selection, the intelligence explosion and the phenomenon of "interactive articles".  00:00:00 Intro 00:01:27 Kernels and context-free grammars 00:06:04 Theory of computation 00:18:41 Intelligence 00:22:03 Bayesian model selection 00:44:05 AI-IQ Measure / Intelligence explosion 00:52:09 Interactive articles 01:12:32 Outro
September 22, 2020
Today Yannic "Lightspeed" Kilcher and I spoke with Alex Stenlake about kernel methods. What is a kernel? Do you remember those weird kernel things which everyone obsessed about before deep learning? What about the representer theorem and reproducing kernel Hilbert spaces? SVMs and kernel ridge regression? Remember them?! Hope you enjoy the conversation! 00:00:00 Tim Intro 00:01:35 Yannic's clever insight from this discussion  00:03:25 Street talk and Alex intro  00:05:06 How kernels are taught 00:09:20 Computational tractability 00:10:32 Maths  00:11:50 What is a kernel?  00:19:39 Kernel latent expansion  00:23:57 Overfitting  00:24:50 Hilbert spaces  00:30:20 Compare to DL 00:31:18 Back to Hilbert spaces 00:45:19 Computational tractability 2 00:52:23 Curse of dimensionality 00:55:01 RBF: infinite Taylor series 00:57:20 Margin/SVM  01:00:07 KRR/dual 01:03:26 Complexity of computing kernels vs deep learning 01:05:03 Good for small problems vs deep learning?  01:07:50 What's special about the RBF kernel 01:11:06 Another DL comparison 01:14:01 Representer theorem 01:20:05 Relation to backprop 01:25:10 Connection with NLP/transformers 01:27:31 Where else are kernels good 01:34:34 Deep learning vs dual kernel methods 01:33:29 Thoughts on AI 01:34:35 Outro
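To make the kernel-method vocabulary from this episode concrete, here is a minimal kernel ridge regression sketch with an RBF kernel. The data, hyperparameters and function names are our own illustrative choices, not anything from the conversation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), the classic RBF kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy 1-D regression problem: learn sin(x) from 40 samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0])                      # noiseless targets for the demo

# Dual (kernel) solution: alpha = (K + lam * I)^{-1} y.
# By the representer theorem, f(x) = sum_i alpha_i * k(x, x_i).
lam = 1e-3                               # ridge regularisation strength
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.array([[0.0], [1.5]])
pred = rbf_kernel(X_test, X) @ alpha     # should track sin at the test points
```

Note the cost structure the episode discusses: the fit requires solving an n-by-n linear system, which is why kernel methods shine on small problems and struggle at deep-learning scale.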
September 18, 2020
Explainability, Reasoning, Priors and GPT-3
This week Dr. Tim Scarfe and Dr. Keith Duggar discuss explainability, reasoning, priors and GPT-3. We check out Christoph Molnar's book on interpretability, talk about priors vs experience in NNs, whether NNs are reasoning, and also cover articles by Gary Marcus and Walid Saba critiquing deep learning. We finish with a brief discussion of Chollet's ARC challenge and intelligence paper.  00:00:00 Intro 00:01:17 Explainability and Christoph Molnar's book on interpretability 00:26:45 Explainability - Feature visualisation 00:33:28 Architecture / CPPNs 00:36:10 Invariance and data parsimony, priors and experience, manifolds 00:42:04 What NNs learn / logical view of modern AI (Walid Saba article) 00:47:10 Core knowledge 00:55:33 Priors vs experience  00:59:44 Mathematical reasoning  01:01:56 Gary Marcus on GPT-3  01:09:14 Can NNs reason at all?  01:18:05 Chollet intelligence paper/ARC challenge
September 16, 2020
SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (Mathilde Caron)
This week Dr. Tim Scarfe, Yannic "Lightspeed" Kilcher, Sayak Paul and Ayush Thakur interview Mathilde Caron from Facebook AI Research (FAIR). We discuss Mathilde's paper, which she wrote with her collaborators, "SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments". This paper presents the latest unsupervised contrastive visual representation algorithm, with a new data augmentation strategy and a new online clustering strategy.  Note; other authors: Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin. Sayak Paul -  @RisingSayak / Ayush Thakur - @ayushthakur0  / The article they wrote; 00:00:00 Yannic probability challenge (CAN YOU SOLVE IT?) 00:01:29 Intro topic (Tim) 00:08:18 Yannic take 00:09:33 Intro show and guests 00:11:29 SwAV elevator pitch  00:17:31 Clustering approach in general 00:21:17 Sayak and Ayush's article on SwAV  00:23:49 Optimal transport problem / Sinkhorn-Knopp algorithm 00:31:43 Is clustering a natural approach for this? 00:44:19 Image augmentations  00:46:20 Priors vs experience (data) 00:48:32 Life at FAIR  00:52:33 Progress of image augmentation  00:56:10 When things do not go to plan with research 01:01:04 Question on architecture 01:01:43 SwAV results 01:06:26 Reproducing Mathilde's code 01:14:51 Do we need the whole dataset to set the clustering loss 01:16:40 Self-supervised learning and transfer learning 01:23:25 Link to attention mechanism 01:24:41 Sayak's final thought on why unsupervised is better 01:25:56 Outro Abstract;  "Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. 
Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or “views”) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a “swapped” prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks."
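The online clustering step discussed at 00:23:49 casts the assignment of a batch of features to prototypes as an optimal transport problem, solved approximately with a few Sinkhorn-Knopp iterations so that every prototype is used roughly equally. A minimal NumPy sketch of that normalisation (the epsilon and iteration count here are illustrative, not the paper's exact settings):

```python
import numpy as np

def sinkhorn(scores, n_iters=3, eps=0.05):
    """Sinkhorn-Knopp: turn a (batch x prototypes) score matrix into
    soft cluster assignments with roughly balanced prototype usage."""
    Q = np.exp(scores / eps).T              # (prototypes, batch)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)   # balance prototype rows
        Q /= K
        Q /= Q.sum(axis=0, keepdims=True)   # normalise per-sample columns
        Q /= B
    return (Q * B).T                        # each row is a distribution

rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 4))            # 8 images, 4 prototypes
q = sinkhorn(scores)
print(q.shape)                              # (8, 4), rows sum to 1
```

In SwAV proper these soft assignments from one view become the prediction targets for the other ("swapped") view.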
September 14, 2020
UK Algoshambles, Neuralink, GPT-3 and Intelligence
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic "Lightspeed" Kilcher respond to the "Algoshambles" exam fiasco in the UK, where the government was forced to step in to standardise grades that had been grossly inflated by the schools.  The schools and teachers are all paid on metrics related to the grades received by students, so what could possibly go wrong?! The result is that grades have lost all their value, and students are coached for the exams rather than actually learning the subject.   We also cover the second Francois Chollet interview on the Lex Fridman podcast, discussing GPT-3, Neuralink, and intelligence. 00:00:00 Algoshambles  00:45:40 Lex Fridman/Chollet: Intro  00:55:21 Lex Fridman/Chollet: Neuralink  01:06:28 Lex Fridman/Chollet: GPT-3  01:23:43 Lex Fridman/Chollet: Intelligence discussion
September 07, 2020
Sayak Paul
This week we spoke with Sayak Paul, who is extremely active in the machine learning community. We discussed the AI landscape in India, unsupervised representation learning, data augmentation and contrastive learning, explainability, abstract scene representations and finally pruning and the recent superposition paper. I really enjoyed this conversation and I hope you folks do too! 00:00:00 Intro to Sayak 00:17:50 AI landscape in India 00:24:20 Unsupervised representation learning 00:26:11 DATA AUGMENTATION/Contrastive learning 00:59:20 EXPLAINABILITY 01:12:10 ABSTRACT SCENE REPRESENTATIONS 01:14:50 PRUNING and superposition paper
July 17, 2020
Robert Lange on NN Pruning and Collective Intelligence
We speak with Robert Lange! Robert is a PhD student at the Technical University Berlin. His research combines Deep Multi-Agent Reinforcement Learning and Cognitive Science to study the learning dynamics of large collectives. He has a brilliant blog where he distils and explains cutting edge ML research. We spoke about his story, economics, multi-agent RL, intelligence and AGI, and his recent article summarising the state of the art in neural network pruning.  Robert's article on pruning in NNs 00:00:00 Intro 00:04:17 Show start and intro to Robert 00:11:39 Economics background  00:27:20 Intrinsic motivation  00:33:22 Intelligence/consciousness 00:48:16 Lottery ticket/pruning article discussion 01:43:21 Robert's advice for younger self and state of deep learning Robert's LinkedIn: @RobertTLange #machinelearning #deeplearning
July 08, 2020
WelcomeAIOverlords (Zak Jost)
We welcome Zak Jost from the WelcomeAIOverlords channel. Zak is an ML research scientist at Amazon. He has a great blog at and also a Discord channel at WelcomeAIOverlords:  00:00:00 INTRO START 00:01:07 MAIN SHOW START 00:01:59 ZAK'S STORY 00:05:06 YOUTUBE DISCUSSION 00:24:12 UNDERSTANDING PAPERS 00:29:53 CONTRASTIVE LEARNING INTRO 00:33:00 BRING YOUR OWN LATENT PAPER 01:03:13 GRAPHS IN ML AND KNOWLEDGE GRAPHS  01:21:36 GRAPH USE CASES - FRAUD 01:30:15 KNOWLEDGE GRAPHS 01:34:22 GRAPHS IN ML 01:38:53 AUTOMATED ML 01:57:32 OUTRO
June 30, 2020
Facebook Research - Unsupervised Translation of Programming Languages
In this episode of Machine Learning Street Talk Dr. Tim Scarfe, Yannic Kilcher and Connor Shorten spoke with Marie-Anne Lachaux, Baptiste Roziere and Dr. Guillaume Lample from Facebook Research (FAIR) in Paris. They recently released the paper "Unsupervised Translation of Programming Languages", an exciting new approach to learned translation of programming languages (a learned transcoder) using an unsupervised encoder trained on individual monolingual corpora, i.e. no parallel language data needed. The trick they used was that there is significant token overlap when using word-piece embeddings. It was incredible to talk with this talented group of researchers and I hope you enjoy the conversation too.  Yannic's video on this got watched over 120K times! Check it out too Paper;  Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample Abstract; "A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. 
In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin."
June 24, 2020
Francois Chollet - On the Measure of Intelligence
We cover Francois Chollet's recent paper. Abstract; To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.
June 19, 2020
OpenAI GPT-3: Language Models are Few-Shot Learners
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten discuss their takeaways from OpenAI’s GPT-3 language model. With the help of Microsoft’s ZeRO-2 / DeepSpeed optimiser, OpenAI trained a 175 BILLION parameter autoregressive language model. The paper demonstrates how self-supervised language modelling at this scale can perform many downstream tasks without fine-tuning. 00:00:00 Intro 00:00:54 ZeRO1+2 (model + Data parallelism) (Connor) 00:03:17 Recent history of NLP (Tim) 00:06:04 Yannic "Light-speed" Kilcher's brief overview of GPT-3 00:14:25 Reviewing Yannic's YT comments on his GPT-3 video (Tim) 00:20:26 Main show intro 00:23:03 Is GPT-3 reasoning?  00:28:15 Architecture discussion and autoregressive (GPT*) vs denoising autoencoder (BERT) 00:36:18 Utility of GPT-3 in industry 00:43:03 Can GPT-3 do math? (reasoning/system 1/system 2) 00:51:03 Generalisation 00:56:48 Esoterics of language models 00:58:46 Architectural trade-offs 01:07:37 Memorization machines and interpretability 01:17:16 Nearest neighbour probes / watermarks 01:20:03 YouTube comments on GPT-3 video  01:21:50 GPT-3 news article generation issue 01:27:36 Sampling data for language models / bias / fairness / politics 01:51:12 Outro These paradigms of task adaptation are divided into zero-, one-, and few-shot learning. Zero-shot learning is a very extreme case where we expect a language model to perform a task such as sentiment classification or extractive question answering without any additional supervision. One- and few-shot learning provide some examples to the model. However, GPT-3's definition of this diverges a bit from the conventional literature. GPT-3 provides one- and few-shot examples in the form of “In-Context Learning”. Instead of fine-tuning the model on a few examples, the model has to use the input to infer the downstream task. 
For example, the GPT-3 transformer has an input sequence of 2048 tokens, so demonstrations of a task such as Yelp sentiment reviews would have to fit in this input sequence along with the new review. Thanks for watching! Please Subscribe! Paper Links: GPT-3: ZeRO: ZeRO (Blog Post): ZeRO-2 (Blog Post): #machinelearning #naturallanguageprocessing #deeplearning #gpt3
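The in-context learning setup described above can be illustrated with a hypothetical sentiment task: demonstrations and query are simply concatenated into one prompt that must fit inside the model's context window, and no weights are updated. The reviews, labels, and prompt format below are made up for illustration, not taken from the paper:

```python
# Few-shot "in-context learning": the task is specified entirely in the
# prompt; the model infers the task from the demonstrations.
demonstrations = [
    ("The food was cold and the staff were rude.", "negative"),
    ("Absolutely loved it, will come back!", "positive"),
]
new_review = "Great service, terrible parking."

# Concatenate (review, label) pairs, then leave the final label blank
# for the model to complete.
prompt = "".join(f"Review: {text}\nSentiment: {label}\n\n"
                 for text, label in demonstrations)
prompt += f"Review: {new_review}\nSentiment:"
print(prompt)
```

Contrast this with fine-tuning, where the same examples would instead drive gradient updates to the model's parameters.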
June 06, 2020
Jordan Edwards: ML Engineering and DevOps on AzureML
This week we had a super insightful conversation with  Jordan Edwards, Principal Program Manager for the AzureML team!  Jordan is at the coalface of turning machine learning software engineering into a reality for some of Microsoft's largest customers.  ML DevOps is all about increasing the velocity of, and orchestrating the non-interactive phase of, software deployments for ML. We cover ML DevOps and Microsoft Azure ML. We discuss model governance, testing, interpretability and tooling. We cover the age-old discussion of the dichotomy between science and engineering and how you can bridge the gap with ML DevOps. We cover Jordan's maturity model for ML DevOps.  We also cover some of the exciting ML announcements from the recent Microsoft Build conference i.e. FairLearn, InterpretML, SEAL, WhiteNoise, OpenAI code generation, OpenAI GPT-3.  00:00:04 Introduction to ML DevOps and Microsoft Build ML Announcements 00:10:29 Main show kick-off 00:11:06 Jordan's story 00:14:36 Typical ML DevOps workflow 00:17:38 Tim's articulation of ML DevOps 00:19:31 Interpretability / Fairness 00:24:31 Testing / Robustness 00:28:10 Using GANs to generate testing data 00:30:26 Gratuitous DL? 00:33:46 Challenges of making an ML DevOps framework / IaaS 00:38:48 Cultural battles in ML DevOps 00:43:04 Maturity Model for ML DevOps 00:49:19 ML: High interest credit card of technical debt paper 00:50:19 ML Engineering at Microsoft 01:01:20 ML Flow 01:03:05 Company-wide governance  01:08:15 What's coming next 01:12:10 Jordan's hilarious piece of advice for his younger self Super happy with how this turned out, this is not one to miss folks!  #deeplearning #machinelearning #devops #mldevops
June 03, 2020
One Shot and Metric Learning - Quadruplet Loss (Machine Learning Dojo)
*Note this is an episode from Tim's Machine Learning Dojo YouTube channel.  Join Eric Craeymeersch for a wonderful discussion all about ML engineering, computer vision, siamese networks, contrastive loss, one-shot learning and metric learning.  00:00:00 Introduction  00:11:47 ML Engineering Discussion 00:35:59 Intro to the main topic 00:42:13 Siamese Networks 00:48:36 Mining strategies 00:51:15 Contrastive Loss 00:57:44 Triplet loss paper 01:09:35 Quad loss paper 01:25:49 Eric's Quadloss Medium Article  02:17:32 Metric learning reality check 02:21:06 Engineering discussion II 02:26:22 Outro In our second paper review call, Tess Ferrandez covered off the FaceNet paper from Google, which was a one-shot siamese network with the so-called triplet loss. It was an interesting change of direction for NN architecture, i.e. using a contrastive loss instead of having a fixed number of output classes. Contrastive architectures have been taking over the ML landscape recently, e.g. SimCLR, MoCo, BERT.  Eric wrote an article about this at the time:  He then discovered there was a new approach to one-shot learning in vision using a quadruplet loss and metric learning. Eric wrote a new article and several experiments on this @ Paper details:  Beyond triplet loss: a deep quadruplet network for person re-identification (Chen et al. '17) "Person re-identification (ReID) is an important task in wide area video surveillance which focuses on identifying people across different cameras. Recently, deep learning networks with a triplet loss become a common framework for person ReID. However, the triplet loss pays main attentions on obtaining correct orders on the training set. It still suffers from a weaker generalization capability from the training set to the testing set, thus resulting in inferior performance. In this paper, we design a quadruplet loss, which can lead to the model output with a larger inter-class variation and a smaller intra-class variation compared to the triplet loss. 
As a result, our model has a better generalization ability and can achieve a higher performance on the testing set. In particular, a quadruplet deep network using a margin-based online hard negative mining is proposed based on the quadruplet loss for the person ReID. In extensive experiments, the proposed network outperforms most of the state-of-the-art algorithms on representative datasets which clearly demonstrates the effectiveness of our proposed method." Original facenet paper; #deeplearning #machinelearning
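A minimal sketch of the quadruplet loss described above: the first term is the familiar triplet margin, and the second pushes the anchor-positive distance below the distance between two negatives from different classes. The margins and embeddings here are illustrative, not the values from Chen et al.:

```python
import numpy as np

def quadruplet_loss(anchor, positive, neg1, neg2, m1=1.0, m2=0.5):
    """Quadruplet loss sketch (after Chen et al. '17).
    anchor/positive share a class; neg1 and neg2 come from two
    different classes, both distinct from the anchor's."""
    d = lambda x, y: np.sum((x - y) ** 2)       # squared Euclidean distance
    triplet_term = max(0.0, d(anchor, positive) - d(anchor, neg1) + m1)
    push_term = max(0.0, d(anchor, positive) - d(neg1, neg2) + m2)
    return triplet_term + push_term

a = np.array([0.0, 0.0]); p = np.array([0.1, 0.0])
n1 = np.array([2.0, 0.0]); n2 = np.array([0.0, 2.0])
print(quadruplet_loss(a, p, n1, n2))  # 0.0: both margins already satisfied
```

The extra term is what gives the larger inter-class and smaller intra-class variation the abstract refers to, since it constrains distances not involving the anchor at all.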
June 02, 2020
Harri Valpola: System 2 AI and Planning in Model-Based Reinforcement Learning
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten interviewed Harri Valpola, CEO and Founder of Curious AI. We continued our discussion of System 1 and System 2 thinking in Deep Learning, as well as miscellaneous topics around Model-based Reinforcement Learning. Dr. Valpola describes some of the challenges of modelling industrial control processes such as water sewage filters and paper mills with the use of model-based RL. Dr. Valpola and his collaborators recently published “Regularizing Trajectory Optimization with Denoising Autoencoders” that addresses some of the concerns of planning algorithms that exploit inaccuracies in their world models! 00:00:00 Intro to Harri and Curious AI System1/System 2 00:04:50 Background on model-based RL challenges from Tim 00:06:26 Other interesting research papers on model-based RL from Connor 00:08:36 Intro to Curious AI recent NeurIPS paper on model-based RL and denoising autoencoders from Yannic 00:21:00 Main show kick off, system 1/2 00:31:50 Where does the simulator come from? 00:33:59 Evolutionary priors 00:37:17 Consciousness 00:40:37 How does one build a company like Curious AI? 00:46:42 Deep Q Networks 00:49:04 Planning and Model based RL 00:53:04 Learning good representations 00:55:55 Typical problem Curious AI might solve in industry 01:00:56 Exploration 01:08:00 Their paper - regularizing trajectory optimization with denoising 01:13:47 What is Epistemic uncertainty 01:16:44 How would Curious develop these models 01:18:00 Explainability and simulations 01:22:33 How system 2 works in humans 01:26:11 Planning 01:27:04 Advice for starting an AI company 01:31:31 Real world implementation of planning models 01:33:49 Publishing research and openness We really hope you enjoy this episode, please subscribe! 
Regularizing Trajectory Optimization with Denoising Autoencoders: Pulp, Paper & Packaging: A Future Transformed through Deep Learning: Curious AI: Harri Valpola Publications: Some interesting papers around Model-Based RL: GameGAN: Plan2Explore: World Models: MuZero: PlaNet: A Deep Planning Network for RL: Dreamer: Scalable RL using World Models: Model Based RL for Atari:
May 25, 2020
ICLR 2020: Yoshua Bengio and the Nature of Consciousness
In this episode of Machine Learning Street Talk, Tim Scarfe, Connor Shorten and Yannic Kilcher react to Yoshua Bengio’s ICLR 2020 Keynote “Deep Learning Priors Associated with Conscious Processing”. Bengio takes on many future directions for research in Deep Learning, such as the role of attention in consciousness, sparse factor graphs and causality, and the study of systematic generalization. Bengio also presents big ideas in Intelligence that border on the line between philosophy and practical machine learning. This includes ideas such as consciousness in machines and System 1 and System 2 thinking, as described in Daniel Kahneman’s book “Thinking Fast and Slow”. Similar to Yann LeCun’s half of the 2020 ICLR keynote, this talk takes on many challenging ideas and hopefully this video helps you get a better understanding of some of them! Thanks for watching!  Please Subscribe for more videos! Paper Links: Link to Talk: The Consciousness Prior: Thinking Fast and Slow: Systematic Generalization: CLOSURE: Assessing Systematic Generalization of CLEVR Models: Neural Module Networks: Experience Grounds Language: Benchmarking Graph Neural Networks: On the Measure of Intelligence: Please check out our individual channels as well! Machine Learning Dojo with Tim Scarfe: Yannic Kilcher: Henry AI Labs: 00:00:00 Tim and Yannic's takes 00:01:37 Intro to Bengio 00:03:13 System 2, language and Chomsky 00:05:58 Christof Koch on consciousness 00:07:25 Francois Chollet on intelligence and consciousness 00:09:29 Meditation and Sam Harris on consciousness 00:11:35 Connor Intro 00:13:20 Show Main Intro 00:17:55 Priors associated with Conscious Processing 00:26:25 System 1 / System 2 00:42:47 Implicit and Verbalized Knowledge [DONT MISS THIS!] 
01:08:24 Inductive Priors for DL 2.0 01:27:20 Systematic Generalization 01:37:53 Contrast with the Symbolic AI Program 01:54:55 Attention 02:00:25 From Attention to Consciousness 02:05:31 Thoughts, Consciousness, Language 02:06:55 Sparse Factor graph 02:10:52 Sparse Change in Abstract Latent Space 02:15:10 Discovering Cause and Effect 02:20:00 Factorize the joint distribution 02:22:30 RIMS: Modular Computation 02:24:30 Conclusion #machinelearning #deeplearning
May 22, 2020
ICLR 2020: Yann LeCun and Energy-Based Models
This week Connor Shorten, Yannic Kilcher and Tim Scarfe reacted to Yann LeCun's keynote speech at this year's ICLR conference which just passed. ICLR is the number two ML conference and was completely open this year, with all the sessions publicly accessible via the internet. Yann spent most of his talk speaking about self-supervised learning, Energy-based models (EBMs) and manifold learning. Don't worry if you hadn't heard of EBMs before, neither had we! Thanks for watching! Please Subscribe! Paper Links: ICLR 2020 Keynote Talk: A Tutorial on Energy-Based Learning: Concept Learning with Energy-Based Models (Yannic's Explanation): Concept Learning with Energy-Based Models (Paper): Concept Learning with Energy-Based Models (OpenAI Blog Post): #deeplearning #machinelearning #iclr #iclr2020 #yannlecun
May 19, 2020
The Lottery Ticket Hypothesis with Jonathan Frankle
In this episode of Machine Learning Street Talk, we chat with Jonathan Frankle, author of The Lottery Ticket Hypothesis. Frankle has continued researching Sparse Neural Networks, Pruning, and Lottery Tickets, leading to some really exciting follow-on papers! This chat discusses some of these papers, such as Linear Mode Connectivity and Comparing Rewinding and Fine-tuning in Neural Network Pruning (full list of papers linked below). We also chat about how Jonathan got into Deep Learning research, his Information Diet, and his work on developing Technology Policy for Artificial Intelligence!  This was a really fun chat, I hope you enjoy listening to it and learn something from it! Thanks for watching and please subscribe! Huge thanks to everyone on r/MachineLearning who asked questions! Paper Links discussed in the chat: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks: Linear Mode Connectivity and the Lottery Ticket Hypothesis: Dissecting Pruned Neural Networks: Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs: What is the State of Neural Network Pruning? The Early Phase of Neural Network Training: Comparing Rewinding and Fine-tuning in Neural Network Pruning: (Also Mentioned) Block-Sparse GPU Kernels: Balanced Sparsity for Efficient DNN Inference on GPU: Playing the Lottery with Rewards and Multiple Languages: Lottery Tickets in RL and NLP: r/MachineLearning question list: #machinelearning #deeplearning
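The iterative magnitude pruning with rewinding at the heart of the lottery ticket work can be sketched roughly as follows. This is a simplified illustration, not Frankle's exact procedure: the pruning fraction is arbitrary, the "trained" weights are a random stand-in, and a real experiment would retrain between rounds:

```python
import numpy as np

def lottery_ticket_round(init_weights, trained_weights, mask, prune_frac=0.2):
    """One round of iterative magnitude pruning with rewinding:
    prune the smallest-magnitude surviving weights, then reset the
    survivors to their original initialisation (the 'winning ticket')."""
    surviving = trained_weights[mask]
    k = int(prune_frac * surviving.size)
    threshold = np.sort(np.abs(surviving))[k]   # prune weights below this
    new_mask = mask & (np.abs(trained_weights) >= threshold)
    return init_weights * new_mask, new_mask

rng = np.random.default_rng(0)
w0 = rng.normal(size=100)                   # weights at initialisation
wt = w0 + rng.normal(scale=0.1, size=100)   # stand-in for trained weights
mask = np.ones(100, dtype=bool)
w_rewound, mask = lottery_ticket_round(w0, wt, mask)
print(mask.sum())                           # 80 weights survive the round
```

Repeating this round several times yields the heavily sparse subnetworks discussed in the episode; later work replaces rewinding to initialisation with rewinding to an early training checkpoint.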
May 19, 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten chat about Large-scale Transfer Learning in Natural Language Processing. The Text-to-Text Transfer Transformer (T5) model from Google AI does an exhaustive survey of what’s important for Transfer Learning in NLP and what’s not. In this conversation, we go through the key takeaways of the paper, text-to-text input/output format, architecture choice, dataset size and composition, fine-tuning strategy, and how to best use more computation. Beginning with these topics, we diverge into exciting ideas such as embodied cognition, meta-learning, and the measure of intelligence. We are still beginning our podcast journey and really appreciate any feedback from our listeners. Is the chat too technical? Do you prefer group discussions, interviewing experts, or chats between the three of us? Thanks for watching and if you haven’t already, Please Subscribe! Paper Links discussed in the chat: Text-to-Text Transfer Transformer: Experience Grounds Language (relevant to divergent discussion about embodied cognition): On the Measure of Intelligence: Train Large, Then Compress: Scaling Laws for Neural Language Models: The Illustrated Transformer: ELECTRA: Transformer-XL: Reformer: The Efficient Transformer: The Evolved Transformer: DistilBERT: How to generate text (HIGHLY RECOMMEND): Tokenizers:
May 19, 2020
CURL: Contrastive Unsupervised Representations for Reinforcement Learning
According to Yann LeCun, the next big thing in machine learning is unsupervised learning. Self-supervision has changed the entire game in the last few years in deep learning, first transforming the language world with word2vec and BERT -- but now it's turning computer vision upside down.  This week Yannic, Connor and I spoke with one of the authors, Aravind Srinivas, who recently co-led the hot-off-the-press CURL: Contrastive Unsupervised Representations for Reinforcement Learning alongside Michael (Misha) Laskin. CURL has had an incredible reception in the ML community in the last month or so. Remember the DeepMind paper which solved the Atari games using the raw pixels? Aravind's approach uses contrastive unsupervised learning to featurise the pixels before applying RL. CURL is the first image-based algorithm to nearly match the sample-efficiency and performance of methods that use state-based features! This is a huge step forwards in being able to apply RL in the real world.  We explore RL and self-supervision for computer vision in detail and find out how Aravind got into machine learning.  Original YouTube Video: Paper: CURL: Contrastive Unsupervised Representations for Reinforcement Learning Aravind Srinivas, Michael Laskin, Pieter Abbeel Yannic's analysis video:  #machinelearning #reinforcementlearning #curl #timscarfe #yannickilcher #connorshorten Music credit;
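The contrastive objective this family of methods builds on can be sketched as an InfoNCE-style loss over a batch of query/key embeddings from two augmentations of the same observations: each query should match its own key, with the rest of the batch acting as negatives. Note this sketch uses plain cosine similarity, whereas CURL itself uses a learned bilinear similarity and a momentum encoder; the temperature is illustrative:

```python
import numpy as np

def info_nce_loss(queries, keys, temperature=0.1):
    """InfoNCE-style contrastive loss: softmax cross-entropy where the
    positive pair for each query sits on the diagonal of the batch
    similarity matrix."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                   # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives on diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(z, z)                   # identical views: easy
loss_random = info_nce_loss(z, rng.normal(size=(8, 16)))
print(loss_matched < loss_random)
```

In CURL the embeddings come from an image encoder applied to two random crops of the same stacked frames, and the RL agent trains on top of those features.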
May 02, 2020
Exploring Open-Ended Algorithms: POET
Three YouTubers; Tim Scarfe - Machine Learning Dojo (, Connor Shorten - Henry AI Labs ( and Yannic Kilcher ( We made a new YouTube channel called Machine Learning Street Talk. Every week we will talk about the latest and greatest in AI. Subscribe now! Special guests this week; Dr. Mathew Salvaris (, Eric Craeymeersch (, Dr. Keith Duggar (,  Dmitri Soshnikov ( We discuss the new concept of an open-ended, or "AI-Generating" algorithm. Open-endedness is a class of algorithms which generate problems and solutions to increasingly complex and diverse tasks. These algorithms create their own curriculum of learning. Complex tasks become tractable because they are now the final stepping stone in a lineage of progressions. In many respects, it's better to trust the machine to develop the learning curriculum, because the best curriculum might be counter-intuitive. These algorithms can generate a radiating tree of evolving challenges and solutions just like natural evolution. Evolution has produced an eternity of diversity and complexity and even produced human intelligence as a side-effect! Could AI-generating algorithms be the next big thing in machine learning? Wang, Rui, et al. "Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions." arXiv preprint arXiv:2003.08536 (2020). Wang, Rui, et al. "Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions." arXiv preprint arXiv:1901.01753 (2019). Watch Yannic’s video on POET: and on the extended POET: Watch Connor’s video UberAI labs video:    #reinforcementlearning #machinelearning #uber #deeplearning #rl #timscarfe #connorshorten #yannickilcher
April 24, 2020