Deep Learning: Classics and Trends

“A super influential reading group that has achieved cult-like status.” —John Sears

Deep Learning: Classics and Trends (DLCT) is a reading group I have been running since 2018. It started within Uber AI Labs, with the support of Zoubin, Ken and Jason, and the help of many, when we felt the need for a space to sample the overwhelmingly large number of papers and to hold free-form, judgemental (JK), cozy discussions; or, as Piero puts it, to “ask a million questions” without embarrassment.

Since then, it has grown much larger, opening up first to the broader machine learning community at Uber, then to the general public in 2019. Starting in March 2020, in light of COVID-19, we hold all meetings virtually, making the group radically accessible to anyone from anywhere. Starting in June 2020, DLCT operates under ML Collective, with a mission of making research more approachable and researchers more connected.

August 2020: I moved the official page to MLC, and all future updates will be made there. This page is kept only for memory. After 2+ years of running this, to me it’s more than reading papers and listening to presentations. It has started to serve as an anchor for all of us to connect every once in a while amid all the changes, shifts of emphasis, and chaos in the Bay Area, in AI research, and in this fast-paced world generally.

The best thing about it is the group of people it connects: seriously, the smartest and kindest researchers, whom I feel so lucky to have known and worked with.

Hi there! Content from July 2020 onward has been migrated to mlcollective.org/dlct/
This page stores all talks from before the migration, along with other info.

Past events

Date	Presenter	Topic or Paper
2020.08.21	Anna Goldie: Anna Goldie is a Senior Software Engineer at Google Brain and co-founder/tech lead of the Machine Learning for Systems Team, which focuses on deep reinforcement learning approaches to problems in computer systems. She is also a PhD student in the Stanford NLP Group, where she is advised by Professor Chris Manning. At MIT, she earned a Master’s in Computer Science, a Bachelor’s in Computer Science, and a Bachelor’s in Linguistics. She speaks fluent Mandarin, Japanese, and French, as well as conversational Spanish, Italian, German, and Korean. She has given high-profile keynotes in Mandarin Chinese, and her work has been covered in various media outlets, including MIT Technology Review and IEEE Spectrum. Azalia Mirhoseini: Azalia Mirhoseini is a Senior Research Scientist at Google Brain. She is the co-founder/tech lead of the Machine Learning for Systems Team at Brain, where they focus on deep reinforcement learning-based approaches to solving problems in computer systems and metalearning. She has a Ph.D. in Electrical and Computer Engineering from Rice University. She has received a number of awards, including the MIT Technology Review 35 under 35 award, the Best Ph.D. Thesis Award at Rice, and a Gold Medal in the National Math Olympiad in Iran. Her work has been covered in various media outlets, including MIT Technology Review and IEEE Spectrum.	Chip Placement with Deep Reinforcement Learning. In the past decade, computer systems and chips have played a key role in the success of AI. Our vision in Google Brain’s ML for Systems team is to use AI to transform the way in which computer systems and chips are designed. Many core problems in systems and hardware design are combinatorial optimization or decision-making tasks with state and action spaces that are orders of magnitude larger than those of standard AI benchmarks in robotics and games.
In this talk, we will describe some of our latest learning-based approaches to tackling such large-scale optimization problems. We will discuss our work on a new domain-transferable reinforcement learning method for optimizing chip placement, a long pole in hardware design. Our approach is capable of learning from past experience and improving over time, resulting in more optimized placements on unseen chip blocks as the RL agent is exposed to a larger volume of data. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator chips, whereas existing baselines require human experts in the loop and can take several weeks. [Paper]
2020.08.14	Zhongqi Miao: Zhongqi Miao is currently a Ph.D. candidate at the University of California, Berkeley, working with Prof. Stella Yu and Prof. Wayne Getz. His research focuses on computer vision and deep learning applications in realistic settings, such as long-tailed recognition and domain adaptation. Ziwei Liu: Dr. Ziwei Liu is currently a senior research fellow at the Chinese University of Hong Kong. Before that, Ziwei was a postdoctoral researcher at the University of California, Berkeley, working with Prof. Stella Yu. Ziwei received his PhD from the Chinese University of Hong Kong in 2017, under the supervision of Prof. Xiaoou Tang and Prof. Xiaogang Wang. During his PhD, Ziwei had the privilege of interning at Microsoft Research and Google Research, where he developed Microsoft Pix and Google Clips. His research revolves around computer vision/graphics, machine learning, and robotics. He has published over 40 papers (with more than 6,000 citations) in top-tier conferences and journals in relevant fields, including CVPR, ICCV, ECCV, AAAI, IROS, SIGGRAPH, T-PAMI, and TOG. He is the recipient of the Microsoft Young Fellowship, the Hong Kong PhD Fellowship, the ICCV Young Researcher Award, and the HKSTP Best Paper Award. He has won major computer vision competitions, including the DAVIS video segmentation challenge 2017, the MSCOCO instance segmentation challenge 2018, and the FAIR self-supervision challenge 2019. He is also the lead contributor to several renowned computer vision benchmarks and software packages, including CelebA, DeepFashion, mmdetection and mmfashion.	Deep Learning and Realistic Datasets. The success of modern deep learning techniques is based on standardized and balanced datasets such as ImageNet. However, methods developed on these datasets often fail on realistic datasets in realistic scenarios. In this presentation, we summarize three characteristics of realistic datasets: 1) long-tailed; 2) open-ended; and 3) multi-domain.
We also discuss two CVPR projects that deal with these characteristics: 1) Open Long-Tailed Recognition (OLTR) and 2) Open Compound Domain Adaptation (OCDA). In OLTR, we address open long-tailed recognition with an integrated algorithm that handles imbalanced classification, few-shot learning, and open-set recognition at the same time, whereas existing classification approaches focus on only one aspect and perform poorly across the entire class spectrum. Our OLTR algorithm maps an image to a feature space such that visual concepts can relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. In OCDA, we consider the open compound domain adaptation problem, in which the compound target domain is a combination of multiple traditional target domains without domain labels, reflecting realistic data collection under various mixed as well as novel conditions. Unlike existing single- or multi-target domain adaptation works, which often assume known, clear distinctions between domains, OCDA does not rely on domain boundaries and continuously adapts the model within a learned domain space. Our model builds on two technical insights into OCDA: 1) a curriculum domain adaptation strategy to bootstrap generalization across domain distinctions in a data-driven, self-organizing, and continuous fashion and 2) a memory module to increase the model’s agility towards novel domains. [Paper 1] [Paper 2] [Slides]
2020.07.31	Dan Hendrycks: Dan Hendrycks is a second-year PhD student at UC Berkeley, advised by Jacob Steinhardt and Dawn Song. His research aims to disentangle and concretize the components necessary for safe AI. This leads him to work on quantifying and improving the performance of models in unforeseen out-of-distribution scenarios and, more recently, on machine ethics. Dan received his BS from the University of Chicago. https://twitter.com/DanHendrycks	Out-of-distribution robustness in computer vision and NLP. Although ResNets and BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? In this talk, I survey benchmarks in vision and NLP that measure how well models hold up when there is a discrepancy between the train and test sets. The talk will draw on results in NLP from http://arxiv.org/abs/2004.06100 and recent vision results from http://arxiv.org/abs/2006.16241. [Paper 1] [Paper 2] [Slides]
2020.07.24	Ben Mann: Ben Mann is a Member of Technical Staff at OpenAI. He is the go-to person for data engineering, but dabbles in everything. Outside work, he blogs about a wide range of topics from ML to hiking to pooping better. One day he hopes to make superintelligent AI that is safe and beneficial for humanity.	Language Models are Few-Shot Learners. I’ll describe our major contributions in this paper, as well as where we fell short. My work was mainly on training data, eval memorization, and the eval suite. I’ll offer deep dives on these sections. [Paper] [Slides]
2020.07.17	Hanie Sedghi: Hanie Sedghi is a senior research scientist at Google Brain, where she leads the “Deep Phenomena” research group. Her approach is to bridge theory and practice in large-scale machine learning by designing algorithms with theoretical guarantees that also work efficiently in practice. In recent years, she has been working on understanding deep learning phenomena and improving training algorithms. Hanie has various publications in this area and has organized many workshops to expand the domain, such as the Deep Phenomena workshop at ICML 2019 and Deep Learning Day at KDD 2020. She is an area chair for ICML, ICLR, and ALT, a member of the JMLR editorial board, and has served as a reviewer for many prominent conferences. Hanie has mentored several junior researchers and students, and is passionate about helping people from marginalized groups. Prior to Google, she was a research scientist at the Allen Institute for Artificial Intelligence and, before that, a postdoctoral fellow under the supervision of Professor Anima Anandkumar. Hanie got her PhD from the University of Southern California with a minor in mathematics, and her Master’s and Bachelor’s degrees from Sharif University of Technology, Iran.	The intriguing role of module criticality in the generalization of deep networks. We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network’s performance. Our analysis reveals interesting properties of the loss landscape that lead us to propose a complexity measure, called module criticality, based on the shape of the valleys that connect the initial and final values of the module parameters.
We formulate how generalization relates to module criticality, and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so. I will also cover our recent results on extending this to the transfer learning setting, and on how module criticality predicts which layers of the network play an important role in successful transfer. [Paper] [Slides]
2020.07.10	Chiyuan Zhang: Chiyuan Zhang is a research scientist at Google Research, Brain Team. He is interested in analyzing and understanding the foundations behind the effectiveness of deep learning, as well as its connection to the cognition and learning mechanisms of the human brain. He is also interested in future directions to break the data inefficiency bottleneck in most current deep learning algorithms. Chiyuan Zhang holds a Ph.D. from MIT (2017), and Bachelor’s (2009) and Master’s (2012) degrees in computer science from Zhejiang University, China. His work was recognized by the INTERSPEECH Best Student Paper Award in 2014 and the ICLR Best Paper Award in 2017.	What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation. Deep learning algorithms are well known to have a propensity for fitting the training data very well, often fitting even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman [Fel19] proposes a theoretical explanation for this phenomenon based on a combination of two insights. First, natural image and data distributions are (informally) known to be long-tailed, that is, they have a significant fraction of rare and atypical examples. Second, in a simple theoretical model such memorization is necessary for achieving close-to-optimal generalization error when the data distribution is long-tailed. However, no direct empirical evidence for this explanation, or even an approach for obtaining such evidence, was given. In this work we design experiments to test the key ideas in this theory. The experiments require estimating the influence of each training example on the accuracy at each test example, as well as the memorization values of training examples.
Estimating these quantities directly is computationally prohibitive, but we show that closely related subsampled influence and memorization values can be estimated much more efficiently. Our experiments demonstrate the significant benefits of memorization for generalization on several standard benchmarks. They also provide quantitative and visually compelling evidence for the theory put forth in [Fel19]. [Paper] [Slides]
2020.07.03	Rosanne Liu, Piero Molino, Joel Lehman, Jonathan Frankle: We are a team that has stories of hope and regret, things to gloat and rant about, and altogether lots of opinions about grad school experiences! And we are happy to answer your questions.	Grad School Retrospective. Why did you go to grad school? Looking back, what would you have done differently? What are today’s grad school students/applicants facing, and how can they be better supported? We wish to touch upon all of these during the panel, and to address any other questions from the public via Slido 👇👇 [Slido]
2020.06.26	Aditya Kusupati: Aditya Kusupati is a first-year CS PhD student at the University of Washington, jointly advised by Ali Farhadi and Sham Kakade. His broad research interests at the moment lie at the intersection of Machine Learning, Computer Vision and Robotics (Multimodal Perception, shh! it is a secret). He is currently a Research Scientist Intern at NVIDIA Toronto Lab, working with Sanja Fidler and Antonio Torralba for the summer. Before coming to UW, he spent two amazing years as a Research Fellow at Microsoft Research India with Manik Varma and Prateek Jain, working on “The Extremes of Machine Learning”. In a past life, he earned a Bachelor’s in CS with Honours and a Minor in EE from IIT Bombay, where he had the pleasure of working with Soumen Chakrabarti on geometric embeddings for Entity Typing. While not doing research, he makes enough mischief to increase the entropy of the lab and creates presentations like “How to become Batman?”. He had never attended a formal ML course until last quarter.	Soft Threshold Weight Reparameterization for Learnable Sparsity. Sparsity in Deep Neural Networks (DNNs) is studied extensively with a focus on maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets, which have sub-optimal layer-wise parameter allocation, resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). We propose Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds, thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K), and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%.
Notably, STR boosts the accuracy over existing results by up to 10% in the ultra sparse (99%) regime and can also be used to induce low-rank (structured sparsity) in RNNs. In short, STR is a simple mechanism which learns effective sparsity budgets that contrast with popular heuristics. Code, pretrained models and sparsity budgets are at https://github.com/RAIVNLab/STR. [Paper] [Slides]
2020.06.19	Jianyu Wang: Jianyu Wang is a third-year PhD student at Carnegie Mellon University, advised by Professor Gauri Joshi. He has worked at Facebook AI Research and Google Research as a summer intern. Previously, Jianyu received his B.Eng. in Electronic Engineering from Tsinghua University in 2017. His awards and honors include the Best Student Paper Award at the NeurIPS Federated Learning Workshop (2019) and the Qualcomm Innovation Fellowship (2018).	SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum. Distributed optimization is essential for training large models on large datasets. Multiple approaches have been proposed to reduce the communication overhead in distributed training, such as synchronizing only after performing multiple local SGD steps, and decentralized methods (e.g., using gossip algorithms) to decouple communications among workers. Although these methods run faster than AllReduce-based methods, which use blocking communication before every update, the resulting models may be less accurate after the same number of updates. Inspired by the BMUF method of Chen & Huo (2016), we propose a slow momentum (SlowMo) framework, where workers periodically synchronize and perform a momentum update, after multiple iterations of a base optimization algorithm. Experiments on image classification and machine translation tasks demonstrate that SlowMo consistently yields improvements in optimization and generalization performance relative to the base optimizer, even when the additional overhead is amortized over many updates so that the SlowMo runtime is on par with that of the base optimizer. We provide theoretical convergence guarantees showing that SlowMo converges to a stationary point of smooth non-convex losses. Since BMUF is a particular instance of the SlowMo framework, our results also correspond to the first theoretical convergence guarantees for BMUF. [Paper] [Slides]
2020.06.12	Grace Lindsay: I’m currently a joint research fellow at the Gatsby Computational Neuroscience Unit and Sainsbury Wellcome Centre at UCL, where I am building models of visual processing. I did my PhD at the Center for Theoretical Neuroscience at Columbia University in the lab of Ken Miller, and before that I got my bachelor’s in neuroscience from the University of Pittsburgh. In addition to doing computational neuroscience research, I am also writing a popular science book on the history of the field for Bloomsbury Sigma!	Visual Attention in Artificial and Biological Neural Networks. Attention has been studied in psychology for over a hundred years, and studies that record from neurons have aimed to understand the physical underpinnings of attentional processes for several decades. More recently, attention mechanisms have been added to artificial neural networks to enhance their performance. In this talk, I will briefly overview the study of attention in these different domains, with a focus on visual attention. I will then describe my own work using findings from neurophysiology to add feature-based attention to convolutional neural networks (CNNs). CNNs are currently some of the best models available of the primate visual system, and they allow neuroscientists to probe the relationship between neural activity and task performance “in silico”. I will share how studying attention in these models can lead to a rethinking of how biological attention works. [Paper] [Slides]
2020.06.08	Mark van der Wilk: Mark van der Wilk is a lecturer (assistant professor) at Imperial College London. He is particularly interested in how to learn appropriate inductive biases from data, instead of hand-designing them, and thinks that results from Gaussian processes may contribute to a solution. The overall aim is to make models more adaptive and data-efficient, which can be used to improve decision making and reinforcement learning.	Learning Invariances using the Marginal Likelihood. To improve generalisation in supervised learning, it is common to encourage invariance in the solution, i.e., keeping the output relatively constant under irrelevant transformations of the input. Many techniques can be seen as introducing invariance, such as data augmentation, convolutional structure, or more general group structure. But how do we learn what invariances should be used for a dataset? In this talk, we will discuss why the usual training loss is not the right objective function. We instead use the marginal likelihood as suggested by Bayesian inference, and develop a procedure which learns a useful invariance through gradient-based optimisation. Our model learns to be invariant to perturbations that are commonly hand-crafted in data augmentation, and learns very different perturbations depending on the dataset. We finish by speculating on how procedures like these can help automate the creation of network architectures. [Paper] [Slides] [Recording]
2020.05.22	Piero Molino: Piero Molino is a Senior Research Scientist at Uber AI (for one more week, aka till his birthday) with a focus on machine learning for language and dialogue. Piero completed a PhD on Question Answering at the University of Bari, Italy. He founded QuestionCube, a startup that built a framework for semantic search and QA. He worked for Yahoo Labs in Barcelona on learning to rank and for IBM Watson in New York on natural language processing with deep learning, and then joined Geometric Intelligence, where he worked on grounded language understanding. After Uber acquired Geometric Intelligence, he became one of the founding members of Uber AI Labs. At Uber he works on research topics including Dialogue Systems, Language Generation, Graph Representation Learning, Computer Vision, Reinforcement Learning and Meta Learning. He also worked on several deployed systems, such as COTA, an ML and NLP model for Customer Support; Dialogue Systems for driver hands-free dispatch, pickup and communications; and the Uber Eats Recommender System with graph learning. He is the author of Ludwig, a code-free deep learning toolbox backed by the Linux Foundation.	Structuralism as the Origin of Self-Supervised Learning and Word Embeddings. In this talk I’ll propose a historical perspective that traces the origin of current self-supervision and word embedding trends in machine learning to the structuralist ideas proposed by Ferdinand de Saussure and Ludwig Wittgenstein in the early 20th century. I will also showcase several distributional semantic models (pre-deep-learning approaches to learning word representations) and connect them with more modern approaches, up to recent self-supervised models for language, vision and graph-structured data.
The intent is that, by seeing the origins of these ideas, the audience will be better equipped both to put current self-supervision research in perspective within the broader cultural context and to learn from past research, which contains deep insights that can help inform future directions for the field. WARNING: there will not be a lot of deep learning in the talk, but potentially a lot of food for thought. [Slides & Recording on Piero’s website]
2020.05.08	Sebastian Risi: Sebastian Risi is an Associate Professor at the IT University of Copenhagen, where he co-directs the Robotics, Evolution and Art Lab (REAL). He is currently the principal investigator of a Sapere Aude: DFF Starting Grant (Innate: Adaptive Machines for Industrial Automation). He has won several international scientific awards, including multiple best paper awards, the Distinguished Young Investigator in Artificial Life 2018 award, a Google Faculty Research Award in 2019, and an Amazon Research Award in 2020. Recently, he co-founded modl.ai, a company that develops AIs that can accelerate game development and enhance player engagement. More information: sebastianrisi.com	Data-Driven Encodings for Robust, Scalable, and Interpretable Evolutionary Computation. In this talk, I review a new class of genotype-to-phenotype encodings, which are not manually defined but learned from the data itself. For example, we can train a GAN on Super Mario Bros levels, allowing levels that maximize desired properties such as difficulty to be evolved in the GAN’s latent space. When the GAN is trained on a specific target domain, it becomes a compact and robust genotype-to-phenotype mapping, allowing for target-based evolution. This Latent Variable Evolution (LVE) approach can also be combined with interactive evolution, allowing users to breed their own video game levels and play those discovered levels. I’ll also present our latest results on CPPN2GAN, in which a Compositional Pattern Producing Network (CPPN) can define latent vector GAN inputs as a function of geometry, which provides a way to organize level segments output by a GAN into large-scale patterns. The benefit of these data-driven encodings is that they make it easy to explore the space of high-quality solutions, for both humans and optimization algorithms. [Paper 1] [Paper 2] [Paper 3] [Slides]
2020.05.01	Rowan Zellers: Rowan Zellers is a 4th-year PhD student at the University of Washington, working with Yejin Choi and Ali Farhadi, studying natural language processing and computer vision.	Evaluating Machines by their Real-World Language Use. There is a fundamental gap between how humans understand and use language (in open-ended, real-world situations) and today’s NLP benchmarks for language understanding. To narrow this gap, we propose to evaluate machines by their success at real-world language use, which greatly expands the scope of language tasks that can be measured and studied. We introduce TuringAdvice, a new challenge for language understanding systems. Given a complex situation faced by a real person, a machine must generate helpful advice. We make our challenge concrete by introducing RedditAdvice, a dataset and leaderboard for measuring progress. Though we release a training set with 600k examples, our evaluation is dynamic, continually evolving with the language people use: models must generate helpful advice for recently written situations. Empirical results show that today’s models struggle at our task, even those with billions of parameters. The best model, a finetuned T5, writes advice that is at least as helpful as human-written advice in only 9% of cases. This low performance reveals language understanding errors that are hard to spot outside of a generative setting, showing much room for progress. [Paper] [Slides]
2020.04.24	Hattie Zhou: Hattie is a data scientist and research hobbyist at Uber.	Compositional generalization of seq2seq models. Human intelligence exhibits systematic compositionality (Fodor & Pylyshyn, 1988), the capacity to understand and produce a potentially infinite number of novel combinations of known components, i.e., to make “infinite use of finite means” (Chomsky, 1965). In the context of learning from a set of training examples, we can observe compositionality as compositional generalization, which we take to mean the ability to generalize to composed test examples from one distribution after being exposed to the necessary components during training on a different distribution. In this talk, I will discuss several papers on the topic of compositional generalization. We will first look at the compositional generalization abilities of seq2seq models as measured on the SCAN tasks, and then look at some ideas that have been proposed to improve compositional generalization. [Paper 1] [Paper 2] [Paper 3] [Slides]
2020.04.17	Nikhil Dev Deshmudre	The Go Evolution, Part II [AlphaGo], [AlphaGo Zero], [Alpha Zero], [MuZero] [Slides]
2020.04.10	Nikhil Dev Deshmudre: Nikhil is an engineer at Uber ATG (Uber’s self-driving group). He works on increasing the realism of simulators for self-driving cars. His primary focus is on road actor behavior simulation and sensor simulation.	The Go Evolution, Part I. In this talk, I’ll trace the evolution of the main ideas in DeepMind’s Go-playing ML work you’ve surely heard of. We’ll start with the original model-free AlphaGo paper and work our way through to the recent model-based MuZero. [AlphaGo], [AlphaGo Zero], [Alpha Zero], [MuZero] [Slides] [Recording]
2020.04.03	Alyssa Dayan	Mode-Adaptive Neural Networks for Quadruped Motion Control [Slides]
2020.03.27	Michela Paganini	Empirical Observations in Pruned Networks & Tools for Reproducible Pruning Research
2020.03.20	Rapha Gontijo Lopes	Affinity and Diversity: Quantifying Mechanisms of Data Augmentation [Slides] [Recording]
2020.03.13	Ian Thompson	A Good View Is All You Need: Deep InfoMax (DIM) and Augmented Multiscale Deep InfoMax (AMDIM) [Slides] [Recording]
2020.02.28	Ashley Edwards	Estimating Q(s, s’) with Deep Deterministic Dynamics Gradients [Slides]
2020.02.14	Xinchen Yan	Conditional generative modeling and adversarial learning
2020.02.07	Yaroslav Bulatov	einsum is all you need [Slides] [Recording]
2020.01.31	Rosanne Liu	Selective Brain Damage: Measuring the Disparate Impact of Model Pruning
2020.01.24	Jeff Coggshall	ReMixMatch and FixMatch
2020.01.17	Rosanne Liu	Improving sample diversity of a pre-trained, class-conditional GAN by changing its class embeddings [Slides] [Recording]
2020.01.10	Zhuoyuan Chen	Why Build an Assistant in Minecraft?
2019.11.22	Rosanne Liu	On the “steerability” of generative adversarial networks [Slides] [Recording]
2019.11.15	Polina Binder	Learning Deep Sigmoid Belief Networks with Data Augmentation
2019.11.08	Sanyam Kapoor	Policy Search & Planning: Unifying Connections [1][2]
2019.11.01	Chris Olah	Zoom in: Features and circuits as the basic unit of neural networks
2019.10.25	Renjie Liao	Efficient Graph Generation with Graph Recurrent Attention Networks
2019.10.18	Nitish Shirish Keskar, Bryan McCann	CTRL: A Conditional Transformer Language Model for Controllable Generation
2019.10.11	Subutai Ahmad	Sparsity in the neocortex, and its implications for machine learning
2019.10.04	Eli Bingham	Multiple Causes: A Causal Graphical View
2019.09.27	Xinyu Hu	Learning Representations for Counterfactual Inference
2019.09.04	Jonathan Frankle	The Latest Updates on the Lottery Ticket Hypothesis
2019.08.23	Ankit Jain	Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommender Systems [Slides]
2019.08.16	Jiale Zhi	Meta-Learning Neural Bloom Filters
2019.08.16	Ted Moskovitz	Lookahead Optimizer: k steps forward, 1 step back
2019.07.26	Rui Wang	Off-Policy Evaluation for Contextual Bandits and RL [1][2][3][4]
2019.07.19	Rosanne Liu	Weight Agnostic Neural Networks [Slides] [Recording]
2019.07.12	Joost Huizinga	A Distributional Perspective on Reinforcement Learning
2019.06.28	Ashley Edwards	[ICML Preview] Learning Values and Policies from Observation [1][2]
2019.06.21	Stanislav Fořt	[ICML Preview] Large Scale Structure of Neural Network Loss Landscapes
2019.06.07	Joey Bose	[ICML Preview] Compositional Fairness Constraints for Graph Embeddings
2019.05.31	Yulun Li	IntentNet: Learning to Predict Intention from Raw Sensor Data
2019.05.24	Thomas Miconi, Rosanne Liu, Janice Lan	ICLR Recap, cont.
2019.05.17	Aditya Rawal, Jason Yosinski	ICLR Recap
2019.04.26	JP Chen	3D-Aware Scene Manipulation via Inverse Graphics [Slides]
2019.04.19	Felipe Petroski Such	Relational Deep Reinforcement Learning
2019.04.12	Piero Molino, Jason Yosinski	Open mic
2019.04.05	Joel Lehman	The copycat project: A model of mental fluidity and analogy-making
2019.03.29	Rosanne Liu	Non-local Neural Networks [Slides]
2019.03.22	Yariv Sadan	Learning deep representations by mutual information estimation and maximization [Slides]
2019.03.15	Chandra Khatri	Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
2019.03.01	Nikhil Dev Deshmudre	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019.02.22	Vashisht Madhavan	Neural Turing Machines
2019.02.15	Open discussion	GPT-2
2019.02.08	Adrien Ecoffet	HyperNetworks
2019.02.01	Jiale Zhi	Non-delusional Q-learning and value iteration
2019.01.25	Yulun Li	Relational Recurrent Neural Networks
2019.01.18	Rui Wang	Neural Ordinary Differential Equations
2019.01.11	Jonathan Simon	Generating Humorous Portmanteaus using Word Embeddings [1][2] [Slides]
2018.12.21	Christian Perez	Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles [Slides]
2018.12.14	Alexandros Papangelis	Two trends in dialog [1][2]
2018.10.26	Aditya Rawal	Stochastic Weight Averaging [Slides]
2018.10.12	Mahdi Namazifar	Troubling Trends in Machine Learning Scholarship
2018.09.28	Yariv Sadan	MINE: Mutual Information Neural Estimation [Slides]
2018.09.21	Jan-Matthis Lueckmann	Glow and RealNVP [Slides]
2018.09.14	Jane Hung	The YOLO series: v1, v2, v3
2018.09.07	Rosanne Liu	Pooling is Neither Necessary nor Sufficient for Appropriate Deformation Stability in CNNs [Slides]
2018.08.31	Alican Bozkur	Multimodal Unsupervised Image-to-Image Translation [Slides]
2018.08.24	Janice Lan	The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks [Slides]
2018.08.17	Yariv Sadan	Opening the black box of Deep Neural Networks via Information [Slides]
2018.08.10	Joost Huizinga	Learning to Reinforcement Learn, and RL²: Fast Reinforcement Learning via Slow Reinforcement Learning [Slides]
2018.08.03	JP Chen	Deep Convolutional Inverse Graphics Network [Slides]
2018.07.27	Lei Shu	Attention is all you need [Slides]
2018.07.06	Neeraj Pradhan	Auto-Encoding Variational Bayes, and ELBO
2018.06.29	Ankit Jain	Dynamic Routing Between Capsules [Slides]
2018.06.22	Xinyu Hu	Self-Normalizing Neural Networks [Slides]
2018.06.15	John Sears	The Decline and Fall of Adam: [1][2] [Slides]
2018.06.08	Alex Gajewski	GANs, etc. [1][2] [Slides]
2018.06.01	Jason Yosinski	Sensitivity and Generalization in Neural Networks: an Empirical Study [Slides]


FAQ

Wow, you have scrolled all the way down here and are still reading?? OK! Here’s more text about the scope and vision of this reading group.

Q: Why aren’t talks recorded?
A: First and foremost, I want to create a safe, intimate and cozy space for all, where we can ask stupid questions and hold honest discussions without much filtering; exposing everything you say and leaving it permanently on the internet just won’t serve that purpose. Second, there is so much recorded content out there these days; I just don’t feel like making the world even more crowded and overwhelming than it already is. And honestly, how often do you actually watch the things you told yourself you were gonna watch? Third, I value high-quality, in-person communication way more than massive scaling and popularizing. I almost love the feeling of urgency that only real time permits: the feeling that if you miss it, you actually miss it. Like everything else in your real life.
Q: What was the initial idea of organizing a reading group like this?
A: It started with the rather selfish idea that I wanted to know about papers I don’t have time to read, and to learn about topics my individual intelligence keeps me from fully understanding. Secondly, I love presentations, especially when they are well done. Besides, I enjoy being around people who are smarter and more knowledgeable than me, faster than me at working out twelve math equations on one slide, braver than me at asking stupid questions, and more patient than me at answering them, as well as those who appreciate high-quality presentations as much as I do.
Q: How much work is it for you?
A: I never travel on Fridays.
Q: Where do you see it going?
A: I see it living, perhaps longer than I had initially imagined. I can also see it dying, in a (supposedly better) world where DLCT no longer serves a purpose. But either way, I see it having a lasting effect.

For one, I envision building a community where people work hard to tell science stories well. Each paper is a story. A great paper, apart from solid results and technical and scientific advances, stands out particularly in the way it tells the story. I hope we all value storytelling and talk-giving slightly more than we do now. This ties into an eventual wish that scientific writing moves towards being lucid, understandable, and even entertaining, so that science can reach a broader audience. This reading group, where you are allowed to practise telling stories, along with many other skills often underrated in one’s scientific pursuit, is a start.

Here is how I see different levels of storytelling, in the format of a one-hour presentation, happening in this group.

You can give a Level 0 talk, which is going through someone else’s paper—the storyline is already there. This is perhaps the most basic and involves the least work: you just need to understand it and retell it to others. (I assume as a researcher you already read papers, and this additional work of making it into a presentation would not be too much of an overhead and would only help you understand it better yourself.) And best of all, when the audience asks hard questions, you can just say “I don’t know—not my work.”

A Level 1 talk could mean presenting one of your own papers. The bar is higher because you are expected to know every detail of the project, but also lower because you probably already do. Good background coverage that leads up to the exact problem and idea always helps. If well made, this can become a mini-tutorial on the topic your paper is about.

Then we have Level 2 talks, which usually cover a topic built from understanding a field (however small) thoroughly, while keeping in mind a hierarchical chart or spiderweb of the many fields leading to that particular one. You might be citing multiple papers, drawing connections and coming up with conclusions that are mainly your own.
Q: Do you have a high bar for talks given there?
A: Yes I do. But I also know we all have to start somewhere. And I myself was a horrible presenter not too long ago (likely still am). But we all get better.