I have a few speaking engagements coming up, including one that’s perhaps my favorite thing in the world—co-presenting with Jason!
“How to have fun in AI research?”, hosted by WomenInAI and South Park Commons, February 20, 2020.
If you find yourself in an explosively growing field such as machine learning & AI at this moment in 2020, and you are not exactly one of those “cool guys” at the top of the field whom everyone knows about, and you are on Twitter, then you are probably at times overwhelmed and unhappy, and almost all the time stressed, wondering why you don’t have six papers at NeurIPS or a new arXiv preprint every month.
This is a technical talk, but also one that’s emotional, heart-to-heart, and perhaps even cheesy.
We will go over a few technical research works, both from our team at Uber AI and from the machine learning community at large, to uncover intriguing behaviors in neural networks, understand training, rethink model complexity, and, just for fun, stress-test generative language models.
Through this review, we will dissect together what elements make up a complete research cycle in AI, see how there are many ways to enjoy the process (even when it is difficult), and eventually learn how to use that little bit of fun to combat the large ocean of stress, and why that matters to each of us.
“Controlling Text Generation with Plug and Play Language Models”, Auto.AI, San Francisco, February 24, 2020.
Deep neural networks have recently made a bit of a splash, enabling machines to learn to solve problems that had previously been easy for humans but difficult for computers, like playing Atari games, identifying dog breeds in photos, and generating realistic images and coherent text. However, as models get more powerful, our understanding of them lags significantly behind. For example, we don’t really know whether generative language models like GPT-2 actually know what they are talking about. They have shown unparalleled generation capabilities; however, controlling attributes of the generated language (e.g. switching topic or sentiment) remains difficult without modifying the model architecture or fine-tuning on attribute-specific data, both of which entail the significant cost of retraining.
If we can’t control them in a simple way, we are certainly far from understanding them. It is surprising that we don’t understand the intelligence we build, trained on data we produced. But then again, maybe not surprising. We don’t understand babies either. At least with models we can explore their latent space, visualize their representations, and stress-test them as many times as we want.
In this work, we use a simple method, the Plug and Play Language Model (PPLM), for controllable language generation, which combines a pretrained language model (LM), like GPT-2, with one or more simple attribute classifiers that guide text generation, without any further training of the LM. In the canonical scenario we present, the attribute models are simple classifiers consisting of a user-specified bag of words or a single learned layer with 100,000 times fewer parameters than the LM. Sampling entails a forward and backward pass in which gradients from the attribute model push the LM’s hidden activations and thus guide the generation. Model samples demonstrate control over a range of topics and sentiment styles, and extensive automated and human-annotated evaluations show attribute alignment and fluency. PPLMs are flexible in that any combination of differentiable attribute models may be used to steer text generation, which allows for diverse and creative applications beyond the examples given in the paper. More importantly, the controlling process can be seen as stress-testing a model, and can help us understand its abilities and limits.
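To make the “gradients push the hidden activations” part concrete, here is a minimal, self-contained PyTorch sketch of the idea. It is not the actual PPLM code: the tiny model and the names (TinyLM, bag_of_words_loss, steer_step), the step sizes, and the bag-of-words attribute loss are illustrative stand-ins, and the real method perturbs the transformer’s key-value history and adds fluency terms (e.g. a KL penalty to the unmodified LM) that are omitted here.

```python
# Sketch of PPLM-style steering: nudge a frozen LM's hidden state along the
# gradient of a simple attribute loss (a bag of words), then sample from the
# nudged state. Toy model and names are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN = 1000, 64

class TinyLM(nn.Module):
    """Stand-in for a pretrained autoregressive LM (e.g., GPT-2)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def step(self, token, h):
        h = self.rnn(self.embed(token), h)          # next hidden state
        return self.head(h), h                      # logits over vocab, state

def bag_of_words_loss(logits, bow_ids):
    """Attribute model: negative log-probability mass on the bag of words."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -torch.logsumexp(log_probs[:, bow_ids], dim=-1).mean()

def steer_step(lm, token, h, bow_ids, step_size=0.02, n_iters=3):
    """One decoding step: forward, attribute gradient, perturb activations, resample."""
    delta = torch.zeros_like(h, requires_grad=True)  # perturbation of activations
    for _ in range(n_iters):
        logits, _ = lm.step(token, h + delta)
        loss = bag_of_words_loss(logits, bow_ids)
        loss.backward()                              # gradient w.r.t. delta only
        with torch.no_grad():                        # (the LM itself is frozen)
            delta -= step_size * delta.grad / (delta.grad.norm() + 1e-10)
        delta.grad.zero_()
    with torch.no_grad():
        logits, h = lm.step(token, h + delta)        # generate from nudged state
        next_token = torch.distributions.Categorical(logits=logits).sample()
    return next_token, h

# Usage: generate a few tokens steered toward an arbitrary "topic" bag of words.
lm = TinyLM().eval()
lm.requires_grad_(False)                             # no further training of the LM
bow_ids = torch.tensor([5, 42, 77])                  # hypothetical topic word ids
token, h = torch.tensor([0]), torch.zeros(1, HIDDEN)
for _ in range(10):
    token, h = steer_step(lm, token, h, bow_ids)
    print(token.item(), end=" ")
```

The design choice worth noticing is that only the small perturbation of the activations is optimized at decoding time; the LM’s weights never change, which is what makes the attribute models cheap to plug in and swap out.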
“Bad Assumptions about Neural Networks” (tag-team co-presenting with Jason Yosinski), SignalFire, San Francisco, March 4, 2020.
The seduction of large neural nets is that one simply has to throw input data into a big network and magic comes out the other end. If the output is not magic enough, just add more layers. This simple approach works just well enough that it can lure us into a few bad assumptions, which we’ll discuss in this talk. One bad assumption is that learning everything end-to-end is best. We’ll look at two examples where this fails, where a little manual effort and TLC can lead to much better models [1,2]. Another bad assumption is that training is an inscrutable mess of which we can understand little. We’ll look at two attempts to shine light on the training process, leading to new intuitions about networks [3,4]. Finally, motivated by low-dimensional intuitions common to our work and the work of others in the field, we’ll discuss how even large, complex models can be easily steered by much smaller models, leading to powerful language models that can be assembled on the fly [5].
[1] https://eng.uber.com/neural-networks-jpeg/
[2] https://eng.uber.com/coordconv/
[3] https://eng.uber.com/loss-change-allocation/
[4] https://eng.uber.com/intrinsic-dimension/
[5] https://eng.uber.com/pplm/
Will share slides once they are ready…if they are ever ready…