Recent Posts

You and Your Big Heart Will Win

1 June 2024 workish

For the last 6 years I’ve dabbled in many things, but only one thing consistently. I’ve been running this weekly event, DLCT, rarely missing a week, for 6 years straight. The format of DLCT is simply “talks”: a speaker comes to talk about a deep learning paper (usually one of their own) and engages with the audience (of between 40 and 80 people on average) for an hour.

I Hope You Still Try

15 May 2024 workish

Hello, future you.

Two Years of MLC: My Protests

13 June 2022 workish, lifeish

A little over a year ago, I wrote a 6000-word retrospective, A Year of MLC: Selfish Takes Only, reflecting on a full year of building ML Collective, the non-profit, non-traditional researchers’ community.

Recent Publications

2026 The Topological Trouble With Transformers
TL;DR arXiv PDF
2025 Enhancing LLM Planning Capabilities through Intrinsic Self-Critique
TL;DR arXiv PDF
2024 TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
TL;DR ICCV 2025 arXiv PDF
2024 Logit Scaling for Out-of-Distribution Detection
TL;DR arXiv PDF
2024 Training language models on the knowledge graph: Insights on hallucinations and their detectability
TL;DR COLM 2024 arXiv PDF Twitter thread
2024 Improve mathematical reasoning in language models by automated process supervision
TL;DR arXiv PDF Twitter thread
2024 Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation
TL;DR arXiv PDF
2024 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
TL;DR arXiv PDF Twitter thread 1.5 Pro Update
2023 Beyond human data: Scaling self-training for problem-solving with language models
TL;DR TMLR arXiv PDF Twitter thread
2023 Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
TL;DR arXiv PDF
2022 Character-Aware Models Improve Visual Text Rendering
TL;DR ACL 2023 arXiv PDF Twitter thread
2022 Extremely Simple Activation Shaping for Out-of-Distribution Detection
TL;DR ICLR 2023 arXiv PDF Website Video Code Twitter thread