Yisong Yue

Machine Learning Professor @ Caltech

About Me

Professor of Computing and Mathematical Sciences at Caltech.

Research Interests: Machine Learning and Artificial Intelligence.

Industry Advising: Asari AI, Cainex, Latitude AI, Lila Sciences, and Tera AI.

ICLR Leadership: Member of ICLR Board. General Chair of ICLR 2025. Senior Program Chair of ICLR 2024.

Research Themes

Modeling & Inference. We develop models that learn useful structure from complex data, from representation learning to new architectures to inverse problems.

Reasoning & Self-Improvement. We study how models solve hard problems by searching, checking their work, and improving from feedback, including code generation, LLM search, and programmable agents.

Scientific Discovery. We use agents, foundation models, and closed-loop experiment design to advance discovery in biomedical imaging, neural data, protein engineering, and more.

News & Updates

Mentorship Award

I am honored to receive the mentoring award from the Grad Student Advisory Board of Caltech EAS.

SpeeDiff: Scalable Pixel-Anchored End-to-End Latent Diffusion Model

We introduce SpeeDiff, a scalable pixel-anchored end-to-end latent diffusion method that jointly trains the VAE and diffusion model from scratch. SpeeDiff uses a Tweedie Pixel Reconstruction loss to provide pixel-level feedback during diffusion training, preventing latent collapse and enabling efficient transformer-based scaling. SpeeDiff-XL achieves strong ImageNet generation results while training over 140x faster than Vanilla SiT and 61x faster than REPA. [CVPR 2026]

FormulaCode: Evaluating Agentic Optimization on Large Codebases

We introduce FormulaCode, a benchmark for evaluating how well coding agents can optimize large, real-world scientific software repositories. FormulaCode tests agents on realistic performance bottlenecks with expert-written fixes and community-maintained workloads, revealing where current agents still struggle with repository-scale optimization. [ICML 2026]

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

We introduce an end-to-end approach for autoregressive image generation that learns the visual tokenizer and generator together. By letting generation quality directly shape the tokenizer, the method produces stronger image representations and achieves competitive ImageNet generation results. [ICML 2026 Spotlight]

Krause Synchronization Transformers

We introduce Krause Attention, a principled attention mechanism inspired by bounded-confidence consensus dynamics. Krause Attention replaces similarity-based global aggregation with distance-based, localized, and selectively sparse interactions, promoting structured local synchronization instead of global mixing. We relate this behavior to recent theory modeling Transformer dynamics as interacting particle systems, and show how bounded-confidence interactions naturally moderate attention concentration and alleviate attention sinks. [ICML 2026]

NitroGen: A Foundation Model for Generalist Gaming Agents

We introduce NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. We incorporate three key ingredients: 1) an internet-scale video-action dataset constructed by automatically extracting player actions from publicly available gameplay videos, 2) a multi-game benchmark environment that can measure cross-game generalization, and 3) a unified vision-action model trained with large-scale behavior cloning. [CVPR 2026]

Embodied Learning of Reward for Musculoskeletal Control with Vision Language Models

We introduce Motion from Vision-Language Representation (MoVLR), a framework that uses vision-language models to bridge natural language descriptions and movement control. Rather than relying on handcrafted rewards, MoVLR iteratively refines reward functions with vision-language feedback, enabling high-dimensional musculoskeletal locomotion and manipulation from high-level goals. [L4DC 2026]

Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization

We introduce a systematic evaluation framework for agentic code optimization and use it to study three production-level biomedical imaging pipelines. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions. Our analysis reveals that common, complex agent architectures are not universally beneficial, leading to a practical roadmap for agent design. [CVPR 2026]