The DeepMind Exit
Misha Laskin led reward modeling for Gemini at DeepMind, the work that teaches Google's most advanced AI to improve at tasks through reinforcement learning. Ioannis Antonoglou co-created AlphaGo, the AI that beat Lee Sedol at Go and changed how the world thinks about machine intelligence.
They left DeepMind to start Reflection AI with one thesis: reinforcement learning + large language models = software engineers that improve themselves.
What Reflection AI Actually Does
They're building "superhuman general agents" that autonomously write, debug, test, and deploy code. The key differentiator is reinforcement learning, the same technology behind AlphaGo, applied to coding. The agents don't just predict what code to write; they learn from outcomes and get better.
Why This is Different
Most coding AI tools rely on LLMs trained mainly to predict the next token. Reflection uses reinforcement learning to optimize for working code. The difference: an LLM trained only on prediction generates code that looks right; an RL agent is rewarded for code that actually works.
When the people who made AI beat world champions at Go apply the same approach to software engineering, the bet is clear: coding agents that keep improving with experience, much as AlphaGo improved by playing millions of games against itself.
The co-creator of AlphaGo thinks reinforcement learning is the key to autonomous software engineering. $2.13B in funding suggests investors agree.
The Breakout Pattern
DeepMind is where many of the world's best AI researchers go. When they leave to build a company, it usually means they've seen something inside the lab that the market hasn't priced in yet.
That's the breakout.
Your domain expertise is the moat.
Explore 50 startup ideas for engineers who refuse to compete for shrinking jobs.