Systems will always fail. The work is making failure survivable.

For fifteen years I chased the dream of systems that couldn't break. They broke anyway: at 2am, under load, at random. So the question I'm sitting with these days is this: when a system fails, how fast can we see it, name it, and bring it back? The long journey taught me that reliability isn't the absence of failure. It's how gracefully you carry it.

Selected Work

Where the failure became the design.

One system, told honestly—origins, the bottleneck, the 2am test, and what I'd change.

Case Study · 2018–2022

Statsbomb: Real-Time Data Collection

How do you let thousands of collectors capture a live match concurrently without choosing between speed and correctness? We took it from week-one sketches with Ali to the collection infrastructure Hudl bought in 2024.

Real-Time Data Collection · State Machines · Stream Processing · Distributed Teams

Writing

Questions worth sitting with.

All writing →

What does failure look like in your systems?

If you're wrestling with reliability, resilience, or the architecture underneath both, I'd like to hear the shape of it. No pitch—just the question first.

Start a Conversation