Attended a great OpenAI talk at Granola last night. The speaker, Wulfie Bain, delivers information with such richness that I found myself blinking like an android taking it in.

His role is focused on working with startups to get the most out of OpenAI’s products. Needless to say, there was some really good stuff in there.

I want to focus in on one of his points: The flywheel opportunity for AI products.

The AI flywheel

Wulfie’s seeing startups create an unfair advantage through a new kind of flywheel:

Observability → Evals → Fine-tuning

The logic is: Observability from day one → real usage data → better evals → model advantage → more users + data to observe.

“Start with observability, and do that from day one. In the background. You’re logging everything. It means that anytime you can then go back… and benefit.”

Log everything (inputs, outputs, context, user behaviour) so you can later analyse and benchmark without having to rebuild instrumentation. That includes tracing agent behaviour, detecting cache misuse, and catching unexpected or undesirable outputs in real time.
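To make "log everything" concrete, here is a minimal sketch of what that could look like, assuming a plain JSONL file as the store. The function names, field names and example event labels are my own illustration, not anything Wulfie showed; a real setup would more likely use a tracing or observability tool.

```python
import json
import time
import uuid
from pathlib import Path


def log_interaction(log_path, *, prompt, output, context=None, user_event=None):
    """Append one model interaction to a JSONL file and return the record."""
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "prompt": prompt,          # input sent to the model
        "output": output,          # what the model returned
        "context": context or {},  # e.g. retrieved docs, model name (assumed fields)
        "user_event": user_event,  # e.g. "accepted", "edited", "bounced" (assumed labels)
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(log_path):
    """Load every logged interaction back as a list of dicts."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The point of the append-only shape is that nothing has to be decided up front: the same records can later be sampled for grading, replayed as an eval set, or joined against user behaviour.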

If you never start logging, or start late, you’ll be limited to the same datasets, or the same synthetic data, as everyone else.

Evals based on logged data

Correlate model outputs with real user behaviour (bounce rates, engagement). You can then manually grade a small sample of these real interactions to create an eval set. Once you have those grading criteria, you can auto-grade thousands of logged outputs.
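A rough sketch of that hand-grade-then-auto-grade step, under loud assumptions: the rubric here is a toy rule (non-empty and not a refusal) standing in for whatever grader you would really use, which in practice might be a model-based one. The field names and the 0.9 agreement threshold are hypothetical.

```python
def rule_grader(record):
    """Hypothetical rubric: pass if the answer is non-empty and not a refusal."""
    text = record["output"].strip().lower()
    return bool(text) and not text.startswith("sorry")


def agreement(hand_graded, grader):
    """Fraction of hand-graded records where the auto-grader matches the human."""
    if not hand_graded:
        raise ValueError("need at least one hand-graded record")
    matches = sum(grader(r) == r["human_pass"] for r in hand_graded)
    return matches / len(hand_graded)


def auto_grade(records, grader):
    """Return copies of the records with an added 'auto_pass' field."""
    return [{**r, "auto_pass": grader(r)} for r in records]


# Small hand-graded sample (made-up data for illustration).
hand_graded = [
    {"output": "Here is the summary you asked for.", "human_pass": True},
    {"output": "Sorry, I can't help with that.", "human_pass": False},
    {"output": "", "human_pass": False},
]
# Ungraded outputs pulled from the logs (also made up).
logged = [{"output": "Done: 3 files updated."}, {"output": "sorry, no idea"}]

# Only trust the auto-grader at scale if it agrees with the human grades.
score = agreement(hand_graded, rule_grader)
graded = auto_grade(logged, rule_grader) if score >= 0.9 else []
```

Checking agreement against the hand-graded sample first is the design choice that matters: the auto-grader only earns the right to label thousands of outputs once it reproduces the human judgement on the small set.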

“Once you’ve done all of the work for grading evals, you’ve literally done all the work you need for RFT. So you have hundreds of thousands of outputs which have been auto-graded… You can now go from those to an RFT pipeline.”

“You want it to be running constantly on your production data. So we can see any regressions that happen. We can see interesting trends as well.”
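One way to picture evals running constantly on production data is a daily pass rate with a simple regression check. This is only a sketch under my own assumptions: each graded record carries a "day" (ISO date string) and an "auto_pass" flag, and a day-over-day drop of more than 10 points counts as a regression. None of these names or thresholds come from the talk.

```python
from collections import defaultdict


def daily_pass_rates(graded):
    """Map each day to the share of graded outputs that passed."""
    totals = defaultdict(lambda: [0, 0])  # day -> [passes, count]
    for r in graded:
        totals[r["day"]][0] += int(r["auto_pass"])
        totals[r["day"]][1] += 1
    # ISO date strings sort chronologically, so sorted() gives time order.
    return {day: p / n for day, (p, n) in sorted(totals.items())}


def find_regressions(rates, max_drop=0.10):
    """Return days whose pass rate fell by more than max_drop vs the previous day."""
    days = list(rates)
    return [
        cur for prev, cur in zip(days, days[1:])
        if rates[prev] - rates[cur] > max_drop
    ]
```

Fed with graded production logs on a schedule, the same two functions would surface both sudden regressions and slower trends in the pass rate.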

Compounding advantage

“These top startups who have a bespoke model… they have a better product because of that. They have more volume from their users. That data is used to help fine tune even more… And you have a moat.”

“Everyone talks nowadays about data moats from your users. That is going to be a very, very hard thing to replicate in a short timeframe.”

Conclusion

Various forms of flywheel have always been the startup dream. If you’re in the business of pursuing VC-level growth, it doesn’t get any better. Let’s see how fast they can go.

Bonus round: non-chat AI interfaces

I asked Wulfie a question about whether he had seen any great examples of people moving beyond chat interfaces for AI products.

This is kind of an obsession of mine, one that I’ve written about several times. So I was disappointed to get a basic “no” for an answer.

I still feel this is the biggest opportunity in front of everyone today. One that’s being neglected, understandably, if you want to hop on the current AI wave and get a great outcome fast.

But I can’t think of anything more exciting to work on right now. I am super-interested in asking the questions that might get us to that answer.