AI Might Be Better at Simulating Than Answering

One of the more interesting things about a new technology is that we often spend the first few years using it to do old jobs faster.

The automobile spent years being described as a horseless carriage. Early television was largely radio with pictures. Even the first personal computers were often treated as expensive typewriters.

It takes time before we stop asking, “How can this help me do what I already do?” and start asking, “What does this make possible that wasn’t practical before?” I sometimes wonder if we are still in that first phase with AI.

Most of the conversations I hear about large language models revolve around generation. Can it write code? Can it create requirements? Can it summarize a meeting? Can it produce a presentation? Can it answer a question?

Those are all useful capabilities. I use them regularly myself, yet the longer I work with these tools, the more I find myself drawn to a different use case entirely.

I have spent years watching people use tools to come up with answers. More recently, I have been finding unexpected value in using the AI-powered successors to those same tools to interrogate assumptions, explore failure paths, and stress-test ideas before committing resources. The shift sounds subtle, but it changes the conversation completely.

Instead of asking an AI whether a product idea is good, I can ask what happened six months after the product launched and failed.

Instead of asking whether a process change is likely to help, I can ask what warning signs appeared before the initiative went off the rails.

Instead of asking for a list of benefits, I can ask how similar efforts typically collapse under their own weight.

The answers are often imperfect. Occasionally, they are completely wrong. Yet I have found them surprisingly useful, not because they predict the future, but because they help me explore it. That distinction, I think, gets lost in many conversations about AI.

Critics are quick to point out that large language models cannot see the future, and they are right. On the other hand, advocates are sometimes tempted to treat LLMs as oracles. I think we all know that is wrong too. Between dismissing these tools and treating them as oracles sits something much more practical.

Simulation.

Every meaningful decision contains uncertainty. A new feature may delight customers but create support burdens. An organizational change may improve coordination while introducing new bottlenecks. A strategic initiative may solve the problem it was designed to address while quietly creating two others.

Historically, exploring those possibilities required a great deal of time and coordination. You gathered experts into conference rooms. You conducted workshops. You debated scenarios. You captured risks. Then, if you were fortunate, someone revisited those risks before the project ended.

Now I can have a surprisingly useful version of that conversation in minutes.

A recent example emerged while I was experimenting with an AI-assisted version of Gary Klein’s premortem technique. Klein’s premise is simple: imagine that your project has already failed, and then ask what caused the failure. Think of it as a “Start at the End” workshop that focuses only on the failure.

Most teams naturally spend their planning effort discussing how they will succeed. Running a premortem as an AI simulation reverses the direction of travel entirely. It forces participants to confront uncomfortable possibilities before reality does it for them, and in enough detail that it often sparks further conversations.

As I worked with the output, I found the conversation pushing further. How do projects like this usually fail? Which of those failure modes is most likely? What evidence would suggest we are heading toward one of them? What would we see three months before failure becomes obvious? At what point should we pivot? At what point should we stop?

What surprised me was not the list of risks. Most experienced leaders can generate a risk list without AI. What surprised me was how quickly the conversation moved from planning to learning. The exercise stopped being about producing a document and became a form of exploration. Every answer generated another question. Every scenario exposed assumptions that had quietly settled into the background. Some assumptions survived the scrutiny. Others did not.

That feels significant because I suspect we are still treating AI primarily as a production tool. We measure its value by how quickly it can create something. A report. A presentation. A design. A backlog item. A block of code.

There is certainly value in that.

But increasingly I find myself wondering whether the larger opportunity lies elsewhere. What if the real advantage is not generating artifacts but reducing the cost of exploration? What if the greatest contribution of AI is allowing us to run dozens of thought experiments before we spend months turning one of them into reality? THAT is exciting to me.

The organizations that benefit most from AI may not be the ones that produce documents faster. They may be the ones that learn faster. The ones that discover hidden assumptions before customers do, recognize weak ideas before they become expensive projects, or identify risks while they are still manageable rather than after they have become headlines.

That is where simulation becomes more valuable than generation. A generated answer often feels complete. A simulation invites another question. In complex environments, the next question is frequently worth more than the first answer.

As a challenge, consider something your team is actively pursuing. It could be a feature, a product, a process change, or even an organizational initiative.

Imagine it has already failed. Instead of asking an AI how to make it successful, ask it:

Why did it fail?
Which warning signs appeared first?
Which assumptions proved false?
What did the people involved miss?
What evidence would have convinced them to change course sooner?

You may not agree with every answer. I rarely do. What matters is whether the exercise reveals something worth investigating before the decision becomes expensive.

If you build a useful version of that simulation, consider sharing it. A simple skill, a prompt, or a workflow published in a public repository might help others pressure-test their own assumptions.

We have spent the last few years teaching machines to generate answers.

I am beginning to suspect that one of their most useful talents may be helping us ask better questions about futures that have not happened yet.

Arjay Hinek