The Entropy Problem
The most expensive problem in enterprise AI isn’t the model. It’s everything you’re feeding it.
Three years ago, I was sitting on a roundtable at ShopTalk when the conversation turned to AI. ChatGPT had launched about four months earlier, and the energy in the room was a particular mix of excitement and uncertainty that I suspect most of us remember. Everyone was trying to figure out what generative AI meant for retail, for marketing, and for the way we work.
I remember saying something that felt, at the time, almost too obvious to be worth saying out loud: the output is only as good as the input, and the prompt is only as good as the data your AI is being trained on.
As I prepare for ShopTalk 2026, I keep returning to that conversation. Not because I was prescient (I’m not, clearly), but because three years of exponential progress in AI capabilities have done remarkably little to change the underlying constraint.
The Acceleration We Expected (and the One We Didn’t)
The technology has evolved faster than most of us expected. Within a matter of months, we’ve moved from general-purpose chatbots to specialized agents with domain-specific capabilities, and from tools that answer questions to systems that execute multi-step workflows autonomously, often quite gracefully. The gap between “impressive demo” and “production-ready application” has narrowed considerably, and it’s only going to shrink further.
What I didn’t fully appreciate in 2023 was the velocity of that acceleration. While we were still laughing at Will Smith eating spaghetti, each capability was unlocking the next. Agents that can reason about code enable agents that can build, QA, and deploy. Agents that can navigate interfaces enable agents that can operate across systems.
What’s become clearer with time is that the ceiling on what AI can do for any given organization isn’t primarily a function of model capability. It’s a function of data architecture.
I said it then and I will say it again now:
If your data is shit, if your input is trash, don’t be shocked if your AI can only parse paper from plastic.
Excuse my French.
The Taxonomy Problem
I’ve worked with companies that have over 100,000 SKUs, thousands of brands, hundreds of manufacturers, and a category taxonomy that was largely the brainchild of a long-absent former employee. These systems work, in the sense that products get listed and orders get fulfilled. But they work through institutional knowledge, human pattern-matching based on personal belief systems, and a kind of organizational muscle memory that doesn’t transfer well to automated or autonomous systems.
Ask a new employee to navigate that taxonomy on their own and they’re lost for about six months. Ask an AI agent to optimize that catalog (to surface the right products to the right customers, identify gaps in the assortment, automate merchandising decisions, what have you) and you’re asking it to reason through a structure that was never designed to be reasoned through by anyone other than the people who built it.
Don’t take it personally; taxonomies evolve organically. Categories that made sense when you had 10,000 SKUs don’t scale cleanly to 50,000 or 100,000. The sub-sub-sub-category that captures an important distinction for one product manager is meaningless noise to another. Entropy is the natural state of any sufficiently large data system.
But the cost of that entropy is changing. What used to be friction (extra clicks, occasional miscategorizations, the need for tribal knowledge) is becoming a hard ceiling on what’s possible.
Specialized Agents and Their Dependencies
The trajectory of AI development points clearly toward specialization. You know this. I know this. General-purpose models will continue to improve, but the real leverage comes from agents designed for specific domains: merchandising agents, pricing agents, creative development agents, analytics agents. Systems that don’t just know things in general but know how to operate within a particular business context.
A specialized agent that understands the nuances of e-commerce product data is more valuable than a general-purpose model that knows a little about everything. The distinction between a “specialized model” and a “general model acting specialized” is blurring, but if we’ve learned anything, it’s that domain expertise matters as much in AI systems as it does in human teams.
But specialization creates a different kind of dependency. A merchandising agent needs clean product data, context, and category knowledge to merchandise effectively. A personalization agent needs consistent customer data to personalize meaningfully. An analytics agent needs well-structured event data to generate insights worth acting on. The more capable the agent, the more it exposes the gaps in the data it’s attempting to reason through.
The Boring Work
There’s nothing glamorous about data hygiene. Taxonomy audits don’t make for exciting LinkedIn posts. Documentation reviews don’t generate conference buzz. The work of maintaining clean, consistent, well-structured data is precisely the kind of work that gets deprioritized when there are shinier problems to solve, and I’ve witnessed it firsthand. Throughout my career as a marketer, I have been in heated conversations with peers in Engineering who didn’t see the value in custom event parameters or granular profile dimensions, or why that work deserved a spot on the roadmap over some sexier feature.
I’ve noticed that the organizations making real progress with AI tend to have something in common: they did the boring work first. Not perfectly (no one’s data is perfect), but well enough that automated systems can reason through it without constant human intervention.
Clean taxonomies. Consistent product attributes. Regular hygiene processes. Documentation that doesn’t live exclusively in one person’s head. These aren’t AI initiatives, but they’re the foundation that makes AI initiatives possible.
The Question I Keep Asking
As the technology accelerates, the gap between organizations that can leverage it and those that can’t is widening. The limiting factor isn’t access to models; those are increasingly commoditized. It’s not even technical and strategic talent (though that matters). It’s the accumulated decisions about data architecture made over years or decades, and the willingness to revisit those decisions in light of what’s now possible.
So the question I asked in 2023 is the same question I’m asking now, with more urgency: Is your data ready for what’s coming next?
For most organizations, I suspect the answer is no. But the organizations that start addressing that gap now will have a meaningful advantage over those that wait until the bottleneck becomes impossible to ignore.