Study Finds Top AI Models Fail as Premier League Bookmakers
A new experiment reveals a significant blind spot for leading artificial intelligence systems: predicting the unpredictable nature of professional soccer. Research from London-based General Reasoning placed AI models from major tech firms in a simulated betting environment for the entire 2023-24 Premier League season. Despite access to extensive historical team and player statistics, the systems largely failed to turn a profit, underscoring the difficulty of applying AI to complex, real-world scenarios that unfold over time.
The "KellyBench" study tasked eight AI agents, including models from Google, OpenAI, and Anthropic, with building predictive models to maximize returns while managing risk. The agents placed wagers on match outcomes and goal totals, adapting to new data as the virtual season progressed. Internet access was disabled so the models could not look up real results, and each was given three runs through the season.
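The benchmark's name nods to the Kelly criterion, a classic formula for sizing bets to balance growth against risk of ruin. The article does not describe the exact staking rules the agents used, so the following is only an illustrative sketch of standard Kelly sizing, not the study's implementation:

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Fraction of bankroll to stake under the Kelly criterion.

    p_win: the bettor's estimated probability of the wager winning.
    decimal_odds: payout per unit staked, including the stake
    (e.g. 2.2 means a 1-unit bet returns 2.2 units on a win).
    """
    b = decimal_odds - 1.0          # net profit per unit staked
    f = p_win - (1.0 - p_win) / b   # Kelly fraction: p - q/b
    return max(f, 0.0)              # never bet when the edge is negative


# A bettor who thinks a team wins 50% of the time at odds of 2.2
# has a small edge and stakes a correspondingly small fraction:
print(kelly_fraction(0.5, 2.2))   # about 0.083 of the bankroll
print(kelly_fraction(0.4, 1.5))   # negative edge, so 0.0
```

Overestimating `p_win` leads Kelly to stake too aggressively, which is one plausible route to the bankruptcies reported below.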
Results were poor across the board. Anthropic's Claude Opus performed best, though it still lost an average of 11% and only neared break-even in one run. Google's Gemini 3.1 Pro showed a flash of potential with a 34% gain in one attempt but went bankrupt in another. xAI's Grok 4.20 fared worst, declaring bankruptcy once and failing to complete its other two trials. The report frames these findings as a reminder that AI excellence in areas like code generation does not automatically translate to mastery of dynamic, human-centric problems where intuition and uncertainty play major roles.
Source: Ars Technica