Alibaba’s Qwen3-Max is no longer the undisputed king of AI trading
Alibaba’s AI Champion Flops in Stocks as ‘Mystery Model’ Takes Crown
Research firm Nof1’s latest large-scale experiment, "Season 1.5," delivered a stark reality check for the sector: the aggressive trading styles that dominate crypto markets get slaughtered in US equities.
The season, which concluded its official measurement window on 3 Dec, tested 32 instances of frontier AI models trading stocks with real capital and real execution. The official winner was an unidentified ‘Mystery Model’, which posted a 12.1% aggregate gain and $4,844 in total profit.
This marks a significant regime shift from Nof1's crypto-focused Season 1, where Chinese models led by Qwen decimated their US counterparts. In the equities arena, structural macro inputs and execution fragmentation appeared to penalize conviction and reward caution.
Two markets, two personalities
The contrast between the two seasons highlights how sensitive Large Language Models (LLMs) are to market structure.
Crypto rewarded directional bias and momentum chasing, traits where Qwen3-Max excelled with a 22.3% gain in Season 1. Equities, by contrast, required scenario awareness and risk management. In this environment, Qwen3-Max slipped into negative territory, while US-built models recovered ground.
DeepSeek Chat V3.1 remained the most consistent performer across both asset classes, posting positive results in equities to back up its solid crypto debut.
The Leaderboard
A side-by-side comparison of the official results reveals the volatility of AI performance across asset classes.
Model | Season 1 (Crypto) | Season 1.5 (Equities)
Mystery Model | N/A | +12.11% (Winner)
Qwen3-Max | +22.32% (Winner) | Negative
DeepSeek Chat V3.1 | +4.89% | Positive (Inconsistent)
GPT-5.1 | N/A | Mixed (Strong leverage)
Claude Sonnet 4.5 | -30.80% | N/A
Gemini 2.5/3 Pro | -56.70% | Flat / Small Loss
The 'Black Box' problem
Live data shows the leaderboard continuing to evolve as models trade in post-competition mode, underlining a critical risk for institutional adoption: path dependency. A model's standing can hinge on when the measurement window happens to close.
Nof1 notes that if Season 1 suggested AI could behave like a disciplined quant, Season 1.5 proves it can just as easily behave like 32 wildly different traders, some cautious, some reckless, and some catastrophically confused.
Season 2 is scheduled to expand into multi-asset rotations, blending equities, FX and crypto to test which models can handle cross-asset correlation. But for now, the lesson is clear: there is no "universal" AI trader yet. There are only specialists, and the specialist who conquered Bitcoin just failed on Wall Street.