Data compiled by Woofun AI shows that Artificial Analysis has overhauled its AI Intelligence Index to prioritize autonomous planning and complex task resolution over simple instruction following. The revised methodology introduces high-difficulty scenarios, such as simulated bank customer service interactions, with the primary metric focusing on the cost and time required to complete tasks successfully.
In the latest rankings, Claude Opus 4.8 secured the top position among available models with a score of 56, narrowly edging out GPT-5.5 at 55 points.
However, a stark cost divergence emerged: executing identical tasks with Claude Opus 4.8 cost $1.78, whereas DeepSeek V4 Pro completed the same work for just $0.04. That makes Claude roughly 44 times more expensive (1.78 / 0.04 ≈ 44.5). Completion times also varied significantly, with xAI Grok 4.3 finishing in 1.5 minutes compared to Claude Sonnet 4.6's 13.5 minutes, a ninefold difference.
The updated GDPval-AA test now constitutes 20% of the total evaluation, raising the human benchmark to 1000 and extending conversation limits to 250 rounds.
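The cost and speed gaps quoted above are simple ratios, and a short Python sketch makes the arithmetic easy to check. The figures are the ones reported in this article; the variable names and the `ratio` helper are illustrative and are not part of any Artificial Analysis tooling.

```python
# Figures as quoted in the article; names here are illustrative only.
cost_usd = {"Claude Opus 4.8": 1.78, "DeepSeek V4 Pro": 0.04}
minutes = {"xAI Grok 4.3": 1.5, "Claude Sonnet 4.6": 13.5}


def ratio(high: float, low: float) -> float:
    """Return how many times larger `high` is than `low`."""
    return high / low


# Cost premium of Claude Opus 4.8 over DeepSeek V4 Pro for the same tasks.
cost_ratio = ratio(cost_usd["Claude Opus 4.8"], cost_usd["DeepSeek V4 Pro"])

# Slowdown of Claude Sonnet 4.6 relative to xAI Grok 4.3.
time_ratio = ratio(minutes["Claude Sonnet 4.6"], minutes["xAI Grok 4.3"])

print(f"cost ratio: {cost_ratio:.1f}x")
print(f"time ratio: {time_ratio:.1f}x")
```

The cost ratio works out to about 44.5, which the article rounds down to 44, and the time ratio is 9.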