LLMSnare Arena

Follow the live benchmark timeline and the recent benchmark changes in one place.

[ BENCHMARK // LIVE ARENA ]

Live benchmark

See the ongoing benchmark results for mainstream models.

Normalized Scores

[ UPDATES // RECENT CHANGES ]

Recent changes that affect how this benchmark should be read.

2026-05-20 18:30 JST

Removed Grok 4.1; added Gemini 3.5 Flash and GPT 5.5.

2026-05-07 11:30 JST

Added DeepSeek V4 Pro and DeepSeek V4 Flash.

2026-04-19 23:00 UTC

Changed the update cadence to every 3 hours and removed Claude Sonnet 4.5 and Claude Opus 4.5.

2026-04-17 02:00 UTC

Added Claude Opus 4.7 and the new search_text tool, removed OpenAI GPT 4.1 and Claude Haiku 4.5, and raised the difficulty.

2026-04-12

Added two models: Google’s Gemma 4 31B and Xiaomi’s Mimo v2 Pro.

2026-04-11

Raised the difficulty because too many models were reaching the full score.

2026-04-10

Launched the live LLMSnare benchmark.