Yuuuge upset, I hear, but honestly not surprising to me. Anthropic benchmarkmaxxes so hard, especially by overfitting to SWE benchmarks, and on top of that relies on religious capture of the Reddit demographic. I already thought GPT 5.5 Codex extra high seemed far beyond Claude Code on Opus models, in ways not fully captured by current benchmarks.