gpt-5.1-codex-max was just added to the API, only 2 weeks after it was introduced in Codex. I just benchmarked it, and it scored better than gpt-5.1-codex.
gpt-5.1-codex alone scores 7/11 (D), but gpt-5.1-codex-max scores 8/11 (C-).
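For anyone wondering how those letter grades line up with the raw scores, here is a minimal sketch. It assumes a common US-style percentage scale; the cutoffs and the `letter_grade` helper are illustrative assumptions, not necessarily the scale the benchmark actually uses.

```python
def letter_grade(passed: int, total: int) -> str:
    """Map a pass count to a letter grade on an ASSUMED US-style scale."""
    pct = 100 * passed / total
    # Assumed lower bounds for each band (hypothetical, not from the benchmark).
    bands = [
        (93, "A"), (90, "A-"),
        (87, "B+"), (83, "B"), (80, "B-"),
        (77, "C+"), (73, "C"), (70, "C-"),
        (67, "D+"), (63, "D"), (60, "D-"),
    ]
    for cutoff, grade in bands:
        if pct >= cutoff:
            return grade
    return "F"


print(letter_grade(7, 11))  # 7/11 is about 63.6%
print(letter_grade(8, 11))  # 8/11 is about 72.7%
```

Under these assumed cutoffs, 7/11 lands in the D band and 8/11 in the C- band, which matches the grades quoted above.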
Still worse than Gemini 3 Pro, Claude Opus 4.5, and DeepSeek V3.2, but on par with Claude Sonnet 4.5.
This shows how far behind OpenAI still is, even with this release, and why they 'declared code red'. The rumored upcoming OpenAI model, codenamed garlic and expected next week, is highly anticipated now.