From af7c144294c8286bd8ec1e094466cb4989b010a3 Mon Sep 17 00:00:00 2001 From: multipleof4 Date: Fri, 5 Dec 2025 09:27:32 -0800 Subject: [PATCH] Fix: Rewrite newsletter in first-person 'I' voice with direct tone --- newsletter/gpt-5.1-codex-max-benchmark.html | 315 ++++++++++++++++++++ 1 file changed, 315 insertions(+) create mode 100644 newsletter/gpt-5.1-codex-max-benchmark.html diff --git a/newsletter/gpt-5.1-codex-max-benchmark.html b/newsletter/gpt-5.1-codex-max-benchmark.html new file mode 100644 index 0000000..57981b9 --- /dev/null +++ b/newsletter/gpt-5.1-codex-max-benchmark.html @@ -0,0 +1,315 @@ + + + + + GPT-5.1-Codex-Max Results - Lynchmark + + + +
+
+

GPT‑5.1‑Codex‑Max Drops

+

Lynchmark automated benchmark results

+
+ +
+

GPT‑5.1‑Codex‑Max just got added to the OpenAI API. I benchmarked it the moment it became available. Here's what I found.

+ +
+
+
M
+
+
GPT‑5.1‑Codex‑Max
+
Just released to API
+
+
+
8/11 C‑
+
+ +
versus
+ +
+
+
C
+
+
GPT‑5.1‑Codex
+
Previous version
+
+
+
7/11 D
+
+ +
+

Where It Lands

+
+
+
Gemini 3 Pro
+
+
+
Claude Opus 4.5
+
+
+
DeepSeek v3.2
+
+
+
GPT‑5.1‑Codex‑Max
+
+
+
Claude Sonnet 4.5
+
+
+
GPT‑5.1‑Codex
+
+
+
+ +
+

The Takeaway

+

Max scores one point higher than regular Codex. That's progress. But it still trails Gemini 3 Pro, Claude Opus 4.5, and DeepSeek v3.2, and it's only on par with Claude Sonnet 4.5.

+
+ +
+

Current Lynchmark Ranking

+
+
1
+
Google Gemini 3 Pro (Temperature: 0.35)
+
+
+
2
+
Anthropic Claude Opus 4.5
+
+
+
3
+
DeepSeek‑v3.2
+
+
+
4
+
GPT‑5.1‑Codex‑Max (new)
+
+
+
5
+
Claude Sonnet 4.5
+
+
+ +
+ The reality check: Even with this release, OpenAI is still far behind. It shows exactly why they declared "code red": the gap is real, and they're not closing it fast enough. +
+ +


What's coming: Rumors say OpenAI's upcoming model (codenamed "Garlic") arrives next week. The pressure is on. I'll benchmark it the moment it drops.

+ +

— Lynchmark

+
+ + +
+ +