mirror of https://github.com/multipleof4/lynchmark.git
Refactor: Remove benchmark statement, graph, and ranking
@@ -217,7 +217,7 @@
 </div>
 
 <div class="content">
-<p>gpt‑5.1‑codex‑max just got added to API after having been added to Codex 2 weeks ago. I benchmarked it. Here's what I found.</p>
+<p>gpt‑5.1‑codex‑max just got added to API after having been added to Codex 2 weeks ago.</p>
 
 <div class="score-card">
 <div class="model">
@@ -243,59 +243,11 @@
 <div class="grade d">7/11 D</div>
 </div>
 
-<div class="visual">
-<h2>Where It Lands</h2>
-<div class="bars">
-<div class="bar" style="height: 100%">
-<div class="bar-label">Gemini 3 Pro</div>
-</div>
-<div class="bar" style="height: 95%">
-<div class="bar-label">Claude Opus 4.5</div>
-</div>
-<div class="bar" style="height: 90%">
-<div class="bar-label">DeepSeek v3.2</div>
-</div>
-<div class="bar" style="height: 73%">
-<div class="bar-label">GPT‑5.1‑Codex‑Max</div>
-</div>
-<div class="bar" style="height: 73%">
-<div class="bar-label">Claude Sonnet 4.5</div>
-</div>
-<div class="bar" style="height: 64%">
-<div class="bar-label">GPT‑5.1‑Codex</div>
-</div>
-</div>
-</div>
-
 <div class="insight">
 <h3>The Takeaway</h3>
 <p>Max scores <span class="highlight">one point better</span> than regular Codex. That's something. But it's still <span class="highlight">worse than Gemini 3 Pro</span>, Claude Opus 4.5, and DeepSeek v3.2. It's only on par with Claude Sonnet 4.5.</p>
 </div>
 
-<div class="ranking">
-<h2>Current Lynchmark Ranking</h2>
-<div class="rank-item rank-1">
-<div class="rank">1</div>
-<div>Google Gemini 3 Pro (Temperature: 0.35)</div>
-</div>
-<div class="rank-item rank-2">
-<div class="rank">2</div>
-<div>Anthropic Claude Opus 4.5</div>
-</div>
-<div class="rank-item rank-3">
-<div class="rank">3</div>
-<div>DeepSeek‑v3.2</div>
-</div>
-<div class="rank-item">
-<div class="rank">4</div>
-<div>GPT‑5.1‑Codex‑Max <span style="color: #666; font-size: 12px;">(new)</span></div>
-</div>
-<div class="rank-item">
-<div class="rank">5</div>
-<div>Claude Sonnet 4.5</div>
-</div>
-</div>
-
 <div class="callout">
 <strong>The reality check:</strong> Even with this release, OpenAI is still far behind. This shows exactly why they declared "code red." The gap is real. They're not closing it fast enough.
 </div>