Refactor: Remove benchmark statement, graph, and ranking

2025-12-05 09:41:30 -08:00
parent de7b53acb2
commit 3d00098530

@@ -217,7 +217,7 @@
 </div>
 <div class="content">
-<p>gpt5.1codexmax just got added to API after having been added to Codex 2 weeks ago. I benchmarked it. Here's what I found.</p>
+<p>gpt5.1codexmax just got added to API after having been added to Codex 2 weeks ago.</p>
 <div class="score-card">
 <div class="model">
@@ -243,59 +243,11 @@
 <div class="grade d">7/11 D</div>
 </div>
-<div class="visual">
-<h2>Where It Lands</h2>
-<div class="bars">
-<div class="bar" style="height: 100%">
-<div class="bar-label">Gemini 3 Pro</div>
-</div>
-<div class="bar" style="height: 95%">
-<div class="bar-label">Claude Opus 4.5</div>
-</div>
-<div class="bar" style="height: 90%">
-<div class="bar-label">DeepSeek v3.2</div>
-</div>
-<div class="bar" style="height: 73%">
-<div class="bar-label">GPT5.1CodexMax</div>
-</div>
-<div class="bar" style="height: 73%">
-<div class="bar-label">Claude Sonnet 4.5</div>
-</div>
-<div class="bar" style="height: 64%">
-<div class="bar-label">GPT5.1Codex</div>
-</div>
-</div>
-</div>
 <div class="insight">
 <h3>The Takeaway</h3>
 <p>Max scores <span class="highlight">one point better</span> than regular Codex. That's something. But it's still <span class="highlight">worse than Gemini 3 Pro</span>, Claude Opus 4.5, and DeepSeek v3.2. It's only on par with Claude Sonnet 4.5.</p>
 </div>
-<div class="ranking">
-<h2>Current Lynchmark Ranking</h2>
-<div class="rank-item rank-1">
-<div class="rank">1</div>
-<div>Google Gemini 3 Pro (Temperature: 0.35)</div>
-</div>
-<div class="rank-item rank-2">
-<div class="rank">2</div>
-<div>Anthropic Claude Opus 4.5</div>
-</div>
-<div class="rank-item rank-3">
-<div class="rank">3</div>
-<div>DeepSeekv3.2</div>
-</div>
-<div class="rank-item">
-<div class="rank">4</div>
-<div>GPT5.1CodexMax <span style="color: #666; font-size: 12px;">(new)</span></div>
-</div>
-<div class="rank-item">
-<div class="rank">5</div>
-<div>Claude Sonnet 4.5</div>
-</div>
-</div>
 <div class="callout">
 <strong>The reality check:</strong> Even with this release, OpenAI is still far behind. This shows exactly why they declared "code red." The gap is real. They're not closing it fast enough.
 </div>