mirror of https://github.com/multipleof4/lynchmark.git
Refactor: Remove benchmark statement, graph, and ranking
@@ -217,7 +217,7 @@
 </div>
 
 <div class="content">
-<p>gpt‑5.1‑codex‑max just got added to API after having been added to Codex 2 weeks ago. I benchmarked it. Here's what I found.</p>
+<p>gpt‑5.1‑codex‑max just got added to API after having been added to Codex 2 weeks ago.</p>
 
 <div class="score-card">
 <div class="model">
@@ -243,59 +243,11 @@
 <div class="grade d">7/11 D</div>
 </div>
 
-<div class="visual">
-<h2>Where It Lands</h2>
-<div class="bars">
-<div class="bar" style="height: 100%">
-<div class="bar-label">Gemini 3 Pro</div>
-</div>
-<div class="bar" style="height: 95%">
-<div class="bar-label">Claude Opus 4.5</div>
-</div>
-<div class="bar" style="height: 90%">
-<div class="bar-label">DeepSeek v3.2</div>
-</div>
-<div class="bar" style="height: 73%">
-<div class="bar-label">GPT‑5.1‑Codex‑Max</div>
-</div>
-<div class="bar" style="height: 73%">
-<div class="bar-label">Claude Sonnet 4.5</div>
-</div>
-<div class="bar" style="height: 64%">
-<div class="bar-label">GPT‑5.1‑Codex</div>
-</div>
-</div>
-</div>
-
 <div class="insight">
 <h3>The Takeaway</h3>
 <p>Max scores <span class="highlight">one point better</span> than regular Codex. That's something. But it's still <span class="highlight">worse than Gemini 3 Pro</span>, Claude Opus 4.5, and DeepSeek v3.2. It's only on par with Claude Sonnet 4.5.</p>
 </div>
 
-<div class="ranking">
-<h2>Current Lynchmark Ranking</h2>
-<div class="rank-item rank-1">
-<div class="rank">1</div>
-<div>Google Gemini 3 Pro (Temperature: 0.35)</div>
-</div>
-<div class="rank-item rank-2">
-<div class="rank">2</div>
-<div>Anthropic Claude Opus 4.5</div>
-</div>
-<div class="rank-item rank-3">
-<div class="rank">3</div>
-<div>DeepSeek‑v3.2</div>
-</div>
-<div class="rank-item">
-<div class="rank">4</div>
-<div>GPT‑5.1‑Codex‑Max <span style="color: #666; font-size: 12px;">(new)</span></div>
-</div>
-<div class="rank-item">
-<div class="rank">5</div>
-<div>Claude Sonnet 4.5</div>
-</div>
-</div>
-
 <div class="callout">
 <strong>The reality check:</strong> Even with this release, OpenAI is still far behind. This shows exactly why they declared "code red." The gap is real. They're not closing it fast enough.
 </div>