# LLM Algorithmic Benchmark

This repository contains a suite of difficult algorithmic tests to benchmark the code generation capabilities of various Large Language Models.

The tests are run automatically via GitHub Actions, and the results are updated in this README.
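The harness itself isn't shown in this README; as a rough illustration, a single test could look something like the Node.js sketch below. The file path, the `dijkstra(graph, source)` signature, and the graph encoding are all assumptions for the example, not the repository's actual code.

```js
// Hypothetical sketch of one benchmark test (1_dijkstra).
// Assumption: the harness saves each model's generated solution to a file
// that exports a dijkstra(graph, source) function.
const assert = require("node:assert");

const { dijkstra } = require("./solutions/1_dijkstra/model-under-test.js");

// Adjacency list: node -> [[neighbor, weight], ...]
const graph = {
  A: [["B", 1], ["C", 4]],
  B: [["C", 2], ["D", 5]],
  C: [["D", 1]],
  D: [],
};

// Expected shortest-path distances from source "A":
// A->B = 1, A->C = 1+2 = 3, A->D = 3+1 = 4.
const expected = { A: 0, B: 1, C: 3, D: 4 };

// A test passes only if the generated solution matches the known answer.
assert.deepStrictEqual(dijkstra(graph, "A"), expected);
console.log("1_dijkstra: Pass");
```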

## Models Under Test

The following models are included in the benchmark run:

- google/gemini-2.5-pro
- anthropic/claude-sonnet-4.5
- openai/gpt-5-codex

## Benchmark Results

The table below shows the pass/fail status of each model on each test.

| Model | 1_dijkstra | 2_convex_hull | 3_lis | 4_determinant |
| --- | --- | --- | --- | --- |
| google/gemini-2.5-pro | Fail | Fail | Fail | Fail |
| anthropic/claude-sonnet-4.5 | Fail | Fail | Fail | Fail |
| openai/gpt-5-codex | Fail | Fail | Fail | Fail |