From 43d17e9b0f660f9937bbc54c049cefedfe10120f Mon Sep 17 00:00:00 2001
From: multipleof4
Date: Mon, 13 Oct 2025 05:50:08 -0700
Subject: [PATCH] Feat: Add run percentage config & rename file

---
 README | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..b74dfb5
--- /dev/null
+++ b/README
@@ -0,0 +1,35 @@
+# LLM Algorithmic Benchmark
+
+This repository contains a suite of difficult algorithmic tests to benchmark the code generation capabilities of various Large Language Models.
+
+The tests are run automatically via GitHub Actions, and the results are updated in this README.
+
+## Configuration
+
+Set the percentage of tests to run during the benchmark. 100% runs all tests.
+
+
+RUN_PERCENTAGE: 100
+
+
+## Models Under Test
+
+The following models are included in the benchmark run.
+
+
+google/gemini-2.5-pro
+anthropic/claude-sonnet-4.5
+openai/gpt-5-codex
+
+
+## Benchmark Results
+
+The table below shows the pass/fail status for each model on each test.
+
+
+| Model | 1_dijkstra | 2_convex_hull | 3_lis | 4_determinant |
+| --- | --- | --- | --- | --- |
+| google/gemini-2.5-pro | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
+| anthropic/claude-sonnet-4.5 | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
+| openai/gpt-5-codex | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
+