From 1ea8cb58523f51687dc12578c5ef0f39f425eb0e Mon Sep 17 00:00:00 2001
From: multipleof4
Date: Mon, 13 Oct 2025 10:33:35 -0700
Subject: [PATCH] Refactor: Point to live results page in README

---
 README.md | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3340e3e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,82 @@
+# LLM Algorithmic Benchmark
+
+This repository contains a suite of difficult algorithmic tests for benchmarking the code generation capabilities of various Large Language Models.
+
+The tests are run automatically via GitHub Actions, and the results are published to a live results page (see below).
+
+## Configuration
+
+Set the percentage of tests to run during the benchmark; 100% runs all tests. A sketch of how this setting might be applied appears at the end of this README.
+
+```
+RUN_PERCENTAGE: 25
+```
+
+## Models Under Test
+
+The following models are included in the benchmark run; a hypothetical example of querying them appears at the end of this README.
+
+```
+google/gemini-2.5-pro
+anthropic/claude-sonnet-4.5
+openai/gpt-5-codex
+```
+
+## Benchmark Results
+
+Live benchmark results, including pass/fail status and code generation time, are available on our [results page](https://multipleof4.github.io/benchmark/).
+
+The results are updated automatically via GitHub Actions.
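+
+## Example: Selecting a Test Subset
+
+The sketch below is a minimal, hypothetical illustration of how a `RUN_PERCENTAGE` value could select a subset of the suite; `sample_tests` and the generated test names are assumptions, not the repository's actual harness.
+
+```python
+import math
+import random
+
+def sample_tests(all_tests, run_percentage):
+    """Pick a random subset sized by RUN_PERCENTAGE (100 selects everything)."""
+    if run_percentage >= 100:
+        return list(all_tests)
+    k = max(1, math.ceil(len(all_tests) * run_percentage / 100))
+    return random.sample(list(all_tests), k)
+
+# With RUN_PERCENTAGE: 25, roughly a quarter of the suite is selected.
+tests = [f"test_{i:03d}" for i in range(120)]
+subset = sample_tests(tests, 25)
+print(f"Running {len(subset)} of {len(tests)} tests")
+```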
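+
+## Example: Querying a Model
+
+The vendor-prefixed model IDs above follow the OpenRouter naming scheme, so, purely as an illustration (an assumption about the harness, not a description of it), a request to one of the listed models could look like this sketch against OpenRouter's OpenAI-compatible chat-completions endpoint:
+
+```python
+import json
+import os
+import urllib.request
+
+# Assumption: models are reached through OpenRouter's OpenAI-compatible
+# chat-completions API; substitute the harness's real client if it differs.
+API_URL = "https://openrouter.ai/api/v1/chat/completions"
+
+def generate_solution(model_id, problem_statement):
+    """Ask one model under test to produce code for a benchmark problem."""
+    payload = {
+        "model": model_id,  # e.g. "anthropic/claude-sonnet-4.5"
+        "messages": [{"role": "user", "content": problem_statement}],
+    }
+    request = urllib.request.Request(
+        API_URL,
+        data=json.dumps(payload).encode("utf-8"),
+        headers={
+            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
+            "Content-Type": "application/json",
+        },
+    )
+    with urllib.request.urlopen(request) as response:
+        body = json.load(response)
+    return body["choices"][0]["message"]["content"]
+```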