mirror of
https://github.com/multipleof4/lynchmark.git
synced 2026-01-14 00:27:55 +00:00
Feat: Add run percentage config & rename file
35
README
Normal file
@@ -0,0 +1,35 @@
# LLM Algorithmic Benchmark

This repository contains a suite of difficult algorithmic tests to benchmark the code generation capabilities of various Large Language Models.

The tests are run automatically via GitHub Actions, and the results are updated in this README.

## Configuration

Set the percentage of tests to run during the benchmark. 100% runs all tests.

<!-- CONFIG_START -->
RUN_PERCENTAGE: 100
<!-- CONFIG_END -->
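
As a rough illustration of how a runner might consume the block above, the sketch below reads `RUN_PERCENTAGE` from between the `CONFIG_START` and `CONFIG_END` markers and picks a subset of tests. The function names, the fallback to 100, and the "take the first N tests, rounding up" policy are all assumptions made for this example; the repository's actual runner may work differently.

```python
import re

# Matches everything between the config markers used in this README.
CONFIG_BLOCK = re.compile(r"<!-- CONFIG_START -->(.*?)<!-- CONFIG_END -->", re.DOTALL)


def read_run_percentage(readme_text: str, default: int = 100) -> int:
    """Return RUN_PERCENTAGE from the config block, clamped to 0..100.

    Falls back to `default` when the block or the key is missing
    (an assumed behaviour, not taken from the repository).
    """
    block = CONFIG_BLOCK.search(readme_text)
    if block is None:
        return default
    match = re.search(r"RUN_PERCENTAGE:\s*(\d+)", block.group(1))
    if match is None:
        return default
    return max(0, min(100, int(match.group(1))))


def select_tests(tests: list[str], percentage: int) -> list[str]:
    """Keep the first N tests, where N is len(tests) * percentage / 100 rounded up.

    Hypothetical selection policy; integer arithmetic avoids float rounding.
    """
    count = (len(tests) * percentage + 99) // 100
    return tests[:count]


if __name__ == "__main__":
    readme = "<!-- CONFIG_START -->\nRUN_PERCENTAGE: 50\n<!-- CONFIG_END -->"
    tests = ["1_dijkstra", "2_convex_hull", "3_lis", "4_determinant"]
    print(select_tests(tests, read_run_percentage(readme)))
```

Rounding up means any non-zero percentage runs at least one test, which is one reasonable choice among several; random sampling would be another.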

## Models Under Test

The following models are included in the benchmark run.

<!-- MODELS_START -->
google/gemini-2.5-pro
anthropic/claude-sonnet-4.5
openai/gpt-5-codex
<!-- MODELS_END -->

## Benchmark Results

The table below shows the pass/fail status for each model on each test.

<!-- RESULTS_START -->
| Model | 1_dijkstra | 2_convex_hull | 3_lis | 4_determinant |
| --- | --- | --- | --- | --- |
| google/gemini-2.5-pro | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
| anthropic/claude-sonnet-4.5 | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
| openai/gpt-5-codex | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
<!-- RESULTS_END -->
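
Since the results are written back into this README automatically, the following sketch shows one way a script could rebuild the table and splice it between the `RESULTS_START` and `RESULTS_END` markers. The function names, the `✅ Pass` label, the rule that a missing result counts as a failure, and the `example/model-a` data are assumptions for illustration only; they are not taken from the repository's workflow.

```python
import re

# Captures the start marker (with its newline) and the end marker so that
# only the text between them is replaced.
RESULTS_BLOCK = re.compile(
    r"(<!-- RESULTS_START -->\n).*?(<!-- RESULTS_END -->)", re.DOTALL
)


def render_table(tests: list[str], results: dict[str, dict[str, bool]]) -> str:
    """Build a Markdown table with one row per model and one column per test."""
    header = "| Model | " + " | ".join(tests) + " |"
    divider = "| " + " | ".join(["---"] * (len(tests) + 1)) + " |"
    rows = []
    for model, outcomes in results.items():
        # A test with no recorded outcome is shown as a failure (assumed policy).
        cells = ["✅ Pass" if outcomes.get(t, False) else "❌ Fail" for t in tests]
        rows.append(f"| {model} | " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])


def update_readme(readme_text: str, table: str) -> str:
    """Replace whatever sits between the results markers with `table`."""
    # A function replacement avoids backslash handling in the table text.
    return RESULTS_BLOCK.sub(
        lambda m: m.group(1) + table + "\n" + m.group(2), readme_text, count=1
    )


if __name__ == "__main__":
    # Placeholder data, not real benchmark results.
    table = render_table(
        ["1_dijkstra", "3_lis"],
        {"example/model-a": {"1_dijkstra": True, "3_lis": False}},
    )
    readme = "intro\n<!-- RESULTS_START -->\nold table\n<!-- RESULTS_END -->\n"
    print(update_readme(readme, table))
```

Keeping the markers in place and replacing only what lies between them lets the rest of the README be edited by hand without interfering with the automated update.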