# LLM Algorithmic Benchmark
This repository contains a suite of difficult algorithmic tests to benchmark the code generation capabilities of various Large Language Models.
The tests are run automatically via GitHub Actions, and the results are updated in this README.
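For reference, here is a minimal sketch of how such an in-place README update can work, assuming marker comments like the `CONFIG`/`RESULTS` pairs used below. The file path, function name, and example body are illustrative, not this repo's actual harness:

```python
import re
from pathlib import Path

def replace_section(readme: Path, start: str, end: str, body: str) -> None:
    """Replace the text between two HTML-comment markers, keeping the markers."""
    text = readme.read_text(encoding="utf-8")
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), flags=re.DOTALL)
    # Use a lambda so backslashes in `body` are not treated as regex escapes.
    text = pattern.sub(lambda _: f"{start}\n{body}\n{end}", text)
    readme.write_text(text, encoding="utf-8")

# Example (hypothetical): write fresh results into the RESULTS block.
replace_section(
    Path("README.md"),
    "<!-- RESULTS_START -->",
    "<!-- RESULTS_END -->",
    "**google/gemini-2.5-pro**\n- 1_dijkstra: ❌ Fail (0.213s)",
)
```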
## Configuration
Set the percentage of the test suite to run during each benchmark pass; a value of 100 runs every test.
<!-- CONFIG_START -->
RUN_PERCENTAGE: 25
<!-- CONFIG_END -->
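A sketch of how a runner might honor this setting, assuming tests are taken in sorted order from a fixed list; the regex and the "leading fraction" selection rule are assumptions about the harness, not documented behavior:

```python
import math
import re
from pathlib import Path

def select_tests(readme_text: str, all_tests: list[str]) -> list[str]:
    """Pick the leading fraction of tests according to RUN_PERCENTAGE."""
    match = re.search(r"RUN_PERCENTAGE:\s*(\d+)", readme_text)
    percentage = int(match.group(1)) if match else 100
    count = max(1, math.ceil(len(all_tests) * percentage / 100))
    return sorted(all_tests)[:count]

tests = ["1_dijkstra", "2_convex_hull", "3_lis", "4_determinant"]
print(select_tests(Path("README.md").read_text(), tests))
# With RUN_PERCENTAGE: 25 this selects only ['1_dijkstra'].
```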
## Models Under Test
The following models are included in the benchmark run.
<!-- MODELS_START -->
google/gemini-2.5-pro
anthropic/claude-sonnet-4.5
openai/gpt-5-codex
<!-- MODELS_END -->
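The vendor-prefixed IDs above look like OpenRouter model identifiers. A minimal sketch of requesting a solution from one model under test, assuming an OpenAI-compatible chat-completions endpoint; the URL, `OPENROUTER_API_KEY` variable, and prompt are assumptions, not part of this repo:

```python
import os
import requests

def generate_solution(model: str, task_prompt: str) -> str:
    """Ask one model under test for a code solution (OpenRouter-style API)."""
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",  # assumed endpoint
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": task_prompt}],
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

code = generate_solution("google/gemini-2.5-pro", "Implement Dijkstra's algorithm...")
```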
## Benchmark Results
The list below shows the pass/fail status and execution time for each model on each test. With `RUN_PERCENTAGE: 25`, only a quarter of the four-test suite (here, `1_dijkstra`) is executed per run; the remaining tests are reported as Not Run. A sketch of how each entry might be produced follows the results.
<!-- RESULTS_START -->
**google/gemini-2.5-pro**

- 1_dijkstra: ❌ Fail (0.213s)
- 2_convex_hull: ⚪ Not Run
- 3_lis: ⚪ Not Run
- 4_determinant: ⚪ Not Run

**anthropic/claude-sonnet-4.5**

- 1_dijkstra: ❌ Fail (0.189s)
- 2_convex_hull: ⚪ Not Run
- 3_lis: ⚪ Not Run
- 4_determinant: ⚪ Not Run

**openai/gpt-5-codex**

- 1_dijkstra: ❌ Fail (0.245s)
- 2_convex_hull: ⚪ Not Run
- 3_lis: ⚪ Not Run
- 4_determinant: ⚪ Not Run
<!-- RESULTS_END -->
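A sketch of how one result line might be produced, assuming a generated solution is checked by running a test script with a timeout; the file layout, timeout value, and exit-code pass criterion are illustrative assumptions:

```python
import subprocess
import time

def run_test(test_script: str) -> tuple[bool, float]:
    """Run one test script and report pass/fail plus wall-clock time."""
    start = time.perf_counter()
    try:
        result = subprocess.run(
            ["python", test_script],
            capture_output=True,
            timeout=60,  # assumed per-test limit
        )
        passed = result.returncode == 0
    except subprocess.TimeoutExpired:
        passed = False
    elapsed = time.perf_counter() - start
    return passed, elapsed

passed, seconds = run_test("tests/1_dijkstra/test.py")  # hypothetical path
print(f"1_dijkstra: {'✅ Pass' if passed else '❌ Fail'} ({seconds:.3f}s)")
```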