# LLM Algorithmic Benchmark

This repository contains a suite of difficult algorithmic tests to benchmark the code generation capabilities of various Large Language Models. The tests are run automatically via GitHub Actions, and the results are updated in this README.

## Configuration

Set the percentage of tests to run during the benchmark. 100% runs all tests.

<!-- CONFIG_START -->
RUN_PERCENTAGE: 100
<!-- CONFIG_END -->

## Models Under Test

The following models are included in the benchmark run.

<!-- MODELS_START -->
google/gemini-2.5-pro
anthropic/claude-sonnet-4.5
openai/gpt-5-codex
<!-- MODELS_END -->

## Benchmark Results

The table below shows the pass/fail status for each model on each test.

<!-- RESULTS_START -->
| Model | 1_dijkstra | 2_convex_hull | 3_lis | 4_determinant |
| --- | --- | --- | --- | --- |
| google/gemini-2.5-pro | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
| anthropic/claude-sonnet-4.5 | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
| openai/gpt-5-codex | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
<!-- RESULTS_END -->
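For reference, here is a minimal sketch of how a runner might read `RUN_PERCENTAGE` from the `CONFIG_START`/`CONFIG_END` markers in the Configuration section above and sample the suite accordingly. The actual GitHub Actions workflow is not shown in this README, so the function names, file paths, and sampling strategy below are illustrative assumptions, not the repository's implementation.

```js
// Hypothetical sketch: parse RUN_PERCENTAGE from this README's CONFIG
// markers and select that fraction of tests. The real CI scripts are not
// part of this README; names and paths here are assumptions.
const fs = require("fs");

function readRunPercentage(readmePath = "README.md") {
  const text = fs.readFileSync(readmePath, "utf8");
  const match = text.match(
    /<!-- CONFIG_START -->[\s\S]*?RUN_PERCENTAGE:\s*(\d+)[\s\S]*?<!-- CONFIG_END -->/
  );
  if (!match) throw new Error("CONFIG block not found in README");
  return Number(match[1]); // e.g. 100 means run every test
}

function selectTests(allTests, percentage) {
  // Take the first N% of the suite; 100 keeps all tests.
  const count = Math.ceil((allTests.length * percentage) / 100);
  return allTests.slice(0, count);
}

const tests = ["1_dijkstra", "2_convex_hull", "3_lis", "4_determinant"];
console.log(selectTests(tests, readRunPercentage()));
```

With `RUN_PERCENTAGE: 100` as configured above, a runner like this would execute all four tests; lowering the value would trade coverage for a faster benchmark run.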