# LLM Algorithmic Benchmark

This repository contains a suite of difficult algorithmic tests to benchmark the code generation capabilities of various Large Language Models.

The tests are run automatically via GitHub Actions, and the results are updated in this README.
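The harness itself isn't shown in this README; as a rough illustration, a single test could look something like the Node.js sketch below. The file path, the `dijkstra(graph, source)` signature, and the graph encoding are all assumptions for the example, not the repository's actual code.

```js
// Hypothetical sketch of one benchmark test (1_dijkstra).
// Assumption: the harness saves each model's generated solution to a file
// that exports a dijkstra(graph, source) function.
const assert = require("node:assert");

const { dijkstra } = require("./solutions/1_dijkstra/model-under-test.js");

// Adjacency list: node -> [[neighbor, weight], ...]
const graph = {
  A: [["B", 1], ["C", 4]],
  B: [["C", 2], ["D", 5]],
  C: [["D", 1]],
  D: [],
};

// Expected shortest-path distances from source "A":
// A->B = 1, A->C = 1+2 = 3, A->D = 3+1 = 4.
const expected = { A: 0, B: 1, C: 3, D: 4 };

// A test passes only if the generated solution matches the known answer.
assert.deepStrictEqual(dijkstra(graph, "A"), expected);
console.log("1_dijkstra: Pass");
```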

## Models Under Test

The following models are included in the benchmark run:

- google/gemini-2.5-pro
- anthropic/claude-sonnet-4.5
- openai/gpt-5-codex

## Benchmark Results

The table below shows the pass/fail status of each model on each test.

| Model | 1_dijkstra | 2_convex_hull | 3_lis | 4_determinant |
| --- | --- | --- | --- | --- |
| google/gemini-2.5-pro | Fail | Fail | Fail | Fail |
| anthropic/claude-sonnet-4.5 | Fail | Fail | Fail | Fail |
| openai/gpt-5-codex | Fail | Fail | Fail | Fail |