multipleof4/lynchmark: LLM Benchmark - lynchmark - Planet Renox Source


Mirror of https://github.com/multipleof4/lynchmark.git, last synced 2026-07-19 14:25:45 +00:00.


Latest commit: 36282d0a40 "Delete debug.html" by multipleof4 (2025-11-18 09:50:14 -08:00)

File | Last commit | Date
.github/workflows | Feat: Add workflow for single model benchmarks | 2025-11-14 16:01:08 -08:00
(name not shown) | Refactor: Allow benchmark runs for a single model | 2025-11-14 16:01:04 -08:00
(name not shown) | Fix: Enforce UTC calculations in scheduler prompt | 2025-11-18 09:49:13 -08:00
.gitignore | Revert: Update .gitignore | 2025-11-13 13:00:44 -08:00
gemini.html | Create gemini.html | 2025-11-18 08:45:07 -08:00
index.html | Fix: add per-model grade summary | 2025-11-14 16:17:25 -08:00
package.json | Revert: Update package.json | 2025-11-13 13:00:54 -08:00
README | Update README | 2025-11-18 08:50:09 -08:00
results.json | Docs: Update benchmark results | 2025-11-18 17:37:06 +00:00

README

Set RUN_PERCENTAGE to the percentage of tests to run during the benchmark; a value of 100 runs all tests.

<!-- CONFIG_START -->
RUN_PERCENTAGE: 100
SHARED_PROMPT: "Provide production-ready and maintainable JavaScript code. Apply code golfing practices but don't put everything in a single line. No comments. Your code will execute in the browser."
<!-- CONFIG_END -->
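The README does not show how the runner consumes this block. A minimal sketch, assuming the settings sit between the CONFIG_START and CONFIG_END comment markers as KEY: value lines, and that RUN_PERCENTAGE keeps a leading share of the test list, rounded up. The names readBlock, parseConfig and selectTests are hypothetical, and the repository's real runner may parse or sample differently.

```javascript
// Hypothetical helper: builds a marker such as CONFIG + START at runtime.
const marker = (name, kind) => `<!-- ${name}_${kind} -->`;

// Hypothetical helper: returns the raw text between NAME_START and NAME_END.
function readBlock(text, name) {
  const start = marker(name, "START");
  const end = marker(name, "END");
  const i = text.indexOf(start);
  const j = text.indexOf(end, i + start.length);
  if (i < 0 || j < 0) throw new Error(`${name} block not found`);
  return text.slice(i + start.length, j);
}

// Hypothetical parser: splits each "KEY: value" line at its first colon.
function parseConfig(block) {
  const config = {};
  for (const line of block.split("\n")) {
    const i = line.indexOf(":");
    if (i < 0) continue; // skip blank or malformed lines
    const key = line.slice(0, i).trim();
    let value = line.slice(i + 1).trim();
    // Strip surrounding double quotes, as around SHARED_PROMPT.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    config[key] = value;
  }
  return config;
}

// Assumption: keep the first ceil(n * pct / 100) tests; 100 keeps all of them.
function selectTests(tests, percentage) {
  const pct = Number(percentage);
  if (!Number.isFinite(pct)) throw new Error(`Invalid RUN_PERCENTAGE: ${percentage}`);
  const clamped = Math.min(100, Math.max(0, pct));
  return tests.slice(0, Math.ceil((tests.length * clamped) / 100));
}

// Usage with a small stand-in README.
const readme = [
  "Set the percentage of tests to run during the benchmark.",
  marker("CONFIG", "START"),
  "RUN_PERCENTAGE: 50",
  'SHARED_PROMPT: "Your code will execute in the browser."',
  marker("CONFIG", "END"),
].join("\n");

const config = parseConfig(readBlock(readme, "CONFIG"));
const selected = selectTests(["a", "b", "c", "d", "e"], config.RUN_PERCENTAGE);
console.log(selected.join(",")); // prints a,b,c
```

Taking the first tests keeps partial runs reproducible across models; a runner that sampled randomly would spread coverage over repeated runs instead, at the cost of comparability between them.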


The following models are included in the benchmark run.

<!-- MODELS_START -->
google/gemini-3-pro-preview
anthropic/claude-sonnet-4.5 TEMP:0.7
openai/gpt-5.1-codex
moonshotai/kimi-k2-thinking
google/gemini-2.5-pro
openrouter/sherlock-think-alpha
<!-- MODELS_END -->
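Each entry is a provider/model identifier on its own line, and the anthropic/claude-sonnet-4.5 TEMP:0.7 line suggests an optional per-model temperature override. A hedged sketch of reading the lines between the MODELS_START and MODELS_END markers follows. Treating TEMP: as a sampling temperature, leaving it undefined otherwise (so a provider default would apply), and the names parseModels and onlyModel are assumptions, not documented behaviour of this repository. The single-model filter echoes the "Allow benchmark runs for a single model" commit, but its shape is invented for illustration.

```javascript
// Hypothetical parser for the MODELS block body: one model per line,
// optionally followed by a token like "TEMP:0.7" (digits, optional decimals).
function parseModels(block) {
  return block
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean) // drop blank lines
    .map((line) => {
      const [id, ...opts] = line.split(/\s+/);
      const entry = { id };
      for (const opt of opts) {
        const t = opt.match(/^TEMP:(\d+(?:\.\d+)?)$/);
        if (t) entry.temperature = Number(t[1]); // unknown tokens are ignored
      }
      return entry;
    });
}

// Hypothetical single-model run: keep only the entry with the given ID.
const onlyModel = (list, id) => list.filter((m) => m.id === id);

// Usage with three lines from the list above.
const block = [
  "google/gemini-3-pro-preview",
  "anthropic/claude-sonnet-4.5 TEMP:0.7",
  "openai/gpt-5.1-codex",
].join("\n");

const models = parseModels(block);
console.log(JSON.stringify(models[1])); // prints {"id":"anthropic/claude-sonnet-4.5","temperature":0.7}
console.log(onlyModel(models, "openai/gpt-5.1-codex").length); // prints 1
```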