| Commit | Message | Date |
|---|---|---|
| df97ff97b9 | Fix: Add better API error handling and logging | 2025-12-17 09:11:56 -08:00 |
| 4d3beabb98 | Feat: Support EFF:xhigh syntax for reasoning effort | 2025-12-11 13:06:14 -08:00 |
| 53b1971c30 | Feat: Verify generated code with Playwright | 2025-11-27 10:39:54 -08:00 |
| 8d62010aad | Refactor: Remove results.json usage | 2025-11-27 10:25:24 -08:00 |
| 55f2b4b964 | Delete: Cleanup script no longer needed | 2025-11-27 10:24:41 -08:00 |
| 82aa65df97 | Feat: Append generation time to output file, store 1 in results | 2025-11-27 10:24:02 -08:00 |
| 37a86c7b48 | Feat: Script to remove results for deleted tests | 2025-11-26 17:40:39 -08:00 |
| 7e5ff30656 | Fix: Verify test.js existence to ignore stale directories | 2025-11-26 17:27:22 -08:00 |
| 4bafe1b0bc | Feat: Add --test arg and fix clean logic | 2025-11-26 17:21:35 -08:00 |
| 53592ed9e3 | Fix: Save Gemini tests to dedicated outputs_gemini directory | 2025-11-18 10:30:33 -08:00 |
| 7420f7d2cb | Feat: Support Google API and Gemini mode | 2025-11-18 10:28:01 -08:00 |
| ab4f7671c0 | Refactor: Allow benchmark runs for a single model | 2025-11-14 16:01:04 -08:00 |
| d1077fa4ac | Revert: Update run-benchmark.js | 2025-11-13 13:01:02 -08:00 |
| 7f8931af00 | Feat: Record execution output for each test run | 2025-11-13 12:48:54 -08:00 |
| e3132e55b3 | Refactor: Make code export handling more robust | 2025-10-13 11:49:02 -07:00 |
| eaf29ad08d | Feat: Use temperature from README model config | 2025-10-13 11:21:35 -07:00 |
| a2375eb537 | Refactor: Use shared prompt from README config | 2025-10-13 10:57:37 -07:00 |
| 7b5aad47d1 | Fix: Correct README path | 2025-10-13 10:35:51 -07:00 |
| 27db7da944 | Refactor: Record generation time and output results.json | 2025-10-13 10:33:48 -07:00 |
| fdd4654843 | Refactor: Generate browser-runnable test files | 2025-10-13 10:24:15 -07:00 |
| 3ad61620f1 | Fix: Use correct README filename without extension | 2025-10-13 06:25:52 -07:00 |
| 36df1fc541 | Fix: Use process.cwd() for robust file paths | 2025-10-13 06:22:26 -07:00 |
| d562b6409f | Refactor: Clear old test outputs before each run | 2025-10-13 06:17:37 -07:00 |
| c1aaadeb7a | Revert: Update run-benchmark.js | 2025-10-13 06:15:02 -07:00 |
| 39f49566ee | Fix: Correct test execution and clear old outputs | 2025-10-13 06:10:59 -07:00 |
| acf1625272 | Fix: Modernize to ES Modules to fix test runner | 2025-10-13 06:05:19 -07:00 |
| daa8806226 | Feat: Add test execution timing and format results as list | 2025-10-13 06:01:06 -07:00 |
| 9c773c493c | Refactor: Generate indented text block for README | 2025-10-13 05:56:09 -07:00 |
| b9b01d6bfa | Refactor: Support test folders and save LLM outputs | 2025-10-13 05:50:26 -07:00 |
| b258c84ae4 | Feat: Add benchmark runner script | 2025-10-13 05:29:31 -07:00 |