Commit Graph

28 Commits

SHA1 Message Date
53b1971c30 Feat: Verify generated code with Playwright 2025-11-27 10:39:54 -08:00
8d62010aad Refactor: Remove results.json usage 2025-11-27 10:25:24 -08:00
55f2b4b964 Delete: Cleanup script no longer needed 2025-11-27 10:24:41 -08:00
82aa65df97 Feat: Append generation time to output file, store 1 in results 2025-11-27 10:24:02 -08:00
37a86c7b48 Feat: Script to remove results for deleted tests 2025-11-26 17:40:39 -08:00
7e5ff30656 Fix: Verify test.js existence to ignore stale directories 2025-11-26 17:27:22 -08:00
4bafe1b0bc Feat: Add --test arg and fix clean logic 2025-11-26 17:21:35 -08:00
53592ed9e3 Fix: Save Gemini tests to dedicated outputs_gemini directory 2025-11-18 10:30:33 -08:00
7420f7d2cb Feat: Support Google API and Gemini mode 2025-11-18 10:28:01 -08:00
ab4f7671c0 Refactor: Allow benchmark runs for a single model 2025-11-14 16:01:04 -08:00
d1077fa4ac Revert: Update run-benchmark.js 2025-11-13 13:01:02 -08:00
7f8931af00 Feat: Record execution output for each test run 2025-11-13 12:48:54 -08:00
e3132e55b3 Refactor: Make code export handling more robust 2025-10-13 11:49:02 -07:00
eaf29ad08d Feat: Use temperature from README model config 2025-10-13 11:21:35 -07:00
a2375eb537 Refactor: Use shared prompt from README config 2025-10-13 10:57:37 -07:00
7b5aad47d1 Fix: Correct README path 2025-10-13 10:35:51 -07:00
27db7da944 Refactor: Record generation time and output results.json 2025-10-13 10:33:48 -07:00
fdd4654843 Refactor: Generate browser-runnable test files 2025-10-13 10:24:15 -07:00
3ad61620f1 Fix: Use correct README filename without extension 2025-10-13 06:25:52 -07:00
36df1fc541 Fix: Use process.cwd() for robust file paths 2025-10-13 06:22:26 -07:00
d562b6409f Refactor: Clear old test outputs before each run 2025-10-13 06:17:37 -07:00
c1aaadeb7a Revert: Update run-benchmark.js 2025-10-13 06:15:02 -07:00
39f49566ee Fix: Correct test execution and clear old outputs 2025-10-13 06:10:59 -07:00
acf1625272 Fix: Modernize to ES Modules to fix test runner 2025-10-13 06:05:19 -07:00
daa8806226 Feat: Add test execution timing and format results as list 2025-10-13 06:01:06 -07:00
9c773c493c Refactor: Generate indented text block for README 2025-10-13 05:56:09 -07:00
b9b01d6bfa Refactor: Support test folders and save LLM outputs 2025-10-13 05:50:26 -07:00
b258c84ae4 Feat: Add benchmark runner script 2025-10-13 05:29:31 -07:00