Set the percentage of tests to run during the benchmark. 100% runs all tests.
RUN_PERCENTAGE: 25

The following models are included in the benchmark run.
google/gemini-2.5-pro
anthropic/claude-sonnet-4.5
openai/gpt-5-codex
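
As a rough illustration of how a setting like RUN_PERCENTAGE might be applied, the sketch below picks a reproducible subset of a test list. The configuration does not say how tests are chosen, so the seeded random sampling, the rounding-up rule, the `select_tests` helper, and the made-up test names are all assumptions for illustration, not the benchmark's actual behavior.

```python
import math
import random


def select_tests(tests, run_percentage, seed=0):
    """Return a reproducible subset holding about run_percentage% of tests.

    Hypothetical helper: the real benchmark may select tests differently.
    """
    if not 0 <= run_percentage <= 100:
        raise ValueError("run_percentage must be between 0 and 100")
    if run_percentage == 100:
        return list(tests)  # 100% runs all tests, in their original order
    # Assumption: round up so a small non-zero percentage still runs one test.
    count = math.ceil(len(tests) * run_percentage / 100)
    rng = random.Random(seed)  # fixed seed keeps the subset stable across runs
    return rng.sample(list(tests), count)


RUN_PERCENTAGE = 25
tests = [f"test_{i:03d}" for i in range(40)]  # made-up test names
selected = select_tests(tests, RUN_PERCENTAGE)
print(f"Running {len(selected)} of {len(tests)} tests")
```

Using a fixed seed means every model in the run would be evaluated on the same subset, which keeps scores comparable across models when only part of the suite is run.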