Why CDN Imports Matter in LLM Benchmarks
Most benchmarks test code in isolation. Lynchmark tests what actually matters: finding correct CDN URLs and making code work in real browsers.
The Hidden Skill Gap
When you ask an LLM to "write a function that uses lodash," most benchmarks will accept any code that looks like it uses lodash. But in the real world, that code needs to:
- Find the correct CDN URL for the library
- Use the proper import syntax for that CDN
- Handle the library's actual API (not just what the LLM thinks it should be)
- Execute successfully in a browser environment
This is where many LLMs fail spectacularly. They might generate perfect-looking algorithm code but pair it with non-existent CDN URLs or incorrect import patterns, so the code never runs.
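The requirements above can be sketched as a small loader that imports a module by URL and checks that the exports the calling code relies on are really there. This is a minimal sketch, not Lynchmark's implementation: `loadFromCdn`, its parameters, and the injectable `importer` are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: import a module from a CDN URL and verify its exports.
// `importer` defaults to native dynamic import and can be replaced in tests.
async function loadFromCdn(url, requiredExports = [], importer = (u) => import(u)) {
  let mod;
  try {
    mod = await importer(url);
  } catch (err) {
    // A wrong or non-existent CDN URL surfaces here as a failed import.
    throw new Error(`Failed to import ${url}: ${err && err.message ? err.message : err}`);
  }
  // Check the library's actual API instead of assuming what it exports.
  const missing = requiredExports.filter((name) => !(name in mod));
  if (missing.length > 0) {
    throw new Error(`${url} is missing exports: ${missing.join(', ')}`);
  }
  return mod;
}
```

Failing early with a message that names the URL or the missing export makes it easier to tell a bad CDN URL apart from a wrong assumption about the library's API.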
Real-World Example: The scrypt-js Test
In Test #10, models must import scrypt-js from a CDN. Here's what separates passing from failing implementations:
❌ Common Failure
import { scrypt } from 'https://cdn.skypack.dev/scrypt-js';
// Wrong: Skypack doesn't export named 'scrypt' this way
✅ Correct Solution
const { scrypt } = await import('https://cdn.jsdelivr.net/npm/scrypt-js@3.0.1/+esm');
// Correct: Uses jsDelivr with proper destructuring
The difference isn't just syntax. It's about knowing which CDNs work for which libraries, and how those libraries are actually packaged for browser use.
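Packaging differences often show up as a difference in module shape: depending on how a CDN converts a package to an ES module, a function may appear as a named export, as a property of the default export, or both. The sketch below looks in both places. `resolveExport` is a hypothetical helper introduced for illustration, and it makes no claim about which shape any particular CDN produces for scrypt-js.

```javascript
// Hypothetical helper: find `name` on a module whether the CDN exposed it
// as a named export or only as a property of the default export.
function resolveExport(mod, name) {
  const isContainer = (v) => v !== null && (typeof v === 'object' || typeof v === 'function');
  if (isContainer(mod) && name in mod) {
    return mod[name]; // named export
  }
  const def = isContainer(mod) ? mod.default : undefined;
  if (isContainer(def) && name in def) {
    return def[name]; // property on the default export
  }
  throw new Error(`Export "${name}" not found as a named or default-object export`);
}
```

A caller could pass the result of a dynamic import to `resolveExport(mod, 'scrypt')` and stop depending on which of the two shapes the CDN happened to produce.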
What Traditional Benchmarks Miss
Standard coding benchmarks test algorithmic thinking in isolation. They don't verify that the generated code can actually run in a real environment with real dependencies.
What Lynchmark Captures
Practical deployment knowledge: Which CDN hosts which libraries, how to import them correctly, and whether the resulting code executes successfully in a browser with no build step.
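To make the "which CDN" question concrete, here is a rough sketch that builds candidate ES module URLs for one package on a few public CDNs and tries them in order. The helper names `cdnCandidates` and `importFirstAvailable` are hypothetical, and so is the choice of CDNs. The URL patterns follow each CDN's usual npm path layout, but whether a given package imports cleanly from each one is exactly what has to be tested.

```javascript
// Hypothetical helper: candidate ESM URLs for one npm package and version.
function cdnCandidates(pkg, version) {
  const spec = `${pkg}@${version}`;
  return [
    `https://cdn.jsdelivr.net/npm/${spec}/+esm`, // jsDelivr's ESM endpoint
    `https://esm.sh/${spec}`,
    `https://cdn.skypack.dev/${spec}`,
  ];
}

// Try each URL in order and return the first module that imports successfully.
// `importer` defaults to native dynamic import; tests can inject a fake.
async function importFirstAvailable(urls, importer = (u) => import(u)) {
  const failures = [];
  for (const url of urls) {
    try {
      return { url, module: await importer(url) };
    } catch (err) {
      failures.push(`${url}: ${err && err.message ? err.message : err}`);
    }
  }
  throw new Error(`No CDN candidate could be imported:\n${failures.join('\n')}`);
}
```

A successful import only shows that the URL resolves to a loadable module. Whether the code built on top of it runs correctly still has to be checked in a browser.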
The Takeaway
When evaluating LLMs for real-world coding tasks, test their ability to work with actual dependencies in real environments. A perfect algorithm implementation means nothing if the code can't import its required libraries.
This is why Lynchmark runs every generated solution in a real browser with real CDN imports: it's the only way to know if the code actually works.