Why CDN Imports Matter in LLM Benchmarks
Most benchmarks test code in isolation. Lynchmark tests something they skip: finding correct CDN URLs and making code work in real browsers.
The Hidden Skill Gap
When you ask an LLM to "write a function that uses lodash," many benchmarks accept any code that looks like it uses lodash. To run in a real browser, though, that code needs to:
- Find the correct CDN URL for the library
- Use the proper import syntax for that CDN
- Handle the library's actual API (not just what the LLM thinks it should be)
- Execute successfully in a browser environment
This is where many LLMs fail spectacularly. They may generate perfect-looking algorithm code, yet point it at nonexistent CDN URLs or use import patterns the CDN build does not support.
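The first requirement in the list above, a plausible CDN URL, can be partly checked before any network request is made. The sketch below is a hypothetical helper, not part of Lynchmark: the name `describeCdnUrl`, the host list, and the returned fields are all assumptions chosen for illustration. It only inspects the shape of the URL (HTTPS, a recognized host, a pinned version), so a URL that passes can still point at a file that does not exist.

```javascript
// Hypothetical helper, not Lynchmark code. The host list is illustrative only.
const KNOWN_CDN_HOSTS = ['cdn.jsdelivr.net', 'cdn.skypack.dev', 'unpkg.com', 'esm.sh'];

function describeCdnUrl(specifier) {
  let url;
  try {
    url = new URL(specifier);
  } catch {
    // Bare specifiers such as 'scrypt-js' are not absolute URLs.
    return { ok: false, reason: 'not an absolute URL' };
  }
  if (url.protocol !== 'https:') {
    return { ok: false, reason: 'not served over https' };
  }
  if (!KNOWN_CDN_HOSTS.includes(url.hostname)) {
    return { ok: false, reason: 'unrecognized CDN host' };
  }
  // Look for "name@version" or "@scope/name@version" in the path.
  const match = url.pathname.match(/((?:@[^/@]+\/)?[^/@]+)@([^/]+)/);
  return {
    ok: true,
    host: url.hostname,
    // Both fields are null when the URL is not pinned to a version.
    pkg: match ? match[1] : null,
    version: match ? match[2] : null,
  };
}
```

A shape check like this is cheap, but it is no substitute for loading the URL in a browser, which is the only step that shows whether the file exists and exports what the code expects.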
Real-World Example: The scrypt-js Test
In Test #10, models must import scrypt-js from a CDN. Here's what separates passing from failing implementations:
❌ Common Failure
import { scrypt } from 'https://cdn.skypack.dev/scrypt-js';
// Wrong: Skypack does not expose 'scrypt' as a named export at this URL
✅ Correct Solution
const { scrypt } = await import('https://cdn.jsdelivr.net/npm/scrypt-js@3.0.1/+esm');
// Correct: version-pinned jsDelivr +esm build, loaded with a dynamic import and destructured
The difference isn't just syntax. It comes down to knowing which CDNs work for which libraries, and how those libraries are actually packaged for browser use.
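Packaging differences often show up as a question of where an export lives: some CDN builds expose a function as a named export, others only as a property of the default export. The sketch below shows one defensive way to handle both cases. `pickExport` is a hypothetical helper introduced for illustration; it is not part of scrypt-js, jsDelivr, or Lynchmark.

```javascript
// Hypothetical helper: resolve an export whether the CDN build exposes it
// as a named export or only as a property of the default export.
function pickExport(moduleNamespace, name) {
  if (typeof moduleNamespace[name] !== 'undefined') {
    return moduleNamespace[name];
  }
  const fallback = moduleNamespace.default;
  if (fallback && typeof fallback[name] !== 'undefined') {
    return fallback[name];
  }
  throw new Error(`Module has no export named "${name}"`);
}

// Possible usage inside a browser module script (URL from the example above):
//   const mod = await import('https://cdn.jsdelivr.net/npm/scrypt-js@3.0.1/+esm');
//   const scrypt = pickExport(mod, 'scrypt');
```

Throwing when the export is missing makes a packaging mismatch fail loudly at import time, instead of surfacing later as a call to `undefined`.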
What Traditional Benchmarks Miss
Standard coding benchmarks test algorithmic thinking in isolation. They don't verify that the generated code can actually run in a real environment with real dependencies.
What Lynchmark Captures
Practical deployment knowledge: Which CDN hosts which libraries, how to import them correctly, and whether the resulting code executes successfully in a browser with no build step.
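To make the distinction concrete, here is a minimal sketch of how a harness could separate "the import failed" from "the import worked but the code misused the library". This is an assumption about how such a check might be structured, not Lynchmark's actual implementation; `runSolution`, `loadModule`, and `exercise` are hypothetical names.

```javascript
// Hypothetical harness step, not Lynchmark's actual code.
// `loadModule` is any async function that loads the library (for example a
// wrapper around a dynamic import()); `exercise` calls into the loaded module.
async function runSolution(loadModule, exercise) {
  let mod;
  try {
    mod = await loadModule();
  } catch (error) {
    // Bad URL, missing file, or a module that fails to evaluate.
    return { passed: false, stage: 'import', message: String((error && error.message) || error) };
  }
  try {
    const value = await exercise(mod);
    return { passed: true, stage: 'done', value };
  } catch (error) {
    // The import succeeded, but the code used the library incorrectly.
    return { passed: false, stage: 'execute', message: String((error && error.message) || error) };
  }
}
```

Recording the stage at which a solution failed keeps the two kinds of mistake apart: a wrong CDN URL and a wrong API call are different gaps in a model's knowledge.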
The Takeaway
When evaluating LLMs for real-world coding tasks, test their ability to work with actual dependencies in real environments. A perfect algorithm implementation means little if the code can't import the libraries it depends on.
This is why Lynchmark runs every generated solution in a real browser with real CDN imports: running the code is the most direct way to know whether it actually works.