Technical Insight

Why CDN Imports Matter in LLM Benchmarks

Most benchmarks test code in isolation. Lynchmark tests what actually matters: finding correct CDN URLs and making code work in real browsers.

The Hidden Skill Gap

When you ask an LLM to "write a function that uses lodash," most benchmarks will accept any code that looks like it uses lodash. But in the real world, that code needs to:

Find the correct CDN URL for the library
Use the proper import syntax for that CDN
Handle the library's actual API (not just what the LLM thinks it should be)
Execute successfully in a browser environment

This is where many LLMs fail. They may generate plausible algorithm code yet point it at a CDN URL that does not exist, or use an import pattern the CDN does not support.
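The four requirements listed above can be checked as separate stages, so a failure is attributed to the right cause. The following is a minimal sketch under stated assumptions, not Lynchmark's actual harness: the function name checkCdnImport, the injectable importer parameter, and the stage labels are all hypothetical.

```javascript
// Hypothetical helper: reports which stage of "use a library from a CDN" failed.
// `importer` defaults to native dynamic import; it is injectable so the sketch
// can be exercised without network access.
async function checkCdnImport(url, exportName, run, importer = (u) => import(u)) {
  let mod;
  try {
    // Requirements 1-2: the URL resolves and loads as a module.
    mod = await importer(url);
  } catch (err) {
    return { ok: false, stage: 'import', error: String(err) };
  }

  // Requirement 3: the export the generated code expects actually exists.
  const fn = mod[exportName];
  if (typeof fn !== 'function') {
    return { ok: false, stage: 'export', error: `no function export "${exportName}"` };
  }

  try {
    // Requirement 4: the call executes without throwing.
    const value = await run(fn);
    return { ok: true, stage: 'done', value };
  } catch (err) {
    return { ok: false, stage: 'execute', error: String(err) };
  }
}
```

Separating the stages makes it possible to tell a model that picked a dead URL apart from one that loaded the library but called an API that does not exist.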

Real-World Example: The scrypt-js Test

In Test #10, models must import scrypt-js from a CDN. Here's what separates passing from failing implementations:

❌ Common Failure

import { scrypt } from 'https://cdn.skypack.dev/scrypt-js';
// Wrong: the Skypack build does not expose a named 'scrypt' export this way, and the URL pins no version

✅ Correct Solution

const { scrypt } = await import('https://cdn.jsdelivr.net/npm/scrypt-js@3.0.1/+esm');
// Correct: pins version 3.0.1, uses jsDelivr's +esm endpoint, and destructures the named export
// (top-level await requires a module script)

The difference isn't just syntax—it's about knowing which CDNs work for which libraries, and how those libraries are actually packaged for browser use.
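One way packaging differences show up: a CDN may expose a function as a named export, or it may wrap a CommonJS package so the same function is reachable only through the default export. The sketch below handles both shapes. The helper name pickExport is hypothetical, and which shape a given CDN produces for a given library is an assumption that has to be verified by loading it.

```javascript
// Hypothetical helper: find `name` on a loaded module namespace, whether the CDN
// exposed it as a named export or only as a property of the default export.
function pickExport(mod, name) {
  // Case 1: true named export, e.g. `export function scrypt(...)`.
  if (mod && typeof mod[name] !== 'undefined') return mod[name];

  // Case 2: CommonJS-style wrapper, where everything hangs off `default`.
  const def = mod && mod.default;
  if (def && typeof def[name] !== 'undefined') return def[name];

  throw new Error(`Export "${name}" not found (named or on default)`);
}
```

In a module script this could be used as: const scrypt = pickExport(await import(url), 'scrypt');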

What Traditional Benchmarks Miss

Standard coding benchmarks test algorithmic thinking in isolation. They don't verify that the generated code can actually run in a real environment with real dependencies.

What Lynchmark Captures

Practical deployment knowledge: Which CDN hosts which libraries, how to import them correctly, and whether the resulting code executes successfully in a browser with no build step.
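To make "which CDN hosts which libraries" concrete, here is a rough sketch that builds version-pinned URLs for three public CDNs and returns the first one that loads. The helper names cdnCandidates and importFirstWorking and the injectable importer are hypothetical, and the candidate order is arbitrary. Whether a particular package works on each CDN is not guaranteed by the URL pattern alone.

```javascript
// Version-pinned URL patterns for three public ESM-capable CDNs.
// A well-formed URL says nothing about whether the package loads correctly there.
function cdnCandidates(pkg, version) {
  const spec = `${pkg}@${version}`;
  return [
    `https://cdn.jsdelivr.net/npm/${spec}/+esm`,
    `https://esm.sh/${spec}`,
    `https://cdn.skypack.dev/${spec}`,
  ];
}

// Hypothetical helper: try each candidate in order, return the first that imports.
// `importer` is injectable so the logic can be tested without network access.
async function importFirstWorking(urls, importer = (u) => import(u)) {
  const failures = [];
  for (const url of urls) {
    try {
      return { url, module: await importer(url) };
    } catch (err) {
      failures.push(`${url}: ${err}`);
    }
  }
  throw new Error(`No candidate URL loaded:\n${failures.join('\n')}`);
}
```

A benchmark does not get this fallback: the model has to commit to one URL, which is why knowing the working combination in advance matters.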

The Takeaway

When evaluating LLMs for real-world coding tasks, test their ability to work with actual dependencies in real environments. A perfect algorithm implementation means nothing if the code can't import the libraries it requires.

This is why Lynchmark runs every generated solution in a real browser with real CDN imports—it's the only way to know if the code actually works.