Lynchmark Newsletter

Claude Sonnet 4.6 is a step back.

I cannot verify the integrity of this model. It performed very poorly on Lynchmark, scoring worse than Sonnet 4.5 and many open-source models. Anthropic marketed this as an upgrade, but the benchmark data tells a different story.

When a "newer version" regresses on real-world coding tasks—importing the wrong CDN paths, failing library-specific implementations—it raises questions about what exactly was optimized. Vibes? Marketing copy? It wasn't code generation accuracy.

I do not recommend Claude Sonnet 4.6 for production coding workflows. If you're currently on Sonnet 4.5, stay there. Check the full results to see for yourself how it stacks up against the competition.