Not just traces and spans. Focus on end results, whether it's a 3D model, a working app, or a dataset.
Test what matters
Build Blender agents that generate wishbone chairs from text descriptions
Evaluate 3D models by comparing renders against reference images
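As a rough illustration of comparing a render against a reference image, the sketch below scores two grayscale images by their mean absolute pixel difference. It assumes both images are already loaded as equally sized grids of values in [0, 1]. The function names and the 0.05 tolerance are hypothetical choices for this example, not part of any particular tool.

```python
from typing import Sequence

# Assumption: an image is a row-major grid of grayscale values in [0, 1].
Image = Sequence[Sequence[float]]


def mean_abs_diff(render: Image, reference: Image) -> float:
    """Average per-pixel absolute difference; 0.0 means identical images."""
    if len(render) != len(reference) or any(
        len(a) != len(b) for a, b in zip(render, reference)
    ):
        raise ValueError("render and reference must have the same dimensions")
    total = 0.0
    count = 0
    for render_row, reference_row in zip(render, reference):
        for p, q in zip(render_row, reference_row):
            total += abs(p - q)
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def render_matches(render: Image, reference: Image, tolerance: float = 0.05) -> bool:
    """Hypothetical pass/fail check: the render is 'close enough' to the reference."""
    return mean_abs_diff(render, reference) <= tolerance


reference = [[0.0, 1.0], [0.5, 0.5]]
render = [[0.1, 1.0], [0.5, 0.5]]  # one pixel is slightly brighter
print(render_matches(render, reference))
```

A plain pixel difference is sensitive to small camera or lighting shifts, so a real evaluation would likely pair it with a perceptual or structural metric.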
Generate apartment layouts with proper BIM data and spatial constraints
Test both code compilation and architectural validity
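One way to picture testing both properties is shown below: a syntax check on generated Python source using the built-in `compile()`, plus two simple layout rules (no overlapping rooms, and a minimum floor area). The `Room` type, the rectangle model, and the 4.0 m² threshold are assumptions made for this sketch. Real BIM data carries far richer geometry and constraints.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Room:
    """Hypothetical axis-aligned rectangular room, measured in metres."""
    name: str
    x: float
    y: float
    width: float
    depth: float


def compiles(source: str) -> bool:
    """True if the generated Python source parses without a syntax error."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def overlaps(a: Room, b: Room) -> bool:
    """True if the two rectangles share interior area (touching walls are fine)."""
    return (
        a.x < b.x + b.width and b.x < a.x + a.width
        and a.y < b.y + b.depth and b.y < a.y + a.depth
    )


def layout_violations(rooms: List[Room], min_area: float = 4.0) -> List[str]:
    """Collect human-readable descriptions of every broken constraint."""
    problems = []
    for room in rooms:
        if room.width * room.depth < min_area:
            problems.append(f"{room.name}: area below {min_area} m^2")
    for i, a in enumerate(rooms):
        for b in rooms[i + 1:]:
            if overlaps(a, b):
                problems.append(f"{a.name} overlaps {b.name}")
    return problems


rooms = [
    Room("living", 0, 0, 5, 4),
    Room("bedroom", 5, 0, 3, 3),
    Room("closet", 4, 3, 2, 1),
]
print(compiles("x = 1 +"), layout_violations(rooms))
```

Returning a list of violations rather than a single boolean makes a failed evaluation easier to act on.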
Transform SketchUp files into photorealistic architectural renders with Blender
Test camera positioning, lighting setup, and render output quality
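A camera-positioning test can be as simple as checking that the subject sits inside the camera's field of view. The sketch below uses plain vector math and does not call the Blender API. The function names and the 50 degree default field of view are assumptions for illustration only.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def angle_to_target_deg(camera_pos: Vec3, camera_forward: Vec3, target: Vec3) -> float:
    """Angle in degrees between the camera's forward direction and the target."""
    to_target = tuple(t - c for t, c in zip(target, camera_pos))
    dot = sum(f * v for f, v in zip(camera_forward, to_target))
    forward_len = math.sqrt(sum(f * f for f in camera_forward))
    target_len = math.sqrt(sum(v * v for v in to_target))
    if forward_len == 0 or target_len == 0:
        raise ValueError("forward vector and camera-to-target offset must be non-zero")
    # Clamp to guard against rounding pushing the cosine slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (forward_len * target_len)))
    return math.degrees(math.acos(cosine))


def target_in_view(
    camera_pos: Vec3, camera_forward: Vec3, target: Vec3, fov_deg: float = 50.0
) -> bool:
    """Hypothetical check: the target lies within half the field of view of the axis."""
    return angle_to_target_deg(camera_pos, camera_forward, target) <= fov_deg / 2


camera_pos = (0.0, -5.0, 1.0)
camera_forward = (0.0, 1.0, 0.0)  # looking along +Y
print(target_in_view(camera_pos, camera_forward, (0.0, 0.0, 1.0)))
```

This treats the field of view as a cone around the viewing axis, which ignores aspect ratio and occlusion. It is a quick sanity check, not a substitute for inspecting the render itself.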