Building in public

Evaluate AI agents by what they create

Go beyond traces and spans. Focus on end results, whether that's a 3D model, a working app, or a dataset. Test what matters.

Get early access when we launch

Real-world evaluations

Test what matters

3D Chair Generator

Build Blender agents that generate wishbone chairs from text descriptions

Visual similarity score

Evaluate 3D models by comparing renders against reference images

Reference
Agent Output
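
As a rough illustration of comparing a render against a reference image, here is a minimal sketch. The metric (one minus the mean absolute pixel difference), the function name `visual_similarity`, and the list-of-rows grayscale format are all assumptions made for this example; the actual scoring method is not described here.

```python
from typing import Sequence

# Assumed image format: grayscale pixels in [0.0, 1.0], stored row by row.
Image = Sequence[Sequence[float]]


def visual_similarity(render: Image, reference: Image) -> float:
    """Return a score in [0, 1]; 1.0 means the images are pixel-identical.

    Hypothetical metric: 1 - mean absolute pixel difference.
    """
    if len(render) != len(reference) or any(
        len(a) != len(b) for a, b in zip(render, reference)
    ):
        raise ValueError("render and reference must have the same dimensions")

    total = 0.0
    count = 0
    for row_a, row_b in zip(render, reference):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1

    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return 1.0 - total / count
```

A per-pixel metric like this is sensitive to small camera shifts, so a real evaluation would likely prefer a perceptual or embedding-based comparison.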

Floor Plan Designer

Generate apartment layouts with proper BIM data and spatial constraints

Code quality + spatial accuracy

Test both code compilation and architectural validity

Generated Floor Plan
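
To make the two checks concrete, here is a small sketch under stated assumptions: the agent emits Python source, and rooms are simplified to axis-aligned rectangles. The `Room` type and the `compiles`, `overlaps`, and `spatial_errors` helpers are hypothetical names for this example and do not represent a real BIM schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    """Hypothetical simplified room: an axis-aligned rectangle in metres."""
    name: str
    x: float  # left edge
    y: float  # bottom edge
    width: float
    depth: float


def compiles(source: str) -> bool:
    """True if the generated Python source parses without a syntax error."""
    try:
        compile(source, "<agent_output>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def overlaps(a: Room, b: Room) -> bool:
    """True if two rooms share interior area (touching walls are allowed)."""
    return (
        a.x < b.x + b.width and b.x < a.x + a.width
        and a.y < b.y + b.depth and b.y < a.y + a.depth
    )


def spatial_errors(rooms: list[Room], plan_width: float, plan_depth: float) -> list[str]:
    """Collect boundary and overlap violations; an empty list means valid."""
    errors = []
    for r in rooms:
        if r.width <= 0 or r.depth <= 0:
            errors.append(f"{r.name}: non-positive size")
        if r.x < 0 or r.y < 0 or r.x + r.width > plan_width or r.y + r.depth > plan_depth:
            errors.append(f"{r.name}: outside plan boundary")
    for i, a in enumerate(rooms):
        for b in rooms[i + 1:]:
            if overlaps(a, b):
                errors.append(f"{a.name} overlaps {b.name}")
    return errors
```

Returning a list of messages rather than a single boolean keeps each failure visible, which is useful when scoring partially correct layouts.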

SketchUp Renderer

Transform SketchUp files into photorealistic architectural renders with Blender

Visual quality + lighting accuracy

Test camera positioning, lighting setup, and render output quality

Reference
Agent Output
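
One way such checks could look, as a sketch only: camera positioning is measured as the distance between the agent's camera and an expected position, and lighting is approximated by mean image brightness. The helper names, the grayscale list-of-rows format, and the default tolerance are assumptions for illustration, not a description of the real test suite.

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]
Image = Sequence[Sequence[float]]  # assumed grayscale pixels in [0.0, 1.0]


def camera_error(actual: Vec3, expected: Vec3) -> float:
    """Euclidean distance between the agent's camera and the expected position."""
    return math.dist(actual, expected)


def mean_luminance(pixels: Image) -> float:
    """Average brightness over all pixels."""
    values = [v for row in pixels for v in row]
    if not values:
        raise ValueError("image must contain at least one pixel")
    return sum(values) / len(values)


def lighting_matches(render: Image, reference: Image, tolerance: float = 0.1) -> bool:
    """True if overall brightness is within `tolerance` of the reference.

    The tolerance value is a hypothetical default, not a calibrated threshold.
    """
    return abs(mean_luminance(render) - mean_luminance(reference)) <= tolerance
```

Mean brightness only captures overall exposure; judging light direction or shadows would need a richer comparison.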