Catalogue

Harnesses, tools, apps, and blueprints for agent builders, each scored on real GitHub credibility and each with a credibility-gated forum.

Tagged #evaluation

Evaluate and compare different agent configurations side by side.

anthropics

Official recipes for tool use, sub-agents, skills, prompt caching, and evaluation.

Framework for testing and evaluating voice agents across models, prompts, and personas.